MICHALE FEE: Today, we're going to finish up with recurrent neural networks. As you remember, we've been talking about the case where we have a layer of neurons with recurrent connections between the neurons in the output layer of our network. We've been developing the mathematical tools to describe the behavior of these networks and how they respond to their inputs, and we've been talking about the different kinds of computations that recurrent neural networks can perform.

You may recall that we introduced the concept of how to study recurrent neural networks by looking at the simplest recurrent network: a single neuron with a recurrent connection onto itself, called an autapse. The recurrent connection has a strength lambda. Without the recurrent connection, we can write down the equation for the response of this neuron as tau dv/dt = -v + h. The -v is essentially a leak term, so that if you put input into the neuron, its response jumps up and then decays exponentially in response to an input h. If we have a recurrent connection of strength lambda, then there's an additional input to the neuron that's proportional to the firing rate of the neuron. We can rewrite that equation as tau dv/dt = -(1 - lambda) v + h.

The behavior of this simple recurrent neural network depends strongly on the value of the coefficient (1 - lambda). We've talked about three different cases: where lambda is less than one, where lambda is equal to one -- in which case this coefficient is zero -- and where lambda is greater than one. So let's look at those three cases again for this equation. When lambda is less than one, you can see that the coefficient in front of the v is negative.
What that means is that the firing rate of this neuron relaxes exponentially toward some steady-state value, v-infinity. And then when the input goes away, the firing rate decays exponentially back toward zero.

In the case where lambda is equal to one, the coefficient is zero, and now you can see that the derivative of the firing rate of the neuron is just equal to the input. What that means is that the firing rate of the neuron essentially integrates the input. And you can see, if you put a step input into this neuron with a recurrent connection of lambda equal to one, that the response of the neuron simply ramps up linearly, which corresponds to integrating that step input. Then, when the input is turned off and goes back to zero, the firing rate of the neuron stays constant. That's because the leak is exactly balanced by this excitatory recurrent input from the neuron onto itself. So for the case of lambda equals one, there's persistent activity after you put an input into the neuron. We talked about how this forms a short-term memory that can be used for a bunch of different things. It's a short-term memory of a scalar, or a continuous quantity, like eye position. We also talked about this kind of integration being used for path integration, or for accumulating evidence over long exposure to a noisy stimulus.

Today, we're going to focus on networks where lambda is greater than one. In that case, the quantity (1 - lambda) inside the parentheses is negative, but it's multiplied by a minus one, so the coefficient in front of the v is positive. So if v itself is a positive number, then dv/dt is also positive. And if v is positive and dv/dt is positive, that means the firing rate of that neuron is growing -- in this case, growing exponentially.
So when you put an input in, the response of the neuron grows exponentially. But when you turn the input off, the firing rate of the neuron continues to grow exponentially, which is a little bit crazy. You know that neurons in the brain, of course, don't have firing rates that just keep growing exponentially. We're going to solve that problem by using nonlinearities in the F-I curve of neurons.

But the key point here is that this kind of network actually remembers that there was an input. The lambda-less-than-one network, where the activity just decays back to zero when the input goes away, has no memory that there was an input long ago in the past, whereas this kind of network remembers that there was an input. And that property, when lambda is greater than one, is useful for storing memories.

So we're going to expand on that idea. In particular, we're going to use that theme to build networks that have attractors -- stable states that they can go to that depend on prior inputs, and that can also be used to store long-term memories. We're going to see how that kind of network can be used to produce a winner-take-all network that is sensitive to which of two inputs is stronger and stores a memory of the preceding inputs: it ends up in one state when input 1 is stronger than input 2, and it lands in a different state when input 2 is stronger than input 1.

We're then going to describe a particular model, called a Hopfield model, for how attractor networks can store long-term memories. We're going to introduce the idea of an energy landscape, which is a property of networks that have symmetric connections, of which the Hopfield model is an example. And then we're going to end by talking about how many memories such a network can actually store, known as the capacity problem.
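[As a concrete illustration of these three regimes, here is a minimal simulation sketch of tau dv/dt = -(1 - lambda) v + h. This is not from the lecture itself; it assumes Python with NumPy, and the time constant, pulse timing, and function name are illustrative choices.]

    import numpy as np

    def simulate_autapse(lam, tau=10.0, dt=0.1, T=100.0):
        # Forward-Euler integration of tau dv/dt = -(1 - lambda) v + h
        n = int(T / dt)
        t = np.arange(n) * dt
        h = np.where((t >= 10) & (t < 30), 1.0, 0.0)  # step input from t=10 to t=30
        v = np.zeros(n)
        for i in range(n - 1):
            v[i + 1] = v[i] + dt / tau * (-(1.0 - lam) * v[i] + h[i])
        return t, v

    for lam in (0.5, 1.0, 2.0):
        t, v = simulate_autapse(lam)
        # lam < 1: relaxes during input, decays to zero afterward
        # lam = 1: integrates the step, then holds its value
        # lam > 1: keeps growing exponentially even after the input is removed
        print(f"lambda = {lam}: v(end of input) = {v[299]:.2f}, v(end) = {v[-1]:.2f}")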
OK, so let's start with recurrent networks with lambda greater than one. Let's start with our autapse, and let's put lambda equal to 2. Again, you can see that if we rewrite this equation with lambda greater than one, we can write tau dv/dt = (lambda - 1) v + h.

You can see that a firing rate of zero is an unstable fixed point of the network. Why is that? Because at v equals zero, with zero input, dv/dt equals zero. So if the firing rate is exactly zero, that's a fixed point of the system. But if v deviates very slightly from zero -- v becomes very slightly positive -- then dv/dt is positive, and the firing rate of the neuron starts running away.

So what you can see is, if you start the firing rate at zero and have the input at zero, then dv/dt is zero, and the network will stay at zero firing rate. But if you put in a very small positive input, then dv/dt goes positive, and the network activity runs away.

Now, let's put in an input of the opposite sign. Let's start with v equals zero and put in a very tiny negative input. What's the network going to do? For lambda equal to 2, tau dv/dt = v + h. So if h is very slightly negative and v is zero, then dv/dt will be negative, and the network will run away in the negative direction.

So this network actually can produce two memories. It can store a memory that a preceding input was positive, or it can store a memory that a preceding input was negative. It has two configurations after you've put in an input that is positive or negative, right? It can produce a positive output or a negative output that's persistent for a long time. Yes?

AUDIENCE: Is the [INAUDIBLE] of a negative firing rate [INAUDIBLE]?

MICHALE FEE: Yeah. So you can basically reformulate everything that we've been talking about for neurons that can't have negative firing rates. But in this case, we've been working with linear neurons.
And it seems like the negative firing rates are pretty non-physical, non-intuitive. But it's a pretty standard way to do the mathematical analysis for neurons like this, to treat them as linear, and you can reformulate all of these networks in a way that doesn't have that non-physical property. So for now, let's just bear with this slightly uncomfortable situation of having neurons with negative firing rates. Generally, we're going to associate negative firing rates with inhibition, OK? But don't worry about that here.

All right, so we're going to solve this problem of firing rates running away exponentially by adding a nonlinear activation function. A typical nonlinear activation function that you might use for networks of the type we've been considering is a symmetric F-I curve: if the input is positive and small, the firing rate of the neuron grows linearly, until you reach a point where it saturates, and larger inputs don't produce any larger firing rate. Most neurons actually have a saturating F-I curve like this. Hodgkin-Huxley neurons, for example, begin to saturate. Why is that? Because the sodium channels begin to inactivate. Sodium channel inactivation sets a minimum time between spikes, and so there's a fastest rate at which the neuron can spike. And then on the minus side, if the input is small and negative, the firing rate of the neuron goes negative linearly for a while and then saturates at some value. We'll typically have the neuron saturating between one and minus one.

So now, if you start your neuron at zero firing rate and you put in a little positive input, what's the neuron going to do? Any guesses?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Yeah. It's going to start running up exponentially, but then it's going to saturate up here. The firing rate will run up and sit at one.
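[Here is a minimal sketch of this bistability, assuming the saturation acts on the neuron's total input, i.e. tau dv/dt = -v + f(lambda v + h); that placement of the nonlinearity, and all parameter values, are illustrative assumptions rather than the lecture's.]

    import numpy as np

    def f(x):
        # Symmetric saturating F-I curve: linear near zero, clipped at +/-1
        return np.clip(x, -1.0, 1.0)

    def run(v0, h_pulse, lam=2.0, tau=10.0, dt=0.1, T=200.0):
        # tau dv/dt = -v + f(lambda*v + h), with a brief input pulse at the start
        v = v0
        for i in range(int(T / dt)):
            h = h_pulse if i * dt < 5.0 else 0.0
            v += dt / tau * (-v + f(lam * v + h))
        return v

    print(run(0.0, +0.1))  # small positive kick -> runs up and sits near +1
    print(run(0.0, -0.1))  # small negative kick -> runs down and sits near -1
    print(run(0.0,  0.0))  # no kick -> stays at the unstable fixed point at 0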
And if we put in a small negative input, then this little recurrent network will go negative and saturate at minus one, OK?

So you can see that this network actually has one unstable fixed point: if it sits exactly at zero, it will stay at zero, until you give a little bit of input in either direction. Then the network will run up and sit at another fixed point here, at one. If you put in a big negative input, you can drive it to another fixed point. And these two are stable fixed points, because once the network is in one of those states, if you give it little perturbations, it will deviate a little bit from that value -- if you give a small negative input, you can cause this to decrease a little bit -- but when the input goes away, it will relax back. So this is an unstable fixed point, and these are two stable fixed points.

Now, we're going to come back to this in more detail later, but we often think about networks like this as sort of like a ball on a hill. You can imagine describing this network using what's called an energy landscape. If you start the system at some point on this valley-shaped landscape, the network behaves like a ball that rolls downhill. If you start the network exactly at the peak, the ball will sit there. But if you give it a little bit of a nudge, it will roll downhill toward one of these stable points. If you start it slightly on the other side, it will roll the other way. And those stable fixed points are called attractors. This particular network has two attractors -- one at a firing rate of one and one at a firing rate of minus one. Yes, Appolonia?

AUDIENCE: The stable fixed points of the top graph, where'd you say they were?
MICHALE FEE: The stable fixed point is here, because once the system is in this state, you can give slight perturbations and the system returns to that fixed point. This is an unstable fixed point, because if you start the system there and give it a little nudge in either direction, the state runs away. Does that make sense?

AUDIENCE: Yeah.

MICHALE FEE: Any questions about that? Yes?

AUDIENCE: How is the shape of the curve [INAUDIBLE] points determined based on like--

MICHALE FEE: I'm going to come back to how you actually calculate this energy landscape more formally. There's a very precise mathematical definition of how you define this energy landscape.

All right, so this was all for the case of one neuron. Now let's extend it to the case of multiple neurons. Let's just take two neurons, each with an autapse. One of these autapses has a strength of two, and the other has a strength of minus two. So this one is recurrent and excitatory; this one is recurrent and inhibitory. And now what we're going to do is plot the state of the network. Instead of the state of the network being a point in one dimension, v, we now have v1 and v2, so the state of the system is going to be a point in a plane given by v1 and v2.

Now, by looking at this network, you can see immediately that this particular neuron, the neuron with firing rate v2, looks like the kind of network that we've already studied: it has a stable fixed point at zero. And this network has two stable fixed points -- one at one and the other at minus one. So you can see that this system will also have two stable fixed points -- one there and one there, right? Because if I take the input away, this neuron is going to go to either one or minus one, and this neuron is going to go to zero. So there's one and minus one on the v1 axis.
And those two states have zero firing rate on the v2 axis. Is that clear?

So now, what's going to happen if we make this autapse also have a strength of two? Anybody want to take a guess?

AUDIENCE: That's, like, four attractors?

MICHALE FEE: Right. Why is that?

AUDIENCE: Because that will also have stable fixed points at [INAUDIBLE].

MICHALE FEE: Right. So this one will have stable fixed points at one and minus one. This one will also have stable fixed points at one and minus one. And the system can be in any one of four states: 1, 1; minus 1, minus 1; 1, minus 1; and minus 1, 1. That's right.

All right, I just want to make one other point here, which is that no matter where you start the system for this network, it's going to evolve toward one of these stable fixed points -- unless I start it exactly at zero. That's another fixed point, but it's an unstable fixed point. So no matter where I start the state of that system, other than that exact point right there, the network will evolve toward one of those attractors. That's why they're called attractors: because they attract the state of the system toward one of those points. Yes?

AUDIENCE: So are the attractors determined by the nonlinear activation function?

MICHALE FEE: They are. If this nonlinear activation function saturated at two and minus two, then these two points would be up here at two and minus two.

So you can see that this network has two eigenvalues, right? If we think of it as a linear network, the connection matrix is given by a diagonal matrix with a two and a minus two along the diagonal. So let's take a look at this kind of network. Now, instead of an autapse network, we have recurrent connections of strength minus 2 and minus 2.
So what does that weight matrix look like?

AUDIENCE: 0, minus 2; minus 2, 0.

MICHALE FEE: 0, minus 2; minus 2, 0, right? Well, what are the eigenvalues of this network? Anybody remember?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Right. For a matrix like this, it's a plus b and a minus b. And so the eigenvalues of this network are 0 plus negative 2 and 0 minus negative 2 -- that is, minus 2 and 2. So this network will have exactly the same eigenvalues as this network. But what's going to be different? What are the eigenvectors?

AUDIENCE: The 45.

MICHALE FEE: The 45 degrees. So the eigenvectors of this network are the x- and y-axes, and the eigenvectors of this network are the 45-degree lines. So anybody want to take a guess as to what the stable states are? It's just this network rotated by 45 degrees, right? So those are now the attractors of this network. And that makes sense, right? This neuron can be positive, but that's going to be strongly driving this neuron negative. And if this neuron is negative, that's going to be strongly driving this neuron positive. So this network will want to sit out here on this line, in this direction or in this direction. And because of the saturation -- if there were no saturation, if this were a linear network, the activity would just be running exponentially up these 45-degree lines -- but because of the saturation, it gets stuck here at minus 1, 1 or 1, minus 1. Any questions about that? Yeah, Jasmine?

AUDIENCE: So the two fixed points right now, like it's [INAUDIBLE]?

MICHALE FEE: Yeah. It'll be one in this direction and one in that direction.

AUDIENCE: So why [INAUDIBLE]?

MICHALE FEE: Because this neuron is saturated. The saturation is acting at the level of the individual neurons.

AUDIENCE: OK.

MICHALE FEE: So each neuron will go up to its own saturation point. OK? All right.
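[This eigenvalue claim is easy to check numerically; a short NumPy sketch:]

    import numpy as np

    M = np.array([[ 0.0, -2.0],
                  [-2.0,  0.0]])   # mutual inhibition of strength 2
    vals, vecs = np.linalg.eig(M)
    print(vals)  # 2 and -2 (a + b and a - b, with a = 0, b = -2), up to ordering
    print(vecs)  # columns: the normalized 45-degree directions (1, -1) and (1, 1)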
So this kind of network is actually pretty cool. This network can implement decision-making. It can decide, for example, whether one input is bigger than the other. So let's start our network right here at this unstable fixed point -- we've carefully balanced the ball on top of the hill, and it just sits there. And now let's put in an input in this direction, h, so that it's pointing slightly to the right of this diagonal line. So what's going to happen? It's going to kick the state of the network up in this direction, right? And we've already discussed how, if the network state is anywhere on this side of that line, it will evolve toward this fixed point. If h is on the other side, it will kick the network off the unstable fixed point into this part of the state space, and then the network will evolve toward this fixed point.

These half-planes are called attractor basins: this region here is the attractor basin for this attractor, and this side is the attractor basin for that attractor. And you can see that this network will be very sensitive to whichever input, h1 or h2, is slightly larger.

So let me show you what that looks like in this little movie. We're going to start with our network exactly at the zero point, and we're going to give an input in this direction. You can see that we've kicked the network slightly this way, and now the network evolves toward the fixed point, and it stays there. Now, if we give a big input this way, we can push the network over -- push it to the other side of this dividing line between the two basins of attraction -- and now the network sits here at this fixed point. We can kick it again with another input and push it back. So it's kind of like a flip-flop, right?
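[A minimal sketch of this winner-take-all behavior, reusing the saturating dynamics assumed in the earlier sketch; the pulse duration and input values are arbitrary illustrative choices:]

    import numpy as np

    def f(x):
        return np.clip(x, -1.0, 1.0)   # saturating activation, as before

    M = np.array([[ 0.0, -2.0],
                  [-2.0,  0.0]])       # mutual inhibition

    def decide(h1, h2, tau=10.0, dt=0.1, T=300.0):
        # Start at the unstable fixed point (0, 0), apply a brief input (h1, h2),
        # then let the network relax into one of its two attractors.
        v = np.zeros(2)
        for i in range(int(T / dt)):
            h = np.array([h1, h2]) if i * dt < 5.0 else np.zeros(2)
            v += dt / tau * (-v + f(M @ v + h))
        return np.round(v, 2)

    print(decide(0.6, 0.5))  # h1 slightly larger -> settles near ( 1, -1)
    print(decide(0.5, 0.6))  # h2 slightly larger -> settles near (-1,  1)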
It's pretty cool. It detects which input was larger and pushes the network into an attractor that then remembers which input was larger -- basically, for as long as you allow the network to sit there. OK? All right, any questions about that? Yes, Rebecca?

AUDIENCE: Sorry. So the basin is just like each side of that [INAUDIBLE]?

MICHALE FEE: That's right. That's the basin of attraction for this attractor. If you start the network anywhere in this half-plane, the network will evolve toward that attractor. And you can use that as a winner-take-all decision-making network by starting the network right there at zero. Small kicks in either direction will cause the network to relax into one of these attractors and maintain that memory.

Now let's talk about a formal implementation of a system for producing long-term memories, called a Hopfield model. The Hopfield model is actually one of the best current models for understanding how memory systems like the hippocampus work. The basic idea is that we have neurons in the hippocampus -- in particular, in the CA3 region of the hippocampus -- that have a lot of recurrent connectivity between them. You have input from entorhinal cortex and from the dentate gyrus that serves as the stimuli that come into that network and burn memories into it by changing the synaptic weights within that network, [INAUDIBLE] that some time later, when similar inputs come in, they can reactivate the memory in the hippocampus, and you recognize and remember that pattern of stimuli.

So here's an example of how this looks when you record neurons in the hippocampus. Here's a mouse or a rat with electrodes in its hippocampus.
If you put it in a little arena like this, it will run around and explore for a while. You can record where the rat is in that arena [AUDIO OUT] from neurons, measure when the neurons spike, and look at how the firing rate of those neurons relates to the position of the animal. The black trace here shows all of the locations where the rat was as it ran around the arena, and the red dots show where the rat was when one of these neurons in CA3 of the hippocampus generated a spike. You can see that this neuron generates spiking when the animal is in a particular restricted region of its environment, and different neurons show different localized regions. These regions are called place fields, because they are the places in the environment where that neuron spikes.

Different neurons have different place fields. You can actually record from many of these neurons, and by looking at the pattern of neurons that are spiking, you can figure out where the rat was, or is, at any given moment. That's pretty obvious, right? If this neuron is spiking and all these other neurons aren't, then you know that the animal is somewhere in that location right there.

So in a sense, the activity of these neurons reflects the animal remembering, or sort of remembering, that it's in a particular location. It's in a cage; it looks at the walls of the environment. The experimenters use colored cards on the wall to give the animal cues as to where it is. So the animal looks around and says, oh, yeah, I'm here -- in my environment, there's a red card there and a yellow card there, and that's where I am right now. So that's the way you can think about these hippocampal place fields as being like a memory.
On top of that, this part of the hippocampus is necessary for the actual formation of memories in a broader sense -- not just spatial locations, but more generally life events. For humans, the hippocampus is an essential part of the brain for storing memories.

All right, so let's come back to this idea of our recurrent network. What we're going to do is start adding more and more neurons to our recurrent network. Here's what the attractor structure looked like for the case where one eigenvalue in the system is greater than one and the other is less than one. If we now make both of these neurons have recurrent connections that are stronger than one, we're going to have four attractors, right? Each one of these has two stable fixed points, at one and minus one. So here, for these two states, v1 is 1, and for these two states, v1 is minus 1. For these two states, v2 is 1, and for these two states, v2 is minus 1.

So you can see that every time we add another neuron with an autapse -- another neuron with another eigenvalue greater than one -- we add more possible states of the network. If we have one neuron with an autapse greater than one, we have two states. If we have two, we have four states. If we have three, we have eight states. So if we have n of these neurons with recurrent excitation with a lambda greater than one, we have 2 to the n possible states that the system can be in. I don't know exactly how many neurons are in CA3 -- it has to be several million, maybe 10 million; we don't know the exact number -- but 2 to that is a lot of possible states, right?

So let's think about how this thing acts as a memory. It turns out that this little device that we've built here is actually a lot like a computer memory.
It's like a register, where we can write a value. We can write in here a 1, minus 1, 1, and as long as we leave that network alone, it will store that value. Or we can write a 1, 1, 1, and it will store that value. But that's not really what we mean when we talk about memories, right? We have a memory of meeting somebody for lunch yesterday. That is a particular configuration of sensory inputs that we experienced.

So the other way to think about this is that this kind of network is just a short-term memory. We can program in some values -- 1, 1, 1 -- but if we were to turn the activity of these neurons off, we'd erase the memory. How do we build into this network a long-term memory, such that we can turn all these neurons off and the network then goes back into the remembered state? You do that by building connections between these neurons such that only some of these possible states are actually stable states.

So let me give you an example of this. If you have a whole bunch of neurons -- n neurons -- you've got 2 to the n possible states that the network can sit in. What we want is for only some of those to actually be stable states of the system. For example, when we wake up in the morning and we see the dresser, or maybe the nightstand next to the bed, we want to remember that's our bedroom. We want that to be a particular configuration of inputs that we recall. So what you want is a set of neurons with particular states that the system evolves toward -- stable states of the system. The way you do that is you take this network with recurrent autapses and you build cross-connections between the neurons that make particular ones of those possible states actual stable states of the system. We want to restrict the number of stable states in the system.

So take a look at this network here. Here we have two neurons.
You know that if you just had autapses from these neurons onto themselves, there would be four possible stable states. But if we now build excitatory cross-connections between those neurons, two of those states are no longer stable -- they become unstable -- and only these two remain stable states of the system, remain attractors. If instead we put inhibitory connections between those neurons, then we can make these two states the attractors of the system, OK? All right, does that make sense?

All right, so let's actually flesh out the mathematics of how you take a network of neurons and program it to have particular states that are attractors of the system. We've been using this kind of dynamical equation; we're going to simplify that. We're going to follow the construction that John Hopfield used when he analyzed these recurrent networks. In the formulation we've been using, we update the firing rate of our neuron using a differential equation. Instead, we're going to simplify by writing down the state of the network at time t plus 1 as a function of the state of the network at the previous time step. So we're going to discretize time. We're going to say that v, the state of the network -- the firing rates of all the neurons -- at time t plus 1 is a function of a weight matrix that connects all the neurons times the firing rate vector, plus an input: v(t+1) = f(M v(t) + h).

And here, I'm just writing out exactly what that matrix multiplication looks like. The state of the i-th neuron after we update the network is just a function of the sum over all of the inputs coming from all of the other neurons j. And we're going to simplify our neuronal activation function f by making it a binary threshold neuron.
If the total input is positive, then the firing rate of the neuron will be one; if the input is negative, the firing rate will be minus one. That's the sign function: sign(x) is 1 if x is greater than 0, and minus 1 if x is less than or equal to 0.

All right, so the goal is to build a network that can store any memory we want -- any pattern we want -- and turn that into a stable state. We're going to build a network that will evolve toward a particular pattern that we choose. And xi is just a pattern of ones and minus ones that describes the memory that we're building into the network: xi_i is one or minus one for the i-th neuron.

Now, we want xi to be an attractor. We want to build a network such that xi is an attractor. And what does building a network mean? When we say build a network, what are we actually doing? What is it here that we're actually trying to decide?

AUDIENCE: The synaptic weights.

MICHALE FEE: Yeah, which is?

AUDIENCE: Like the matrix M.

MICHALE FEE: The M, right. So when I say build a network that does this, I mean choose a set of M's that has this property. What we want is to find a weight matrix M such that if the network is in this desired state, then when we multiply that state by the matrix M and take the sign of the result, we get the same state back: sign(M xi) = xi. In other words, if you start the network in this state, it's going to end up in the same state. That's what it means to have an attractor -- that's what it means to say that it's a stable state.
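[Here is a sketch of this discrete update rule in NumPy, applied to a two-neuron network with excitatory cross-connections like the one from a few slides back; the particular weight values are an illustrative choice, and sign(0) is taken as -1 per the definition above:]

    import numpy as np

    def step(v, M):
        # One discrete time step: v(t+1) = sign(M v(t))
        return np.where(M @ v > 0, 1, -1)

    # Autapses of strength 1 plus excitatory cross-connections of strength 1
    M = np.array([[1.0, 1.0],
                  [1.0, 1.0]])

    for v in ([1, 1], [-1, -1], [1, -1], [-1, 1]):
        v = np.array(v)
        stable = np.array_equal(step(v, M), v)
        print(v, "->", step(v, M), "(stable)" if stable else "(unstable)")

[With these weights, only (1, 1) and (-1, -1) map to themselves; the two mixed states are no longer stable, matching the picture above.]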
OK, so we're going to try a particular matrix. I'm going to describe what this actually looks like in more detail, but the matrix that programs a pattern xi into the network as an attractor is this weight matrix right here: if we have a pattern xi, our weight matrix is some constant alpha times the outer product of that pattern with itself, M = alpha xi xi-transpose. I'm going to explain what that means. What it means is that if neuron i and neuron j are both active in this pattern -- both have a firing rate of one -- then those two neurons are going to be connected to each other with a connection of value one, times alpha. And if one of those neurons has a firing rate of one and the other has a firing rate of minus one, what weight do we want between them? The strength of the connection between them will be minus one. So if one neuron is active and another neuron is active, we want them to excite each other, to maintain that as a stable state. If one neuron is plus and the other is minus, we want them to inhibit each other, because that will make that configuration stable. And notice that's a symmetric matrix.

So let's actually take our dynamical equation that says how we go from the state at time t to the state at time t plus 1, put in this weight matrix, and see whether this pattern xi is actually a stable state. So let's do that: take this M and substitute it in. Notice this is a sum over j, so we can pull the xi_i out. And now you see that v at t plus 1 is the sign of alpha times xi_i times the sum over j of xi_j times xi_j. Now, what is that? The elements of xi are just ones or minus ones. So xi_j times xi_j has to be?

AUDIENCE: One.

MICHALE FEE: One. And we're summing over N neurons, so this sum has the value N. So you can see that the state at time t plus 1, if we start the network in this stored state, is just the sign of alpha N xi. But alpha is positive.
790 00:42:28,010 --> 00:42:31,500 N is just a positive integer, the number of neurons. 791 00:42:31,500 --> 00:42:35,650 So this equals xi. 792 00:42:35,650 --> 00:42:38,560 So if we have this weight matrix, 793 00:42:38,560 --> 00:42:42,610 and we start the network in that stored state, 794 00:42:42,610 --> 00:42:45,590 the state at the next time step will be the same state. 795 00:42:45,590 --> 00:42:49,830 So it's a stable fixed point. 796 00:42:49,830 --> 00:42:52,790 All right, so let's just go through an example. 797 00:42:52,790 --> 00:42:57,800 That is the prescription for programming a memory 798 00:42:57,800 --> 00:42:59,930 into a Hopfield network, OK? 799 00:42:59,930 --> 00:43:01,730 And notice that it's just-- 800 00:43:01,730 --> 00:43:05,090 it's essentially a Hebbian learning rule. 801 00:43:05,090 --> 00:43:07,820 So the way you do this is you activate the neurons 802 00:43:07,820 --> 00:43:12,370 with a particular pattern, and any two neurons that are active 803 00:43:12,370 --> 00:43:16,840 together form a positive excitatory connection 804 00:43:16,840 --> 00:43:18,040 between them. 805 00:43:18,040 --> 00:43:20,270 Any two neurons where one is positive 806 00:43:20,270 --> 00:43:25,510 and the other is negative form a symmetric inhibitory 807 00:43:25,510 --> 00:43:27,010 connection, all right? 808 00:43:34,853 --> 00:43:36,770 All right, so let's take a particular example. 809 00:43:36,770 --> 00:43:40,280 Let's make a three-neuron network that 810 00:43:40,280 --> 00:43:43,490 stores a pattern 1, 1, minus 1. 811 00:43:43,490 --> 00:43:46,340 And again, the notation here is xi, xi transpose. 812 00:43:46,340 --> 00:43:48,830 That's an outer product, just like you 813 00:43:48,830 --> 00:43:56,990 use to compute the covariance matrix of a data matrix. 814 00:43:56,990 --> 00:44:00,530 So there's the pattern we're going to program in. 815 00:44:00,530 --> 00:44:03,410 The weight matrix is xi, xi transpose, 816 00:44:03,410 --> 00:44:07,650 so it's 1, 1, minus 1 times 1, 1, minus 1. 817 00:44:07,650 --> 00:44:09,650 You can see that's going to give you this matrix 818 00:44:09,650 --> 00:44:10,490 here, all right? 819 00:44:10,490 --> 00:44:12,800 So that element there is 1 times 1. 820 00:44:12,800 --> 00:44:14,010 That element there. 821 00:44:14,010 --> 00:44:16,860 So here are two neurons. 822 00:44:16,860 --> 00:44:22,170 These two neurons storing this pattern, these two neurons-- 823 00:44:22,170 --> 00:44:26,310 sorry, this neuron has a firing rate of minus one. 824 00:44:26,310 --> 00:44:30,840 So the connection between that neuron and itself 825 00:44:30,840 --> 00:44:34,220 is a one, right? 826 00:44:34,220 --> 00:44:36,660 It's just the product of that times that, minus one times minus one. 827 00:44:36,660 --> 00:44:41,480 All right, any questions about how we got this weight matrix? 828 00:44:41,480 --> 00:44:45,040 I think it's pretty straightforward. 829 00:44:45,040 --> 00:44:47,500 So is that a stable point? 830 00:44:47,500 --> 00:44:49,060 Let's just multiply it out. 831 00:44:49,060 --> 00:44:53,800 We take this vector and multiply it by this matrix. 832 00:44:53,800 --> 00:44:55,540 There's our stored pattern. 833 00:44:55,540 --> 00:44:58,200 There's our matrix that stores that pattern. 834 00:44:58,200 --> 00:44:59,950 And we're just going to multiply this out. 835 00:44:59,950 --> 00:45:04,600 You can see that 1 times 1 plus 1 times 1 836 00:45:04,600 --> 00:45:07,120 plus minus 1 times minus 1 is 3. 837 00:45:07,120 --> 00:45:09,635 You just do that for each of the neurons.
838 00:45:12,670 --> 00:45:13,970 Take the sign of that. 839 00:45:13,970 --> 00:45:16,345 And you can see that that's just 1, 1, minus 1. 840 00:45:16,345 --> 00:45:20,170 So 1, 1, minus 1 is a stable fixed point. 841 00:45:20,170 --> 00:45:22,690 Now let's see if it's actually an attractor. 842 00:45:22,690 --> 00:45:26,380 So when a state is an attractor, what that means is 843 00:45:26,380 --> 00:45:28,360 if we start the network at a state that's 844 00:45:28,360 --> 00:45:32,530 a little bit different from that and advance the network one 845 00:45:32,530 --> 00:45:36,490 time step, it will converge toward the attractor. 846 00:45:36,490 --> 00:45:41,590 So into our network that stores this pattern 1, 1, minus 1, 847 00:45:41,590 --> 00:45:45,590 let's put in a different pattern and see what happens. 848 00:45:45,590 --> 00:45:47,470 So we're going to take that weight matrix, 849 00:45:47,470 --> 00:45:52,750 multiply it by this initial state, multiply it out, 850 00:45:52,750 --> 00:45:55,570 and you can see that the next state is 851 00:45:55,570 --> 00:46:00,040 going to be the sign of 3, 3, minus 3. 852 00:46:00,040 --> 00:46:05,440 And one time step advanced, the network is now in the state 853 00:46:05,440 --> 00:46:07,570 that we've programmed in. 854 00:46:07,570 --> 00:46:10,530 Does that make sense? 855 00:46:10,530 --> 00:46:16,272 So that state is a stable fixed point and it's an attractor. 856 00:46:16,272 --> 00:46:18,230 I'm just going to go through this very quickly. 857 00:46:18,230 --> 00:46:22,760 I'm just going to prove that xi is an attractor of the network 858 00:46:22,760 --> 00:46:27,470 if we write down the weight matrix as this outer product. 859 00:46:27,470 --> 00:46:30,470 The matrix elements are the outer product 860 00:46:30,470 --> 00:46:32,360 of the stored state with itself, OK? 861 00:46:32,360 --> 00:46:34,010 So what we're going to do is we're 862 00:46:34,010 --> 00:46:37,770 going to calculate the total input onto the i-th neuron 863 00:46:37,770 --> 00:46:43,530 if we start from an arbitrary state, v. So k 864 00:46:43,530 --> 00:46:47,570 is the input to all the neurons, right? 865 00:46:47,570 --> 00:46:52,880 And it's just that matrix times the initial state. 866 00:46:52,880 --> 00:46:55,780 So v j is the firing rate of the j-th neuron, 867 00:46:55,780 --> 00:47:00,520 and k is just M times v. That's the pattern of inputs 868 00:47:00,520 --> 00:47:02,350 to all of our neurons. 869 00:47:02,350 --> 00:47:04,610 So what is that? k equals-- 870 00:47:04,610 --> 00:47:06,490 we're just going to put this weight matrix 871 00:47:06,490 --> 00:47:10,300 into this equation, all right? 872 00:47:10,300 --> 00:47:13,300 We can pull the xi i outside of the sum, 873 00:47:13,300 --> 00:47:15,070 because it doesn't depend on j. 874 00:47:15,070 --> 00:47:17,300 The sum is over j. 875 00:47:17,300 --> 00:47:20,480 Now let's just write out this sum, OK? 876 00:47:20,480 --> 00:47:22,670 Now, you can see that if you start out 877 00:47:22,670 --> 00:47:27,470 with an initial state in which some number of neurons 878 00:47:27,470 --> 00:47:32,570 have the correct sign-- are already overlapping 879 00:47:32,570 --> 00:47:35,780 with the memorized state-- and some number of neurons 880 00:47:35,780 --> 00:47:37,940 in that initial state don't overlap 881 00:47:37,940 --> 00:47:40,640 with the memorized state, we can write out 882 00:47:40,640 --> 00:47:42,770 this sum as two terms.
883 00:47:42,770 --> 00:47:47,000 We can write it as a sum over some of the neurons that 884 00:47:47,000 --> 00:47:51,370 are already in the correct state and a sum over neurons that 885 00:47:51,370 --> 00:47:53,080 are not in the correct state. 886 00:47:56,280 --> 00:47:59,490 So if these neurons in that initial state 887 00:47:59,490 --> 00:48:02,790 have the right sign, that means these two have the same sign. 888 00:48:02,790 --> 00:48:08,040 And so the sum over xi j v j for neurons 889 00:48:08,040 --> 00:48:10,350 where v has the right sign is just 890 00:48:10,350 --> 00:48:13,680 the number of neurons that have the correct sign. 891 00:48:13,680 --> 00:48:16,320 And this sum over incorrect neurons 892 00:48:16,320 --> 00:48:20,010 means these neurons have the opposite sign of the desired 893 00:48:20,010 --> 00:48:20,730 memory. 894 00:48:20,730 --> 00:48:24,540 And so those will be one, and those will be minus one. 895 00:48:24,540 --> 00:48:26,820 Or those will be minus one, and those will be one. 896 00:48:26,820 --> 00:48:31,150 And so this will be minus the number of incorrect neurons. 897 00:48:31,150 --> 00:48:33,660 So you can see that the input to the neuron 898 00:48:33,660 --> 00:48:38,790 will have the right sign if the number of correct neurons 899 00:48:38,790 --> 00:48:43,060 is more than the number of incorrect neurons, all right? 900 00:48:43,060 --> 00:48:46,810 So what that means is that if you program a pattern 901 00:48:46,810 --> 00:48:49,360 into this network and then drive 902 00:48:49,360 --> 00:48:58,050 an input into the network-- 903 00:48:58,050 --> 00:49:05,570 if the input drives most of the neurons with the right sign, 904 00:49:05,570 --> 00:49:10,480 then the input will cause the network 905 00:49:10,480 --> 00:49:15,580 to evolve toward the memorized pattern in the next time step. 906 00:49:15,580 --> 00:49:17,830 OK, so let me say that again, because I felt like that 907 00:49:17,830 --> 00:49:19,870 didn't come out very clearly. 908 00:49:19,870 --> 00:49:23,050 We program a pattern into our network. 909 00:49:23,050 --> 00:49:27,160 If we start the network at some-- 910 00:49:27,160 --> 00:49:28,300 let's say at zero. 911 00:49:28,300 --> 00:49:31,630 And then we put in a pattern into the network such 912 00:49:31,630 --> 00:49:37,150 that just the majority of the neurons 913 00:49:37,150 --> 00:49:42,490 are activated in a way that looks like the stored pattern, 914 00:49:42,490 --> 00:49:44,920 then in the next time step, all of the neurons 915 00:49:44,920 --> 00:49:46,973 will have this stored pattern. 916 00:49:46,973 --> 00:49:48,640 So let me show you what that looks like. 917 00:49:52,210 --> 00:49:54,280 Let me actually go ahead and show you-- 918 00:50:00,930 --> 00:50:03,270 OK, so here's an example of that. 919 00:50:03,270 --> 00:50:06,080 So you can use Hopfield networks to store 920 00:50:06,080 --> 00:50:08,990 many different kinds of things, including images, all right? 921 00:50:08,990 --> 00:50:11,630 So this is a network where each pixel 922 00:50:11,630 --> 00:50:15,610 is being represented by a neuron in a Hopfield network. 923 00:50:15,610 --> 00:50:20,330 And a particular image was stored in that network 924 00:50:20,330 --> 00:50:24,800 by setting up the pattern of synaptic weights 925 00:50:24,800 --> 00:50:29,900 just using that xi, xi transpose learning rule for the weight 926 00:50:29,900 --> 00:50:31,700 matrix M, OK?
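To make the pieces so far concrete, here is a minimal sketch in Python with NumPy of the sign-function update, the outer-product storage rule, and the majority-vote recovery just described. The network size, the number of flipped neurons, and the random seed are illustrative choices, not values from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)

    def sgn(x):
        # Sign convention from the lecture: +1 for x > 0, -1 for x <= 0.
        return np.where(x > 0, 1, -1)

    N = 100                            # number of neurons (illustrative)
    xi = sgn(rng.standard_normal(N))   # the +/-1 pattern to store
    M = np.outer(xi, xi)               # storage rule: M = xi xi^T, with alpha = 1

    v = xi.copy()                      # start near the stored pattern...
    flip = rng.choice(N, size=20, replace=False)
    v[flip] = -v[flip]                 # ...but flip 20 of the 100 signs

    v_next = sgn(M @ v)                # one update step: the sign of M times v
    print(np.array_equal(v_next, xi))  # True: the correct majority pulls it back

Because M @ v equals xi times the dot product of xi with v, a cue whose majority of signs agrees with xi gives a positive dot product, and one update step lands exactly on the stored pattern.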
927 00:50:31,700 --> 00:50:34,370 Now, what you can do is you can start that network 928 00:50:34,370 --> 00:50:38,960 from a random initial condition. 929 00:50:38,960 --> 00:50:43,160 And then let the network evolve over time, all right? 930 00:50:43,160 --> 00:50:46,880 And what you see is that the network converges 931 00:50:46,880 --> 00:50:55,155 toward the pattern that was stored in the synaptic weights, OK? 932 00:50:55,155 --> 00:50:56,030 Does that make sense? 933 00:51:01,190 --> 00:51:03,520 Got that? 934 00:51:03,520 --> 00:51:11,226 So, basically, as long as that initial pattern 935 00:51:11,226 --> 00:51:14,820 has some overlap with the stored pattern, 936 00:51:14,820 --> 00:51:17,120 the network will evolve toward the stored pattern. 937 00:51:23,710 --> 00:51:26,080 All right, so let me define a little bit 938 00:51:26,080 --> 00:51:29,975 better what we mean by the energy landscape 939 00:51:29,975 --> 00:51:31,225 and how it's actually defined. 940 00:51:33,770 --> 00:51:37,360 OK, so you remember that if we start our network 941 00:51:37,360 --> 00:51:41,870 in a particular pattern v, the recurrent connections 942 00:51:41,870 --> 00:51:47,840 will drive inputs into all the neurons in the network. 943 00:51:47,840 --> 00:51:49,700 And those inputs will then determine 944 00:51:49,700 --> 00:51:53,280 the pattern of activity at the next time step. 945 00:51:53,280 --> 00:51:58,030 So if we have a state of the network v, 946 00:51:58,030 --> 00:52:02,110 the inputs to the network, to all the neurons in the network, 947 00:52:02,110 --> 00:52:04,330 from the currently active neurons 948 00:52:04,330 --> 00:52:09,400 are given by the connection matrix times v. 949 00:52:09,400 --> 00:52:12,670 So we can just write that out as a sum like this. 950 00:52:12,670 --> 00:52:19,490 So you define the energy of the network as the dot product-- 951 00:52:19,490 --> 00:52:21,830 basically, the amount of overlap-- 952 00:52:21,830 --> 00:52:26,480 between the current state of the network 953 00:52:26,480 --> 00:52:31,200 and the inputs to all of the neurons 954 00:52:31,200 --> 00:52:35,270 that drive the activity in the next step, OK? 955 00:52:35,270 --> 00:52:37,610 And the energy is minus that overlap, OK? 956 00:52:37,610 --> 00:52:42,110 So what that means is if the network is in a state that 957 00:52:42,110 --> 00:52:47,450 has a big overlap with the pattern of inputs to all 958 00:52:47,450 --> 00:52:50,300 the other neurons, then the energy will 959 00:52:50,300 --> 00:52:51,860 be very negative, right? 960 00:52:51,860 --> 00:52:55,940 And remember, the system likes to evolve toward low energies. 961 00:52:55,940 --> 00:52:58,490 In physics, you have a ball on a hill. 962 00:52:58,490 --> 00:53:04,150 It rolls downhill, right, to lower gravitational energies. 963 00:53:04,150 --> 00:53:06,820 So you start the ball anywhere on the hill, 964 00:53:06,820 --> 00:53:08,080 and it will roll downhill. 965 00:53:08,080 --> 00:53:10,000 So these networks do the same thing. 966 00:53:10,000 --> 00:53:14,050 They evolve downward on this energy surface. 967 00:53:14,050 --> 00:53:18,280 They evolve towards states that have 968 00:53:18,280 --> 00:53:23,440 a high overlap with the inputs that drive the next state. 969 00:53:23,440 --> 00:53:24,730 Does that make sense?
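In code, this energy is one line. Here is a sketch in NumPy, using the minus 1/2 v dot k convention that comes up in a question a bit later; the 1/2 is just a convention, and any positive scale factor leaves the minima in the same places. The three-neuron pattern is the example from earlier.

    import numpy as np

    def energy(M, v):
        # E(v) = -1/2 v . (M v): minus the overlap between the current
        # state v and the inputs k = M v that drive the next state.
        return -0.5 * v @ (M @ v)

    xi = np.array([1, 1, -1])              # the stored pattern from the example
    M = np.outer(xi, xi)
    print(energy(M, xi))                   # -4.5: lowest at the stored pattern
    print(energy(M, np.array([1, 1, 1])))  # -0.5: higher away from the attractor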
970 00:53:24,730 --> 00:53:31,600 So if you're in a state where the pattern right now 971 00:53:31,600 --> 00:53:33,910 has a high overlap with what the pattern is going 972 00:53:33,910 --> 00:53:37,540 to be in the next time step, then you're in an attractor, 973 00:53:37,540 --> 00:53:38,040 right? 974 00:53:41,950 --> 00:53:45,050 OK, so it looks like that. 975 00:53:45,050 --> 00:53:48,830 So this energy is just the negative of the overlap 976 00:53:48,830 --> 00:53:51,860 of the current state of the network with the pattern 977 00:53:51,860 --> 00:53:53,240 of inputs to all the neurons. 978 00:53:53,240 --> 00:53:54,421 Yes, Rebecca? 979 00:53:54,421 --> 00:53:57,790 AUDIENCE: So [INAUDIBLE] to say [INAUDIBLE] with the weight 980 00:53:57,790 --> 00:54:00,328 matrix, since that's sort of the goal of the next time step, 981 00:54:00,328 --> 00:54:02,820 and it will evolve towards the matrix [INAUDIBLE]? 982 00:54:02,820 --> 00:54:03,570 MICHALE FEE: Yeah. 983 00:54:03,570 --> 00:54:07,470 So the only difference is that the state of the network 984 00:54:07,470 --> 00:54:09,750 is this vector, right? 985 00:54:09,750 --> 00:54:14,460 And the weight matrix tells us how that state will drive input 986 00:54:14,460 --> 00:54:16,980 into all the other neurons. 987 00:54:16,980 --> 00:54:23,050 And so if you're in a state that drives a pattern of inputs 988 00:54:23,050 --> 00:54:27,160 to all the neurons that looks exactly like the current state, 989 00:54:27,160 --> 00:54:31,190 then you're going to stay in that state, right? 990 00:54:31,190 --> 00:54:34,660 And so the energy is just defined as that dot product, 991 00:54:34,660 --> 00:54:38,140 the overlap of the current state, or the state 992 00:54:38,140 --> 00:54:40,150 that you're calculating the energy of, 993 00:54:40,150 --> 00:54:44,515 and the inputs to the network in the next time step. 994 00:54:44,515 --> 00:54:46,640 All right, so let me show you what that looks like. 995 00:54:49,180 --> 00:54:52,130 And so the energy is lowest when the current state 996 00:54:52,130 --> 00:54:54,140 has a high overlap with the synaptic drive 997 00:54:54,140 --> 00:54:55,820 to the next step. 998 00:54:55,820 --> 00:54:59,190 So let's just take a look at this particular network here. 999 00:54:59,190 --> 00:55:01,410 I've rewritten this dot product as-- 1000 00:55:01,410 --> 00:55:04,700 so k is just M times v. This dot product 1001 00:55:04,700 --> 00:55:09,830 can just be written as v transpose times Mv. 1002 00:55:09,830 --> 00:55:11,490 So that's the energy. 1003 00:55:11,490 --> 00:55:15,400 Let's take a look at this matrix, this network here-- 1004 00:55:15,400 --> 00:55:17,030 0, minus 2, minus 2, 0. 1005 00:55:17,030 --> 00:55:18,980 So it's this mutually inhibitory network. 1006 00:55:18,980 --> 00:55:21,500 You know that that inhibitory network 1007 00:55:21,500 --> 00:55:30,220 has attractors that are here at minus 1, 1 and 1, minus 1. 1008 00:55:30,220 --> 00:55:31,990 So let's actually calculate the energy. 1009 00:55:31,990 --> 00:55:35,130 So you can actually take these states-- 1010 00:55:35,130 --> 00:55:38,670 1, minus 1-- multiply it by that M, 1011 00:55:38,670 --> 00:55:41,325 and then take the dot product with 1, minus 1. 1012 00:55:41,325 --> 00:55:43,920 And do that for each one of those states 1013 00:55:43,920 --> 00:55:45,080 and write down the energy. 1014 00:55:45,080 --> 00:55:49,160 You can see that the energy here is minus 1. 1015 00:55:49,160 --> 00:55:53,100 The energy here is minus 1, and the energy here is 0.
1016 00:55:53,100 --> 00:55:57,240 So if you start the network here, at energy zero, 1017 00:55:57,240 --> 00:56:01,310 it's going to roll downhill to this state. 1018 00:56:05,920 --> 00:56:07,910 Or it can roll downhill to this state, 1019 00:56:07,910 --> 00:56:14,030 depending on the initial condition, OK? 1020 00:56:18,680 --> 00:56:24,500 So you can also think about the energy as a continuous function 1021 00:56:24,500 --> 00:56:25,580 of the firing rates. 1022 00:56:25,580 --> 00:56:28,730 You can calculate that energy, not just for these points 1023 00:56:28,730 --> 00:56:30,560 on this grid. 1024 00:56:30,560 --> 00:56:33,260 And what you see is that there's basically-- 1025 00:56:33,260 --> 00:56:35,210 in high dimensions, there are sort 1026 00:56:35,210 --> 00:56:39,710 of valleys that describe the attractor 1027 00:56:39,710 --> 00:56:43,610 basin of these different attractors, all right? 1028 00:56:43,610 --> 00:56:48,350 And if you project that energy along an axis like this, 1029 00:56:48,350 --> 00:56:52,550 you can see that you sort of-- 1030 00:56:52,550 --> 00:56:55,340 let's say, take a slice through this energy function. 1031 00:56:55,340 --> 00:56:58,580 You can see that this looks just like the energy 1032 00:56:58,580 --> 00:57:01,190 surface, the energy function, that we described before 1033 00:57:01,190 --> 00:57:06,820 for the 1D attractor, the single neuron with two attractors, 1034 00:57:06,820 --> 00:57:07,320 right? 1035 00:57:07,320 --> 00:57:10,980 This corresponds to a valley and a valley 1036 00:57:10,980 --> 00:57:13,200 and a peak between them. 1037 00:57:13,200 --> 00:57:16,470 And then the energy gets big outside of that. 1038 00:57:16,470 --> 00:57:20,115 Any questions about that? 1039 00:57:20,115 --> 00:57:21,070 Yes, [INAUDIBLE]. 1040 00:57:21,070 --> 00:57:26,680 AUDIENCE: [INAUDIBLE] vector 1/2 because-- in this case, right? 1041 00:57:26,680 --> 00:57:31,090 MICHALE FEE: That's the general definition, minus 1/2 v dot k. 1042 00:57:37,390 --> 00:57:38,770 It actually doesn't really-- 1043 00:57:38,770 --> 00:57:40,460 this 1/2 doesn't really matter. 1044 00:57:40,460 --> 00:57:45,190 It actually comes out of the derivative of something, 1045 00:57:45,190 --> 00:57:45,930 as I recall. 1046 00:57:45,930 --> 00:57:47,830 But a scaling factor doesn't matter. 1047 00:57:47,830 --> 00:57:51,550 The network always evolves toward a minimum of the energy. 1048 00:57:51,550 --> 00:57:55,060 And so this 1/2 could be anything. 1049 00:57:58,170 --> 00:58:04,680 All right, so the point is that starting the network anywhere 1050 00:58:04,680 --> 00:58:09,300 with a sensory input, the system will evolve toward the nearest 1051 00:58:09,300 --> 00:58:10,580 memory, OK? 1052 00:58:14,530 --> 00:58:15,780 And I already showed you this. 1053 00:58:15,780 --> 00:58:19,230 OK, so now, a very interesting question 1054 00:58:19,230 --> 00:58:23,500 is, how many memories can you actually store in a network? 1055 00:58:23,500 --> 00:58:28,140 And there's a very simple way of calculating the capacity 1056 00:58:28,140 --> 00:58:29,680 of the Hopfield network. 1057 00:58:29,680 --> 00:58:32,540 And I'm just going to show you the outlines of it. 1058 00:58:32,540 --> 00:58:35,220 And that actually gives us some insight 1059 00:58:35,220 --> 00:58:39,150 into what kinds of memories you can store.
1060 00:58:39,150 --> 00:58:42,630 Basically, the idea is that when you store memories 1061 00:58:42,630 --> 00:58:44,850 in a network, you want the different memories 1062 00:58:44,850 --> 00:58:48,267 to be as uncorrelated with each other as possible. 1063 00:58:48,267 --> 00:58:50,100 You don't want to try to store memories that 1064 00:58:50,100 --> 00:58:53,690 are very similar to each other. 1065 00:58:53,690 --> 00:58:57,910 And you'll see why in a second when we look at the math. 1066 00:58:57,910 --> 00:59:00,760 So let's say that we want to store multiple memories 1067 00:59:00,760 --> 00:59:02,510 in our network. 1068 00:59:02,510 --> 00:59:06,100 So instead of just storing one pattern, xi, 1069 00:59:06,100 --> 00:59:09,470 we want to store a bunch of different patterns. 1070 00:59:09,470 --> 00:59:12,530 And so let's say we're going to store p different patterns. 1071 00:59:12,530 --> 00:59:15,730 So we have an index, mu. 1072 00:59:15,730 --> 00:59:19,670 The index mu labels each of the different patterns 1073 00:59:19,670 --> 00:59:20,920 we want to store. 1074 00:59:20,920 --> 00:59:25,730 So we're going to index the patterns from zero to p minus 1. 1075 00:59:25,730 --> 00:59:28,120 So what we do, the way we do that is 1076 00:59:28,120 --> 00:59:31,330 we compute the contribution to the weight 1077 00:59:31,330 --> 00:59:34,520 matrix from each of those different patterns. 1078 00:59:34,520 --> 00:59:38,440 So we calculate a weight matrix using the outer product 1079 00:59:38,440 --> 00:59:42,440 for each of the patterns we want to store in the network, 1080 00:59:42,440 --> 00:59:43,130 all right? 1081 00:59:43,130 --> 00:59:46,420 And then we add all of those together. 1082 00:59:46,420 --> 00:59:54,280 We're going to essentially sort of average together the network 1083 00:59:54,280 --> 00:59:59,050 that we would make for each pattern separately. 1084 00:59:59,050 --> 01:00:00,480 Does that make sense? 1085 01:00:00,480 --> 01:00:05,120 So there is the equation for the weight matrix 1086 01:00:05,120 --> 01:00:10,060 that stores p different patterns in our memory, in our network. 1087 01:00:13,870 --> 01:00:16,690 And that's how we got this kind of network 1088 01:00:16,690 --> 01:00:20,220 here, where we store multiple memories, all right? 1089 01:00:22,920 --> 01:00:25,170 So let me just show you an example of what 1090 01:00:25,170 --> 01:00:26,260 happens when you do that. 1091 01:00:26,260 --> 01:00:28,260 So I found these nice videos online. 1092 01:00:28,260 --> 01:00:33,540 So here is a representation of a network that stores 1093 01:00:33,540 --> 01:00:38,310 a five by five array of pixels. 1094 01:00:38,310 --> 01:00:42,740 And this network was trained on these three different patterns. 1095 01:00:42,740 --> 01:00:45,380 And what this little demo shows is 1096 01:00:45,380 --> 01:00:48,740 that if you start the network from different configurations 1097 01:00:48,740 --> 01:00:52,010 here and then evolve the network-- you start running it. 1098 01:00:52,010 --> 01:00:55,760 That means you run the dynamic update for each neuron 1099 01:00:55,760 --> 01:00:58,010 one at a time, and you can see how 1100 01:00:58,010 --> 01:00:59,380 this system evolves over time. 1101 01:01:03,500 --> 01:01:05,630 So this is a little GUI-based thing. 1102 01:01:05,630 --> 01:01:08,460 You can flip the state and then run it.
1103 01:01:08,460 --> 01:01:13,532 And you can see that if you change those, now it-- 1104 01:01:19,940 --> 01:01:22,680 I think he was trying to make it look like that. 1105 01:01:22,680 --> 01:01:27,390 But when you run it, it actually evolved toward this one. 1106 01:01:33,750 --> 01:01:36,570 He's going to really make it look like that. 1107 01:01:36,570 --> 01:01:40,010 And you can see it evolves toward that one. 1108 01:01:40,010 --> 01:01:42,020 All right, any questions about that? 1109 01:01:42,020 --> 01:01:43,940 You can see it stored three separate memories. 1110 01:01:43,940 --> 01:01:47,300 You've given an input, and the network 1111 01:01:47,300 --> 01:01:51,540 evolves toward whatever memory was closest to the input. 1112 01:01:51,540 --> 01:01:54,140 So that's called a content-addressable memory. 1113 01:01:54,140 --> 01:01:56,660 You can actually recall a memory-- 1114 01:01:56,660 --> 01:02:00,050 not by pointing to an address, like you do in a computer, 1115 01:02:00,050 --> 01:02:02,540 but by putting in something that looks 1116 01:02:02,540 --> 01:02:04,640 a little bit like the memory. 1117 01:02:04,640 --> 01:02:09,080 And then the system evolves right to the memory 1118 01:02:09,080 --> 01:02:12,660 that was closest to the input. 1119 01:02:12,660 --> 01:02:17,010 So it's also called an auto-associative memory. 1120 01:02:17,010 --> 01:02:21,120 It automatically associates with the nearest-- 1121 01:02:21,120 --> 01:02:24,660 with a pattern that's nearest to the input. 1122 01:02:24,660 --> 01:02:26,910 So here's another example. 1123 01:02:26,910 --> 01:02:29,340 It's just kind of more of the same. 1124 01:02:29,340 --> 01:02:32,400 This is a network similar to this. 1125 01:02:32,400 --> 01:02:34,930 Instead of black and white, it's red and purple, 1126 01:02:34,930 --> 01:02:37,770 but it's got a lot more pixels. 1127 01:02:37,770 --> 01:02:41,010 And you'll see the three different images 1128 01:02:41,010 --> 01:02:44,700 that are stored in there-- 1129 01:02:44,700 --> 01:02:48,090 so a face, a world, and a penguin. 1130 01:02:48,090 --> 01:02:51,090 So then what they're doing here is they add noise. 1131 01:02:51,090 --> 01:02:53,580 And then you run the network, and it recovers 1132 01:02:53,580 --> 01:02:55,540 one of the patterns that you stored in it. 1133 01:03:00,960 --> 01:03:02,200 So here's the penguin. 1134 01:03:02,200 --> 01:03:03,750 Add noise. 1135 01:03:03,750 --> 01:03:06,140 Add a little bit of noise. 1136 01:03:06,140 --> 01:03:10,200 Here, he's coloring it in, I guess, to make it. 1137 01:03:10,200 --> 01:03:12,260 And then you run the network, and it 1138 01:03:12,260 --> 01:03:13,894 remembers the penguin. 1139 01:03:18,010 --> 01:03:19,710 OK, so that's interesting. 1140 01:03:19,710 --> 01:03:21,930 So he ran it. 1141 01:03:21,930 --> 01:03:23,540 He or she ran the network. 1142 01:03:23,540 --> 01:03:27,350 And you see that it kind of recovered a face, 1143 01:03:27,350 --> 01:03:31,160 but there's some penguin head stuck on top. 1144 01:03:31,160 --> 01:03:32,840 So what goes wrong there? 1145 01:03:32,840 --> 01:03:36,010 Something bad happened, right? 1146 01:03:36,010 --> 01:03:40,490 The network was trained with a face, a globe, and a penguin. 1147 01:03:40,490 --> 01:03:42,920 And you run it most of the time, and it works. 1148 01:03:42,920 --> 01:03:46,280 And then, suddenly, you run it, and it recovers a face 1149 01:03:46,280 --> 01:03:48,110 with a penguin head sticking out of it.
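Before unpacking what went wrong, here is a sketch of what these demos are doing, in NumPy: several random patterns stored by summing their outer products, and a noisy cue cleaned up by iterating the sign update. The sizes, noise level, and seed are illustrative; with only a few random patterns in a large network, recall typically succeeds.

    import numpy as np

    rng = np.random.default_rng(1)

    def sgn(x):
        return np.where(x > 0, 1, -1)

    N, P = 200, 3                        # neurons, stored patterns (illustrative)
    patterns = sgn(rng.standard_normal((P, N)))
    M = sum(np.outer(xi, xi) for xi in patterns)   # M = sum over mu of xi_mu xi_mu^T

    v = patterns[0].copy()               # cue: pattern 0 with 15% of signs flipped
    flip = rng.choice(N, size=30, replace=False)
    v[flip] = -v[flip]

    for _ in range(5):                   # iterate the update until it settles
        v = sgn(M @ v)
    print(np.array_equal(v, patterns[0]))  # typically True for a few random patterns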
1150 01:03:48,110 --> 01:03:50,390 What happened? 1151 01:03:50,390 --> 01:03:51,640 So we'll explain what happens. 1152 01:03:51,640 --> 01:03:55,540 What happened was that this network 1153 01:03:55,540 --> 01:03:58,660 was trained in a way that has what's 1154 01:03:58,660 --> 01:04:00,770 called a spurious attractor. 1155 01:04:00,770 --> 01:04:02,860 And that often happens when you train a network 1156 01:04:02,860 --> 01:04:05,170 with too many memories, when you exceed 1157 01:04:05,170 --> 01:04:07,810 the capacity of the network to store memories. 1158 01:04:07,810 --> 01:04:11,510 So let me show you what actually goes wrong mathematically 1159 01:04:11,510 --> 01:04:12,010 there. 1160 01:04:17,970 --> 01:04:20,580 All right, so we're going to do the same analysis 1161 01:04:20,580 --> 01:04:21,330 we did before. 1162 01:04:21,330 --> 01:04:24,030 We're going to take a matrix. 1163 01:04:24,030 --> 01:04:29,130 We're going to build a network that stores multiple memories. 1164 01:04:31,890 --> 01:04:33,887 This was the matrix to build one memory. 1165 01:04:33,887 --> 01:04:34,970 Let's see what I did here. 1166 01:04:39,670 --> 01:04:41,620 So in order for-- 1167 01:04:45,190 --> 01:04:45,820 Yeah. 1168 01:04:45,820 --> 01:04:46,320 Sorry. 1169 01:04:46,320 --> 01:04:48,630 This was the matrix for multiple memories. 1170 01:04:48,630 --> 01:04:50,130 We're summing over mu. 1171 01:04:50,130 --> 01:04:53,553 I just didn't write the mu equals 0 to p minus 1. 1172 01:04:53,553 --> 01:04:55,470 So we're going to program p different memories 1173 01:04:55,470 --> 01:04:59,700 by summing up this outer product for all the different patterns 1174 01:04:59,700 --> 01:05:02,460 that we want to store, all right? 1175 01:05:02,460 --> 01:05:06,570 We're going to ask whether one of those-- 1176 01:05:06,570 --> 01:05:11,580 under what conditions is one of those patterns, the xi 0, 1177 01:05:11,580 --> 01:05:15,930 actually a stable state of the network? 1178 01:05:15,930 --> 01:05:17,580 So we're going to build a network 1179 01:05:17,580 --> 01:05:19,860 with multiple patterns stored, and we're just 1180 01:05:19,860 --> 01:05:21,870 going to ask a simple question. 1181 01:05:21,870 --> 01:05:28,040 Under what conditions is xi 0 going to evolve to xi 0? 1182 01:05:28,040 --> 01:05:32,540 And if xi 0 evolves toward xi 0, or stays at xi 0, 1183 01:05:32,540 --> 01:05:35,060 then it's a stable point. 1184 01:05:35,060 --> 01:05:36,270 All right, so let's do that. 1185 01:05:36,270 --> 01:05:37,895 We're going to take that update equation, 1186 01:05:37,895 --> 01:05:41,240 and we're going to plug in our multiple memory weight 1187 01:05:41,240 --> 01:05:43,520 matrix, all right? 1188 01:05:43,520 --> 01:05:48,080 You can see that we can pull the xi 1189 01:05:48,080 --> 01:05:53,050 i out of this sum over j. 1190 01:05:53,050 --> 01:05:56,920 And the next step is we're going to separate this 1191 01:05:56,920 --> 01:06:02,500 into a sum over mu equals zero and a separate sum for mu 1192 01:06:02,500 --> 01:06:04,170 not equal to 0, all right? 1193 01:06:04,170 --> 01:06:08,140 So this is a sum over all the mu's, 1194 01:06:08,140 --> 01:06:10,090 but we're going to pull out the mu zero 1195 01:06:10,090 --> 01:06:13,650 term as a separate sum over j. 1196 01:06:13,650 --> 01:06:15,040 Is that clear? 1197 01:06:15,040 --> 01:06:18,010 Anyway, this is just for fun. 1198 01:06:18,010 --> 01:06:21,310 You don't have to reproduce this, so don't worry.
1199 01:06:24,270 --> 01:06:27,280 So we're going to pull out the mu equals zero term. 1200 01:06:27,280 --> 01:06:28,660 And what does that look like? 1201 01:06:28,660 --> 01:06:35,080 It's xi i0 times the sum over j of xi j0 times xi j0. 1202 01:06:35,080 --> 01:06:37,380 So what is that? 1203 01:06:37,380 --> 01:06:40,260 That's just N, right, the number of neurons. 1204 01:06:40,260 --> 01:06:43,860 We're summing over j equals 1 to N, the number of neurons. 1205 01:06:43,860 --> 01:06:45,990 I should add those limits here. 1206 01:06:45,990 --> 01:06:54,220 So you can see that that's N. So this is just the sign of N xi i0 1207 01:06:54,220 --> 01:06:57,930 plus a bunch of other stuff. 1208 01:06:57,930 --> 01:07:01,950 So you can see right away that if all of this other stuff 1209 01:07:01,950 --> 01:07:05,530 is really small, then this is a fixed point. 1210 01:07:05,530 --> 01:07:08,370 Because if all this stuff is small, 1211 01:07:08,370 --> 01:07:12,270 the system will evolve toward the sign of xi i0, 1212 01:07:12,270 --> 01:07:15,000 which is just xi i0. 1213 01:07:15,000 --> 01:07:16,710 So let's take a look at all of this stuff 1214 01:07:16,710 --> 01:07:22,970 and see what can go wrong to make this not small. 1215 01:07:22,970 --> 01:07:25,490 All right, so let's zoom in on this particular term right 1216 01:07:25,490 --> 01:07:25,990 here. 1217 01:07:25,990 --> 01:07:27,170 So what is this? 1218 01:07:27,170 --> 01:07:33,150 This is the sum over j of xi mu j times xi 0 j. 1219 01:07:33,150 --> 01:07:34,770 So what is that? 1220 01:07:34,770 --> 01:07:35,940 Anybody know what that is? 1221 01:07:39,030 --> 01:07:40,860 It's a vector operation. 1222 01:07:40,860 --> 01:07:41,870 What is that? 1223 01:07:41,870 --> 01:07:44,870 AUDIENCE: The dot product between one image 1224 01:07:44,870 --> 01:07:45,870 and image zero. 1225 01:07:45,870 --> 01:07:47,110 MICHALE FEE: Exactly. 1226 01:07:47,110 --> 01:07:50,850 It's a dot product between the image that we're asking 1227 01:07:50,850 --> 01:07:54,900 is it a stable fixed point and all the other images 1228 01:07:54,900 --> 01:07:56,830 in the network. 1229 01:07:56,830 --> 01:07:59,045 Sorry, and the mu-th image. 1230 01:08:01,760 --> 01:08:07,130 So what this is saying is that if our image is 1231 01:08:07,130 --> 01:08:10,670 orthogonal to all the other images in the network 1232 01:08:10,670 --> 01:08:14,375 that we've tried to store, then this thing is zero. 1233 01:08:21,810 --> 01:08:23,819 So this is referred to as crosstalk 1234 01:08:23,819 --> 01:08:26,430 between the stored memories. 1235 01:08:26,430 --> 01:08:30,930 So if our pattern, xi 0, is orthogonal to all 1236 01:08:30,930 --> 01:08:33,550 the other patterns, then it will be a fixed point. 1237 01:08:33,550 --> 01:08:37,500 So the capacity of the network, the crosstalk-- 1238 01:08:37,500 --> 01:08:40,290 the capacity of the network depends 1239 01:08:40,290 --> 01:08:43,470 on how much overlap there is between our stored pattern 1240 01:08:43,470 --> 01:08:46,200 and all the other patterns in the network, all right? 1241 01:08:50,757 --> 01:08:52,340 So if all the memories are orthogonal, 1242 01:08:52,340 --> 01:08:53,979 if all the patterns are orthogonal, 1243 01:08:53,979 --> 01:08:57,670 then they're all stable attractors.
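Here is a quick numerical sketch of that crosstalk term. For random plus-or-minus-one patterns, a pattern's dot product with itself is N, while its dot product with a different pattern fluctuates around zero with a typical size of only about the square root of N, which is why random patterns behave as nearly orthogonal. The sizes and seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    N, P = 1000, 10                      # neurons, patterns (illustrative)
    patterns = np.where(rng.standard_normal((P, N)) > 0, 1, -1)

    overlaps = patterns @ patterns.T     # entry (mu, nu) is xi_mu . xi_nu
    print(overlaps[0, 0])                # N = 1000 on the diagonal
    print(overlaps[0, 1:])               # off-diagonal: small, order sqrt(N)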
1244 01:08:57,670 --> 01:09:01,510 But if one of those memories, xi 1-- let's take xi 1-- 1245 01:09:01,510 --> 01:09:07,000 is close to xi 0, then xi 0 dot xi 1-- 1246 01:09:07,000 --> 01:09:09,189 the two patterns are very similar-- 1247 01:09:09,189 --> 01:09:13,220 then the dot product is going to be close to N, right? 1248 01:09:13,220 --> 01:09:16,200 And when you plug that in, if that's N, 1249 01:09:16,200 --> 01:09:22,390 then you can see that this becomes xi 1 i, right? 1250 01:09:22,390 --> 01:09:27,850 So what happens is that these other memories that 1251 01:09:27,850 --> 01:09:30,234 are similar to our memorized pattern-- 1252 01:09:33,010 --> 01:09:36,708 then when you sum that, when you compute that sum, 1253 01:09:36,708 --> 01:09:38,649 some of these terms get big enough so 1254 01:09:38,649 --> 01:09:44,560 that the memory in the next step is not that stored memory. 1255 01:09:44,560 --> 01:09:47,479 It's a combination. 1256 01:09:47,479 --> 01:09:48,399 All right? 1257 01:09:48,399 --> 01:09:52,319 So what happens is-- so that's what sets the capacity 1258 01:09:52,319 --> 01:09:53,149 of the network. 1259 01:09:53,149 --> 01:09:57,180 So you can't actually choose all your memories to be orthogonal. 1260 01:09:57,180 --> 01:10:00,720 But a pretty good way of making memories nearly orthogonal 1261 01:10:00,720 --> 01:10:03,750 is to store them as random patterns. 1262 01:10:03,750 --> 01:10:08,580 So a lot of the thinking that goes 1263 01:10:08,580 --> 01:10:10,950 into how you would build a network that 1264 01:10:10,950 --> 01:10:14,730 stores a lot of patterns is to take your memories 1265 01:10:14,730 --> 01:10:17,970 and sort of convert them in a way that makes them maximally 1266 01:10:17,970 --> 01:10:20,190 orthogonal to each other. 1267 01:10:20,190 --> 01:10:22,710 You can use things like lateral inhibition 1268 01:10:22,710 --> 01:10:26,800 to orthogonalize different inputs. 1269 01:10:26,800 --> 01:10:31,060 So once you make your patterns sort of noisy, 1270 01:10:31,060 --> 01:10:32,710 then it turns out you can actually 1271 01:10:32,710 --> 01:10:36,070 calculate that if the values of xi 1272 01:10:36,070 --> 01:10:38,200 sort of look like random numbers, 1273 01:10:38,200 --> 01:10:41,380 that you can store up to about 15% 1274 01:10:41,380 --> 01:10:44,930 of the number of neurons worth of memories in your network. 1275 01:10:44,930 --> 01:10:48,500 So if I have 100 neurons in my network, 1276 01:10:48,500 --> 01:10:53,240 I should be able to store about 15 different states 1277 01:10:53,240 --> 01:10:54,830 in that network before they start 1278 01:10:54,830 --> 01:10:59,210 to interfere with each other, before you have a sufficiently 1279 01:10:59,210 --> 01:11:02,360 high probability that two of those memories 1280 01:11:02,360 --> 01:11:04,410 are too close to each other. 1281 01:11:04,410 --> 01:11:07,160 And as soon as that happens, then you 1282 01:11:07,160 --> 01:11:10,430 start getting crosstalk between those memories that 1283 01:11:10,430 --> 01:11:12,800 causes the state of the system to evolve 1284 01:11:12,800 --> 01:11:19,580 in a way that doesn't recall one of your stored memories, 1285 01:11:19,580 --> 01:11:20,680 all right?
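That roughly-15% figure (the standard result for random patterns is about 0.14 N) can be checked empirically with a sketch like this, which stores p random patterns and counts how many survive one update step unchanged. The network size and the values of p are illustrative, and this simple version keeps the self-connections in M, so the exact counts will vary from run to run.

    import numpy as np

    rng = np.random.default_rng(3)

    def sgn(x):
        return np.where(x > 0, 1, -1)

    N = 500
    for p in (25, 50, 75, 100):          # p/N from 0.05 up to 0.20
        patterns = sgn(rng.standard_normal((p, N)))
        M = patterns.T @ patterns        # sum of outer products of all p patterns
        stable = sum(np.array_equal(sgn(M @ xi), xi) for xi in patterns)
        print(p, stable)                 # fewer patterns stay exactly fixed as p/N grows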
1286 01:11:20,680 --> 01:11:24,570 And what that looks like in the energy landscape 1287 01:11:24,570 --> 01:11:31,260 is when you build a network with, let's say, five memories, 1288 01:11:31,260 --> 01:11:36,220 there will be five minima in the network that sort of have 1289 01:11:36,220 --> 01:11:41,220 equal low values of energy. 1290 01:11:41,220 --> 01:11:44,570 But when you start sticking too many memories in your network, 1291 01:11:44,570 --> 01:11:47,150 you end up with what are called spurious attractors, sort 1292 01:11:47,150 --> 01:11:54,242 of local minima that aren't at the-- 1293 01:11:54,242 --> 01:11:56,930 that don't correspond to one of the stored memories. 1294 01:11:56,930 --> 01:12:01,280 And so as the system evolves, it can be going downhill 1295 01:12:01,280 --> 01:12:03,590 and get stuck in one of those minima that 1296 01:12:03,590 --> 01:12:09,020 look like a combination of two of the stored memories. 1297 01:12:09,020 --> 01:12:11,568 And that's what went wrong here with the guy with the penguin 1298 01:12:11,568 --> 01:12:12,610 sticking out of his head. 1299 01:12:18,360 --> 01:12:18,940 Who knows? 1300 01:12:18,940 --> 01:12:21,107 Maybe that's what happens when you look at something 1301 01:12:21,107 --> 01:12:24,273 and you're confused about what you're seeing. 1302 01:12:24,273 --> 01:12:26,190 We don't know if that's actually what happens, 1303 01:12:26,190 --> 01:12:29,220 but it would be an interesting thing to test. 1304 01:12:31,770 --> 01:12:33,050 Any questions? 1305 01:12:33,050 --> 01:12:35,780 All right, so that's-- 1306 01:12:35,780 --> 01:12:38,820 so you can see that these are long-term memories. 1307 01:12:38,820 --> 01:12:42,410 These don't depend on activity in the network to be stored, right? 1308 01:12:42,410 --> 01:12:45,770 Those are programmed into the synaptic connections 1309 01:12:45,770 --> 01:12:47,280 between the neurons. 1310 01:12:47,280 --> 01:12:50,040 So you can shut off all the activity. 1311 01:12:50,040 --> 01:12:54,740 And if you just put in a pattern of input that 1312 01:12:54,740 --> 01:12:56,540 reminds you of something, the network 1313 01:12:56,540 --> 01:13:00,310 will recover the full memory for you.