MICHALE FEE: Today, we're going to finish up with recurrent neural networks. As you remember, we've been talking about the case where we have a layer of neurons with recurrent connections between the neurons in the output layer of our network. We've been developing the mathematical tools to describe the behavior of these networks and how they respond to their inputs, and we've been talking about the different kinds of computations that recurrent neural networks can perform.

You may recall that we introduced the concept of how to study recurrent neural networks by looking at the simplest recurrent network: a single neuron with a recurrent connection onto itself, called an autapse. The recurrent connection has a strength lambda. Without the recurrent connection, we can write down the equation for the response of this neuron as tau dv/dt = -v + h. The -v is essentially a leak term, so that if you put input into the neuron, its response jumps up and then decays exponentially in response to an input h. If we have a recurrent connection of strength lambda, then there's an additional input to the neuron that's proportional to the firing rate of the neuron. We can rewrite that equation as tau dv/dt = -(1 - lambda) v + h.

The behavior of this simple recurrent neural network depends strongly on the value of the coefficient (1 - lambda). We've talked about three different cases: where lambda is less than one, where lambda is equal to one -- in which case this coefficient is zero -- and where lambda is greater than one. So let's look at those three cases again for this equation. When lambda is less than one, you can see that the coefficient in front of the v is negative.
What that means is that the firing rate of this neuron relaxes exponentially toward some steady-state value, v-infinity. And then when the input goes away, the firing rate decays exponentially back toward zero.

In the case where lambda is equal to one, the coefficient is zero, and now you can see that the derivative of the firing rate of the neuron is just equal to the input. What that means is that the firing rate of the neuron essentially integrates the input. And you can see, if you put a step input into this neuron with a recurrent connection of lambda equal to one, that the response of the neuron simply ramps up linearly, which corresponds to integrating that step input. Then, when the input is turned off and goes back to zero, the firing rate of the neuron stays constant. That's because the leak is exactly balanced by this excitatory recurrent input from the neuron onto itself. So for the case of lambda equals one, there's persistent activity after you put an input into the neuron. We talked about how this forms a short-term memory that can be used for a bunch of different things. It's a short-term memory of a scalar, or a continuous quantity, like eye position. We also talked about this kind of integration being used for path integration, or for accumulating evidence over long exposure to a noisy stimulus.

Today, we're going to focus on networks where lambda is greater than one. In that case, the quantity (1 - lambda) inside the parentheses is negative, but it's multiplied by a minus one, so the coefficient in front of the v is positive. So if v itself is a positive number, then dv/dt is also positive. And if v is positive and dv/dt is positive, that means the firing rate of that neuron is growing -- in this case, growing exponentially.
So when you put an input in, the response of the neuron grows exponentially. But when you turn the input off, the firing rate of the neuron continues to grow exponentially, which is a little bit crazy. You know that neurons in the brain, of course, don't have firing rates that just keep growing exponentially. We're going to solve that problem by using nonlinearities in the F-I curve of neurons.

But the key point here is that this kind of network actually remembers that there was an input. The lambda-less-than-one network, where the activity just decays back to zero when the input goes away, has no memory that there was an input long ago in the past, whereas this kind of network remembers that there was an input. And that property, when lambda is greater than one, is useful for storing memories.

So we're going to expand on that idea. In particular, we're going to use that theme to build networks that have attractors -- stable states that they can go to that depend on prior inputs, and that can also be used to store long-term memories. We're going to see how that kind of network can be used to produce a winner-take-all network that is sensitive to which of two inputs is stronger and stores a memory of the preceding inputs: it ends up in one state when input 1 is stronger than input 2, and it lands in a different state when input 2 is stronger than input 1.

We're then going to describe a particular model, called a Hopfield model, for how attractor networks can store long-term memories. We're going to introduce the idea of an energy landscape, which is a property of networks that have symmetric connections, of which the Hopfield model is an example. And then we're going to end by talking about how many memories such a network can actually store, known as the capacity problem.
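[As a concrete illustration of these three regimes, here is a minimal simulation sketch of tau dv/dt = -(1 - lambda) v + h. This is not from the lecture itself; it assumes Python with NumPy, and the time constant, pulse timing, and function name are illustrative choices.]

    import numpy as np

    def simulate_autapse(lam, tau=10.0, dt=0.1, T=100.0):
        # Forward-Euler integration of tau dv/dt = -(1 - lambda) v + h
        n = int(T / dt)
        t = np.arange(n) * dt
        h = np.where((t >= 10) & (t < 30), 1.0, 0.0)  # step input from t=10 to t=30
        v = np.zeros(n)
        for i in range(n - 1):
            v[i + 1] = v[i] + dt / tau * (-(1.0 - lam) * v[i] + h[i])
        return t, v

    for lam in (0.5, 1.0, 2.0):
        t, v = simulate_autapse(lam)
        # lam < 1: relaxes during input, decays to zero afterward
        # lam = 1: integrates the step, then holds its value
        # lam > 1: keeps growing exponentially even after the input is removed
        print(f"lambda = {lam}: v(end of input) = {v[299]:.2f}, v(end) = {v[-1]:.2f}")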
OK, so let's start with recurrent networks with lambda greater than one. Let's start with our autapse, and let's put lambda equal to 2. Again, you can see that if we rewrite this equation with lambda greater than one, we can write tau dv/dt = (lambda - 1) v + h.

You can see that a firing rate of zero is an unstable fixed point of the network. Why is that? Because at v equals zero, with zero input, dv/dt equals zero. So if the firing rate is exactly zero, that's a fixed point of the system. But if v deviates very slightly from zero -- v becomes very slightly positive -- then dv/dt is positive, and the firing rate of the neuron starts running away.

So what you can see is, if you start the firing rate at zero and have the input at zero, then dv/dt is zero, and the network will stay at zero firing rate. But if you put in a very small positive input, then dv/dt goes positive, and the network activity runs away.

Now, let's put in an input of the opposite sign. Let's start with v equals zero and put in a very tiny negative input. What's the network going to do? For lambda equal to 2, tau dv/dt = v + h. So if h is very slightly negative and v is zero, then dv/dt will be negative, and the network will run away in the negative direction.

So this network actually can produce two memories. It can store a memory that a preceding input was positive, or it can store a memory that a preceding input was negative. It has two configurations after you've put in an input that is positive or negative, right? It can produce a positive output or a negative output that's persistent for a long time. Yes?

AUDIENCE: Is the [INAUDIBLE] of a negative firing rate [INAUDIBLE]?

MICHALE FEE: Yeah. So you can basically reformulate everything that we've been talking about for neurons that can't have negative firing rates. But in this case, we've been working with linear neurons.
And it seems like the negative firing rates are pretty non-physical, non-intuitive. But it's a pretty standard way to do the mathematical analysis for neurons like this, to treat them as linear, and you can reformulate all of these networks in a way that doesn't have that non-physical property. So for now, let's just bear with this slightly uncomfortable situation of having neurons with negative firing rates. Generally, we're going to associate negative firing rates with inhibition, OK? But don't worry about that here.

All right, so we're going to solve this problem of firing rates running away exponentially by adding a nonlinear activation function. A typical nonlinear activation function that you might use for networks of the type we've been considering is a symmetric F-I curve: if the input is positive and small, the firing rate of the neuron grows linearly, until you reach a point where it saturates, and larger inputs don't produce any larger firing rate. Most neurons actually have a saturating F-I curve like this. Hodgkin-Huxley neurons, for example, begin to saturate. Why is that? Because the sodium channels begin to inactivate. Sodium channel inactivation sets a minimum time between spikes, and so there's a fastest rate at which the neuron can spike. And then on the minus side, if the input is small and negative, the firing rate of the neuron goes negative linearly for a while and then saturates at some value. We'll typically have the neuron saturating between one and minus one.

So now, if you start your neuron at zero firing rate and you put in a little positive input, what's the neuron going to do? Any guesses?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Yeah. It's going to start running up exponentially, but then it's going to saturate up here. The firing rate will run up and sit at one.
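[Here is a minimal sketch of this bistability, assuming the saturation acts on the neuron's total input, i.e. tau dv/dt = -v + f(lambda v + h); that placement of the nonlinearity, and all parameter values, are illustrative assumptions rather than the lecture's.]

    import numpy as np

    def f(x):
        # Symmetric saturating F-I curve: linear near zero, clipped at +/-1
        return np.clip(x, -1.0, 1.0)

    def run(v0, h_pulse, lam=2.0, tau=10.0, dt=0.1, T=200.0):
        # tau dv/dt = -v + f(lambda*v + h), with a brief input pulse at the start
        v = v0
        for i in range(int(T / dt)):
            h = h_pulse if i * dt < 5.0 else 0.0
            v += dt / tau * (-v + f(lam * v + h))
        return v

    print(run(0.0, +0.1))  # small positive kick -> runs up and sits near +1
    print(run(0.0, -0.1))  # small negative kick -> runs down and sits near -1
    print(run(0.0,  0.0))  # no kick -> stays at the unstable fixed point at 0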
And if we put in a small negative input, then this little recurrent network will go negative and saturate at minus one, OK?

So you can see that this network actually has one unstable fixed point: if it sits exactly at zero, it will stay at zero, until you give a little bit of input in either direction. Then the network will run up and sit at another fixed point here, at one. If you put in a big negative input, you can drive it to another fixed point. And these two are stable fixed points, because once the network is in one of those states, if you give it little perturbations, it will deviate a little bit from that value -- if you give a small negative input, you can cause this to decrease a little bit -- but when the input goes away, it will relax back. So this is an unstable fixed point, and these are two stable fixed points.

Now, we're going to come back to this in more detail later, but we often think about networks like this as sort of like a ball on a hill. You can imagine describing this network using what's called an energy landscape. If you start the system at some point on this valley-shaped landscape, the network behaves like a ball that rolls downhill. If you start the network exactly at the peak, the ball will sit there. But if you give it a little bit of a nudge, it will roll downhill toward one of these stable points. If you start it slightly on the other side, it will roll the other way. And those stable fixed points are called attractors. This particular network has two attractors -- one at a firing rate of one and one at a firing rate of minus one. Yes, Appolonia?

AUDIENCE: The stable fixed points of the top graph, where'd you say they were?
MICHALE FEE: The stable fixed point is here, because once the system is in this state, you can give slight perturbations and the system returns to that fixed point. This is an unstable fixed point, because if you start the system there and give it a little nudge in either direction, the state runs away. Does that make sense?

AUDIENCE: Yeah.

MICHALE FEE: Any questions about that? Yes?

AUDIENCE: How is the shape of the curve [INAUDIBLE] points determined based on like--

MICHALE FEE: I'm going to come back to how you actually calculate this energy landscape more formally. There's a very precise mathematical definition of how you define this energy landscape.

All right, so this was all for the case of one neuron. Now let's extend it to the case of multiple neurons. Let's just take two neurons, each with an autapse. One of these autapses has a strength of two, and the other has a strength of minus two. So this one is recurrent and excitatory; this one is recurrent and inhibitory. And now what we're going to do is plot the state of the network. Instead of the state of the network being a point in one dimension, v, we now have v1 and v2, so the state of the system is going to be a point in a plane given by v1 and v2.

Now, by looking at this network, you can see immediately that this particular neuron, the neuron with firing rate v2, looks like the kind of network that we've already studied: it has a stable fixed point at zero. And this network has two stable fixed points -- one at one and the other at minus one. So you can see that this system will also have two stable fixed points -- one there and one there, right? Because if I take the input away, this neuron is going to go to either one or minus one, and this neuron is going to go to zero. So there's one and minus one on the v1 axis.
And those two states have zero firing rate on the v2 axis. Is that clear?

So now, what's going to happen if we make this autapse also have a strength of two? Anybody want to take a guess?

AUDIENCE: That's, like, four attractors?

MICHALE FEE: Right. Why is that?

AUDIENCE: Because that will also have stable fixed points at [INAUDIBLE].

MICHALE FEE: Right. So this one will have stable fixed points at one and minus one. This one will also have stable fixed points at one and minus one. And the system can be in any one of four states: 1, 1; minus 1, minus 1; 1, minus 1; and minus 1, 1. That's right.

All right, I just want to make one other point here, which is that no matter where you start the system for this network, it's going to evolve toward one of these stable fixed points -- unless I start it exactly at zero. That's another fixed point, but it's an unstable fixed point. So no matter where I start the state of that system, other than that exact point right there, the network will evolve toward one of those attractors. That's why they're called attractors: because they attract the state of the system toward one of those points. Yes?

AUDIENCE: So are the attractors determined by the nonlinear activation function?

MICHALE FEE: They are. If this nonlinear activation function saturated at two and minus two, then these two points would be up here at two and minus two.

So you can see that this network has two eigenvalues, right? If we think of it as a linear network, the connection matrix is given by a diagonal matrix with a two and a minus two along the diagonal. So let's take a look at this kind of network. Now, instead of an autapse network, we have recurrent connections of strength minus 2 and minus 2.
So what does that weight matrix look like?

AUDIENCE: 0, minus 2; minus 2, 0.

MICHALE FEE: 0, minus 2; minus 2, 0, right? Well, what are the eigenvalues of this network? Anybody remember?

AUDIENCE: [INAUDIBLE]

MICHALE FEE: Right. For a matrix like this, it's a plus b and a minus b. And so the eigenvalues of this network are 0 plus negative 2 and 0 minus negative 2 -- that is, minus 2 and 2. So this network will have exactly the same eigenvalues as this network. But what's going to be different? What are the eigenvectors?

AUDIENCE: The 45.

MICHALE FEE: The 45 degrees. So the eigenvectors of this network are the x- and y-axes, and the eigenvectors of this network are the 45-degree lines. So anybody want to take a guess as to what the stable states are? It's just this network rotated by 45 degrees, right? So those are now the attractors of this network. And that makes sense, right? This neuron can be positive, but that's going to be strongly driving this neuron negative. And if this neuron is negative, that's going to be strongly driving this neuron positive. So this network will want to sit out here on this line, in this direction or in this direction. And because of the saturation -- if there were no saturation, if this were a linear network, the activity would just be running exponentially up these 45-degree lines -- but because of the saturation, it gets stuck here at minus 1, 1 or 1, minus 1. Any questions about that? Yeah, Jasmine?

AUDIENCE: So the two fixed points right now, like it's [INAUDIBLE]?

MICHALE FEE: Yeah. It'll be one in this direction and one in that direction.

AUDIENCE: So why [INAUDIBLE]?

MICHALE FEE: Because this neuron is saturated. The saturation is acting at the level of the individual neurons.

AUDIENCE: OK.

MICHALE FEE: So each neuron will go up to its own saturation point. OK? All right.
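[This eigenvalue claim is easy to check numerically; a short NumPy sketch:]

    import numpy as np

    M = np.array([[ 0.0, -2.0],
                  [-2.0,  0.0]])   # mutual inhibition of strength 2
    vals, vecs = np.linalg.eig(M)
    print(vals)  # 2 and -2 (a + b and a - b, with a = 0, b = -2), up to ordering
    print(vecs)  # columns: the normalized 45-degree directions (1, -1) and (1, 1)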
So this kind of network is actually pretty cool. This network can implement decision-making. It can decide, for example, whether one input is bigger than the other. So let's start our network right here at this unstable fixed point -- we've carefully balanced the ball on top of the hill, and it just sits there. And now let's put in an input in this direction, h, so that it's pointing slightly to the right of this diagonal line. So what's going to happen? It's going to kick the state of the network up in this direction, right? And we've already discussed how, if the network state is anywhere on this side of that line, it will evolve toward this fixed point. If h is on the other side, it will kick the network off the unstable fixed point into this part of the state space, and then the network will evolve toward this fixed point.

These half-planes are called attractor basins: this region here is the attractor basin for this attractor, and this side is the attractor basin for that attractor. And you can see that this network will be very sensitive to whichever input, h1 or h2, is slightly larger.

So let me show you what that looks like in this little movie. We're going to start with our network exactly at the zero point, and we're going to give an input in this direction. You can see that we've kicked the network slightly this way, and now the network evolves toward the fixed point, and it stays there. Now, if we give a big input this way, we can push the network over -- push it to the other side of this dividing line between the two basins of attraction -- and now the network sits here at this fixed point. We can kick it again with another input and push it back. So it's kind of like a flip-flop, right?
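[A minimal sketch of this winner-take-all behavior, reusing the saturating dynamics assumed in the earlier sketch; the pulse duration and input values are arbitrary illustrative choices:]

    import numpy as np

    def f(x):
        return np.clip(x, -1.0, 1.0)   # saturating activation, as before

    M = np.array([[ 0.0, -2.0],
                  [-2.0,  0.0]])       # mutual inhibition

    def decide(h1, h2, tau=10.0, dt=0.1, T=300.0):
        # Start at the unstable fixed point (0, 0), apply a brief input (h1, h2),
        # then let the network relax into one of its two attractors.
        v = np.zeros(2)
        for i in range(int(T / dt)):
            h = np.array([h1, h2]) if i * dt < 5.0 else np.zeros(2)
            v += dt / tau * (-v + f(M @ v + h))
        return np.round(v, 2)

    print(decide(0.6, 0.5))  # h1 slightly larger -> settles near ( 1, -1)
    print(decide(0.5, 0.6))  # h2 slightly larger -> settles near (-1,  1)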
It's pretty cool. It detects which input was larger and pushes the network into an attractor that then remembers which input was larger -- basically, for as long as you allow the network to sit there. OK? All right, any questions about that? Yes, Rebecca?

AUDIENCE: Sorry. So the basin is just like each side of that [INAUDIBLE]?

MICHALE FEE: That's right. That's the basin of attraction for this attractor. If you start the network anywhere in this half-plane, the network will evolve toward that attractor. And you can use that as a winner-take-all decision-making network by starting the network right there at zero. Small kicks in either direction will cause the network to relax into one of these attractors and maintain that memory.

Now let's talk about a formal implementation of a system for producing long-term memories, called a Hopfield model. The Hopfield model is actually one of the best current models for understanding how memory systems like the hippocampus work. The basic idea is that we have neurons in the hippocampus -- in particular, in the CA3 region of the hippocampus -- that have a lot of recurrent connectivity between them. You have input from entorhinal cortex and from the dentate gyrus that serves as the stimuli that come into that network and burn memories into it by changing the synaptic weights within that network, [INAUDIBLE] that some time later, when similar inputs come in, they can reactivate the memory in the hippocampus, and you recognize and remember that pattern of stimuli.

So here's an example of how this looks when you record neurons in the hippocampus. Here's a mouse or a rat with electrodes in its hippocampus.
If you put it in a little arena like this, it will run around and explore for a while. You can record where the rat is in that arena [AUDIO OUT] from neurons, measure when the neurons spike, and look at how the firing rate of those neurons relates to the position of the animal. The black trace here shows all of the locations where the rat was as it ran around the arena, and the red dots show where the rat was when one of these neurons in CA3 of the hippocampus generated a spike. You can see that this neuron generates spiking when the animal is in a particular restricted region of its environment, and different neurons show different localized regions. These regions are called place fields, because they are the places in the environment where that neuron spikes.

Different neurons have different place fields. You can actually record from many of these neurons, and by looking at the pattern of neurons that are spiking, you can figure out where the rat was, or is, at any given moment. That's pretty obvious, right? If this neuron is spiking and all these other neurons aren't, then you know that the animal is somewhere in that location right there.

So in a sense, the activity of these neurons reflects the animal remembering, or sort of remembering, that it's in a particular location. It's in a cage; it looks at the walls of the environment. The experimenters use colored cards on the wall to give the animal cues as to where it is. So the animal looks around and says, oh, yeah, I'm here -- in my environment, there's a red card there and a yellow card there, and that's where I am right now. So that's the way you can think about these hippocampal place fields as being like a memory.
On top of that, this part of the hippocampus is necessary for the actual formation of memories in a broader sense -- not just spatial locations, but more generally life events. For humans, the hippocampus is an essential part of the brain for storing memories.

All right, so let's come back to this idea of our recurrent network. What we're going to do is start adding more and more neurons to our recurrent network. Here's what the attractor structure looked like for the case where one eigenvalue in the system is greater than one and the other is less than one. If we now make both of these neurons have recurrent connections that are stronger than one, we're going to have four attractors, right? Each one of these has two stable fixed points, at one and minus one. So here, for these two states, v1 is 1, and for these two states, v1 is minus 1. For these two states, v2 is 1, and for these two states, v2 is minus 1.

So you can see that every time we add another neuron with an autapse -- another neuron with another eigenvalue greater than one -- we add more possible states of the network. If we have one neuron with an autapse greater than one, we have two states. If we have two, we have four states. If we have three, we have eight states. So if we have n of these neurons with recurrent excitation with a lambda greater than one, we have 2 to the n possible states that the system can be in. I don't know exactly how many neurons are in CA3 -- it has to be several million, maybe 10 million; we don't know the exact number -- but 2 to that is a lot of possible states, right?

So let's think about how this thing acts as a memory. It turns out that this little device that we've built here is actually a lot like a computer memory.
It's like a register, where we can write a value. We can write in here a 1, minus 1, 1, and as long as we leave that network alone, it will store that value. Or we can write a 1, 1, 1, and it will store that value. But that's not really what we mean when we talk about memories, right? We have a memory of meeting somebody for lunch yesterday. That is a particular configuration of sensory inputs that we experienced.

So the other way to think about this is that this kind of network is just a short-term memory. We can program in some values -- 1, 1, 1 -- but if we were to turn the activity of these neurons off, we'd erase the memory. How do we build into this network a long-term memory, such that we can turn all these neurons off and the network then goes back into the remembered state? You do that by building connections between these neurons such that only some of these possible states are actually stable states.

So let me give you an example of this. If you have a whole bunch of neurons -- n neurons -- you've got 2 to the n possible states that the network can sit in. What we want is for only some of those to actually be stable states of the system. For example, when we wake up in the morning and we see the dresser, or maybe the nightstand next to the bed, we want to remember that's our bedroom. We want that to be a particular configuration of inputs that we recall. So what you want is a set of neurons with particular states that the system evolves toward -- stable states of the system. The way you do that is you take this network with recurrent autapses and you build cross-connections between the neurons that make particular ones of those possible states actual stable states of the system. We want to restrict the number of stable states in the system.

So take a look at this network here. Here we have two neurons.
You know that if you just had autapses from these neurons onto themselves, there would be four possible stable states. But if we now build excitatory cross-connections between those neurons, two of those states are no longer stable -- they become unstable -- and only these two remain stable states of the system, remain attractors. If instead we put inhibitory connections between those neurons, then we can make these two states the attractors of the system, OK? All right, does that make sense?

All right, so let's actually flesh out the mathematics of how you take a network of neurons and program it to have particular states that are attractors of the system. We've been using this kind of dynamical equation; we're going to simplify that. We're going to follow the construction that John Hopfield used when he analyzed these recurrent networks. In the formulation we've been using, we update the firing rate of our neuron using a differential equation. Instead, we're going to simplify by writing down the state of the network at time t plus 1 as a function of the state of the network at the previous time step. So we're going to discretize time. We're going to say that v, the state of the network -- the firing rates of all the neurons -- at time t plus 1 is a function of a weight matrix that connects all the neurons times the firing rate vector, plus an input: v(t+1) = f(M v(t) + h).

And here, I'm just writing out exactly what that matrix multiplication looks like. The state of the i-th neuron after we update the network is just a function of the sum over all of the inputs coming from all of the other neurons j. And we're going to simplify our neuronal activation function f by making it a binary threshold neuron.
If the total input is positive, then the firing rate of the neuron will be one; if the input is negative, the firing rate will be minus one. That's the sign function: sign(x) is 1 if x is greater than 0, and minus 1 if x is less than or equal to 0.

All right, so the goal is to build a network that can store any memory we want -- any pattern we want -- and turn that into a stable state. We're going to build a network that will evolve toward a particular pattern that we choose. And xi is just a pattern of ones and minus ones that describes the memory that we're building into the network: xi_i is one or minus one for the i-th neuron.

Now, we want xi to be an attractor. We want to build a network such that xi is an attractor. And what does building a network mean? When we say build a network, what are we actually doing? What is it here that we're actually trying to decide?

AUDIENCE: The synaptic weights.

MICHALE FEE: Yeah, which is?

AUDIENCE: Like the matrix M.

MICHALE FEE: The M, right. So when I say build a network that does this, I mean choose a set of M's that has this property. What we want is to find a weight matrix M such that if the network is in this desired state, then when we multiply that state by the matrix M and take the sign of the result, we get the same state back: sign(M xi) = xi. In other words, if you start the network in this state, it's going to end up in the same state. That's what it means to have an attractor -- that's what it means to say that it's a stable state.
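[Here is a sketch of this discrete update rule in NumPy, applied to a two-neuron network with excitatory cross-connections like the one from a few slides back; the particular weight values are an illustrative choice, and sign(0) is taken as -1 per the definition above:]

    import numpy as np

    def step(v, M):
        # One discrete time step: v(t+1) = sign(M v(t))
        return np.where(M @ v > 0, 1, -1)

    # Autapses of strength 1 plus excitatory cross-connections of strength 1
    M = np.array([[1.0, 1.0],
                  [1.0, 1.0]])

    for v in ([1, 1], [-1, -1], [1, -1], [-1, 1]):
        v = np.array(v)
        stable = np.array_equal(step(v, M), v)
        print(v, "->", step(v, M), "(stable)" if stable else "(unstable)")

[With these weights, only (1, 1) and (-1, -1) map to themselves; the two mixed states are no longer stable, matching the picture above.]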
OK, so we're going to try a particular matrix. I'm going to describe what this actually looks like in more detail, but the matrix that programs a pattern xi into the network as an attractor is this weight matrix right here: if we have a pattern xi, our weight matrix is some constant alpha times the outer product of that pattern with itself, M = alpha xi xi-transpose. I'm going to explain what that means. What it means is that if neuron i and neuron j are both active in this pattern -- both have a firing rate of one -- then those two neurons are going to be connected to each other with a connection of value one, times alpha. And if one of those neurons has a firing rate of one and the other has a firing rate of minus one, what weight do we want between them? The strength of the connection between them will be minus one. So if one neuron is active and another neuron is active, we want them to excite each other, to maintain that as a stable state. If one neuron is plus and the other is minus, we want them to inhibit each other, because that will make that configuration stable. And notice that's a symmetric matrix.

So let's actually take our dynamical equation that says how we go from the state at time t to the state at time t plus 1, put in this weight matrix, and see whether this pattern xi is actually a stable state. So let's do that: take this M and substitute it in. Notice this is a sum over j, so we can pull the xi_i out. And now you see that v at t plus 1 is the sign of alpha times xi_i times the sum over j of xi_j times xi_j. Now, what is that? The elements of xi are just ones or minus ones. So xi_j times xi_j has to be?

AUDIENCE: One.

MICHALE FEE: One. And we're summing over N neurons, so this sum has the value N. So you can see that the state at time t plus 1, if we start the network in this stored state, is just the sign of alpha N xi. But alpha is positive.
790 00:42:28,010 --> 00:42:31,500 N is just a positive integer, the number of neurons. 791 00:42:31,500 --> 00:42:35,650 So this equals xi. 792 00:42:35,650 --> 00:42:38,560 So if we have this weight matrix, 793 00:42:38,560 --> 00:42:42,610 and we start the network in that stored state, 794 00:42:42,610 --> 00:42:45,590 the state at the next time step will be the same state. 795 00:42:45,590 --> 00:42:49,830 So it's a stable fixed point. 796 00:42:49,830 --> 00:42:52,790 All right, so let's just go through an example. 797 00:42:52,790 --> 00:42:57,800 That is the prescription for programming a memory 798 00:42:57,800 --> 00:42:59,930 into a Hopfield network, OK? 799 00:42:59,930 --> 00:43:01,730 And notice that it's just-- 800 00:43:01,730 --> 00:43:05,090 it's essentially a Hebbian learning rule. 801 00:43:05,090 --> 00:43:07,820 So the way you do this is you activate the neurons 802 00:43:07,820 --> 00:43:12,370 with a particular pattern, and any two neurons that are active 803 00:43:12,370 --> 00:43:16,840 together form a positive excitatory connection 804 00:43:16,840 --> 00:43:18,040 between them. 805 00:43:18,040 --> 00:43:20,270 Any two neurons where one is positive 806 00:43:20,270 --> 00:43:25,510 and the other is negative form a symmetric inhibitory 807 00:43:25,510 --> 00:43:27,010 connection, all right? 808 00:43:34,853 --> 00:43:36,770 All right, so let's take a particular example. 809 00:43:36,770 --> 00:43:40,280 Let's make a three-neuron network that 810 00:43:40,280 --> 00:43:43,490 stores a pattern 1, 1, minus 1. 811 00:43:43,490 --> 00:43:46,340 And again, the notation here is xi, xi transpose. 812 00:43:46,340 --> 00:43:48,830 That's an outer product, just like you 813 00:43:48,830 --> 00:43:56,990 use to compute the covariance matrix of a data matrix. 814 00:43:56,990 --> 00:44:00,530 So there's the pattern we're going to program in. 815 00:44:00,530 --> 00:44:03,410 The weight matrix is xi, xi transpose, 816 00:44:03,410 --> 00:44:07,650 so it's 1, 1, minus 1 times 1, 1, minus 1. 817 00:44:07,650 --> 00:44:09,650 You can see that's going to give you this matrix 818 00:44:09,650 --> 00:44:10,490 here, all right? 819 00:44:10,490 --> 00:44:12,800 So that element there is 1 times 1. 820 00:44:12,800 --> 00:44:14,010 That element there. 821 00:44:14,010 --> 00:44:16,860 So here are two neurons. 822 00:44:16,860 --> 00:44:22,170 These two neurons storing this pattern, these two neurons-- 823 00:44:22,170 --> 00:44:26,310 sorry, this neuron has a firing rate of minus one. 824 00:44:26,310 --> 00:44:30,840 So the connection between that neuron and itself 825 00:44:30,840 --> 00:44:34,220 is a one, right? 826 00:44:34,220 --> 00:44:36,660 It's just the product of that times that, minus one times minus one. 827 00:44:36,660 --> 00:44:41,480 All right, any questions about how we got this weight matrix? 828 00:44:41,480 --> 00:44:45,040 I think it's pretty straightforward. 829 00:44:45,040 --> 00:44:47,500 So is that a stable point? 830 00:44:47,500 --> 00:44:49,060 Let's just multiply it out. 831 00:44:49,060 --> 00:44:53,800 We take this vector and multiply it by this matrix. 832 00:44:53,800 --> 00:44:55,540 There's our stored pattern. 833 00:44:55,540 --> 00:44:58,200 There's our matrix that stores that pattern. 834 00:44:58,200 --> 00:44:59,950 And we're just going to multiply this out. 835 00:44:59,950 --> 00:45:04,600 You can see that 1 times 1 plus 1 times 1 836 00:45:04,600 --> 00:45:07,120 plus minus 1 times minus 1 is 3. 837 00:45:07,120 --> 00:45:09,635 You just do that for each of the neurons.
838 00:45:12,670 --> 00:45:13,970 Take the sign of that. 839 00:45:13,970 --> 00:45:16,345 And you can see that that's just 1, 1, minus 1. 840 00:45:16,345 --> 00:45:20,170 So 1, 1, minus 1 is a stable fixed point. 841 00:45:20,170 --> 00:45:22,690 Now let's see if it's actually an attractor. 842 00:45:22,690 --> 00:45:26,380 So when a state is an attractor, what that means is 843 00:45:26,380 --> 00:45:28,360 if we start the network at a state that's 844 00:45:28,360 --> 00:45:32,530 a little bit different from that and advance the network one 845 00:45:32,530 --> 00:45:36,490 time step, it will converge toward the attractor. 846 00:45:36,490 --> 00:45:41,590 So into our network that stores this pattern 1, 1, minus 1, 847 00:45:41,590 --> 00:45:45,590 let's put in a different pattern and see what happens. 848 00:45:45,590 --> 00:45:47,470 So we're going to take that weight matrix, 849 00:45:47,470 --> 00:45:52,750 multiply it by this initial state, multiply it out, 850 00:45:52,750 --> 00:45:55,570 and you can see that the next state is 851 00:45:55,570 --> 00:46:00,040 going to be the sign of 3, 3, minus 3. 852 00:46:00,040 --> 00:46:05,440 And one time step advanced, the network is now in the state 853 00:46:05,440 --> 00:46:07,570 that we've programmed in. 854 00:46:07,570 --> 00:46:10,530 Does that make sense? 855 00:46:10,530 --> 00:46:16,272 So that state is a stable fixed point and it's an attractor. 856 00:46:16,272 --> 00:46:18,230 I'm just going to go through this very quickly. 857 00:46:18,230 --> 00:46:22,760 I'm just going to prove that xi is an attractor of the network 858 00:46:22,760 --> 00:46:27,470 if we write down the weight matrix as this outer product. 859 00:46:27,470 --> 00:46:30,470 The matrix elements are the outer product 860 00:46:30,470 --> 00:46:32,360 of the stored state with itself, OK? 861 00:46:32,360 --> 00:46:34,010 So what we're going to do is we're 862 00:46:34,010 --> 00:46:37,770 going to calculate the total input onto the i-th neuron 863 00:46:37,770 --> 00:46:43,530 if we start from an arbitrary state, v. So k 864 00:46:43,530 --> 00:46:47,570 is the input to all the neurons, right? 865 00:46:47,570 --> 00:46:52,880 And it's just that matrix times the initial state. 866 00:46:52,880 --> 00:46:55,780 So v j is the firing rate of the j-th neuron, 867 00:46:55,780 --> 00:47:00,520 and k is just M times v. That's the pattern of inputs 868 00:47:00,520 --> 00:47:02,350 to all of our neurons. 869 00:47:02,350 --> 00:47:04,610 So what is that? k equals-- 870 00:47:04,610 --> 00:47:06,490 we're just going to put this weight matrix 871 00:47:06,490 --> 00:47:10,300 into this equation, all right? 872 00:47:10,300 --> 00:47:13,300 We can pull the xi i outside of the sum, 873 00:47:13,300 --> 00:47:15,070 because it doesn't depend on j. 874 00:47:15,070 --> 00:47:17,300 The sum is over j. 875 00:47:17,300 --> 00:47:20,480 Now let's just write out this sum, OK? 876 00:47:20,480 --> 00:47:22,670 Now, you can see that if you start out 877 00:47:22,670 --> 00:47:27,470 with an initial state in which some number of neurons 878 00:47:27,470 --> 00:47:32,570 have the correct sign-- are already overlapping 879 00:47:32,570 --> 00:47:35,780 with the memorized state-- and some number of neurons 880 00:47:35,780 --> 00:47:37,940 in that initial state don't overlap 881 00:47:37,940 --> 00:47:40,640 with the memorized state, we can write out 882 00:47:40,640 --> 00:47:42,770 this sum as two terms.
883 00:47:42,770 --> 00:47:47,000 We can write it as a sum over some of the neurons that 884 00:47:47,000 --> 00:47:51,370 are already in the correct state and a sum over neurons that 885 00:47:51,370 --> 00:47:53,080 are not in the correct state. 886 00:47:56,280 --> 00:47:59,490 So if these neurons in that initial state 887 00:47:59,490 --> 00:48:02,790 have the right sign, that means these two have the same sign. 888 00:48:02,790 --> 00:48:08,040 And so the sum over xi j v j for neurons 889 00:48:08,040 --> 00:48:10,350 where v has the right sign is just 890 00:48:10,350 --> 00:48:13,680 the number of neurons that have the correct sign. 891 00:48:13,680 --> 00:48:16,320 And this sum over incorrect neurons 892 00:48:16,320 --> 00:48:20,010 means these neurons have the opposite sign of the desired 893 00:48:20,010 --> 00:48:20,730 memory. 894 00:48:20,730 --> 00:48:24,540 And so those will be one, and those will be minus one. 895 00:48:24,540 --> 00:48:26,820 Or those will be minus one, and those will be one. 896 00:48:26,820 --> 00:48:31,150 And so this will be minus the number of incorrect neurons. 897 00:48:31,150 --> 00:48:33,660 So you can see that the input to the neuron 898 00:48:33,660 --> 00:48:38,790 will have the right sign if the number of correct neurons 899 00:48:38,790 --> 00:48:43,060 is more than the number of incorrect neurons, all right? 900 00:48:43,060 --> 00:48:46,810 So what that means is that if you program a pattern 901 00:48:46,810 --> 00:48:49,360 into this network and then drive 902 00:48:49,360 --> 00:48:58,050 an input into the network-- 903 00:48:58,050 --> 00:49:05,570 if the input drives most of the neurons with the right sign, 904 00:49:05,570 --> 00:49:10,480 then the input will cause the network 905 00:49:10,480 --> 00:49:15,580 to evolve toward the memorized pattern in the next time step. 906 00:49:15,580 --> 00:49:17,830 OK, so let me say that again, because I felt like that 907 00:49:17,830 --> 00:49:19,870 didn't come out very clearly. 908 00:49:19,870 --> 00:49:23,050 We program a pattern into our network. 909 00:49:23,050 --> 00:49:27,160 If we start the network at some-- 910 00:49:27,160 --> 00:49:28,300 let's say at zero. 911 00:49:28,300 --> 00:49:31,630 And then we put in a pattern into the network such 912 00:49:31,630 --> 00:49:37,150 that just the majority of the neurons 913 00:49:37,150 --> 00:49:42,490 are activated in a way that looks like the stored pattern, 914 00:49:42,490 --> 00:49:44,920 then in the next time step, all of the neurons 915 00:49:44,920 --> 00:49:46,973 will have this stored pattern. 916 00:49:46,973 --> 00:49:48,640 So let me show you what that looks like. 917 00:49:52,210 --> 00:49:54,280 Let me actually go ahead and show you-- 918 00:50:00,930 --> 00:50:03,270 OK, so here's an example of that. 919 00:50:03,270 --> 00:50:06,080 So you can use Hopfield networks to store 920 00:50:06,080 --> 00:50:08,990 many different kinds of things, including images, all right? 921 00:50:08,990 --> 00:50:11,630 So this is a network where each pixel 922 00:50:11,630 --> 00:50:15,610 is being represented by a neuron in a Hopfield network. 923 00:50:15,610 --> 00:50:20,330 And a particular image was stored in that network 924 00:50:20,330 --> 00:50:24,800 by setting up the pattern of synaptic weights 925 00:50:24,800 --> 00:50:29,900 just using that xi, xi transpose learning rule for the weight 926 00:50:29,900 --> 00:50:31,700 matrix M, OK?
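To make the pieces so far concrete, here is a minimal sketch in Python with NumPy of the sign-function update, the outer-product storage rule, and the majority-vote recovery just described. The network size, the number of flipped neurons, and the random seed are illustrative choices, not values from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)

    def sgn(x):
        # Sign convention from the lecture: +1 for x > 0, -1 for x <= 0.
        return np.where(x > 0, 1, -1)

    N = 100                            # number of neurons (illustrative)
    xi = sgn(rng.standard_normal(N))   # the +/-1 pattern to store
    M = np.outer(xi, xi)               # storage rule: M = xi xi^T, with alpha = 1

    v = xi.copy()                      # start near the stored pattern...
    flip = rng.choice(N, size=20, replace=False)
    v[flip] = -v[flip]                 # ...but flip 20 of the 100 signs

    v_next = sgn(M @ v)                # one update step: the sign of M times v
    print(np.array_equal(v_next, xi))  # True: the correct majority pulls it back

Because M @ v equals xi times the dot product of xi with v, a cue whose majority of signs agrees with xi gives a positive dot product, and one update step lands exactly on the stored pattern.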
927 00:50:31,700 --> 00:50:34,370 Now, what you can do is you can start that network 928 00:50:34,370 --> 00:50:38,960 from a random initial condition. 929 00:50:38,960 --> 00:50:43,160 And then let the network evolve over time, all right? 930 00:50:43,160 --> 00:50:46,880 And what you see is that the network converges 931 00:50:46,880 --> 00:50:55,155 toward the pattern that was stored in the synaptic weights, OK? 932 00:50:55,155 --> 00:50:56,030 Does that make sense? 933 00:51:01,190 --> 00:51:03,520 Got that? 934 00:51:03,520 --> 00:51:11,226 So, basically, as long as that initial pattern 935 00:51:11,226 --> 00:51:14,820 has some overlap with the stored pattern, 936 00:51:14,820 --> 00:51:17,120 the network will evolve toward the stored pattern. 937 00:51:23,710 --> 00:51:26,080 All right, so let me define a little bit 938 00:51:26,080 --> 00:51:29,975 better what we mean by the energy landscape 939 00:51:29,975 --> 00:51:31,225 and how it's actually defined. 940 00:51:33,770 --> 00:51:37,360 OK, so you remember that if we start our network 941 00:51:37,360 --> 00:51:41,870 in a particular pattern v, the recurrent connections 942 00:51:41,870 --> 00:51:47,840 will drive inputs into all the neurons in the network. 943 00:51:47,840 --> 00:51:49,700 And those inputs will then determine 944 00:51:49,700 --> 00:51:53,280 the pattern of activity at the next time step. 945 00:51:53,280 --> 00:51:58,030 So if we have a state of the network v, 946 00:51:58,030 --> 00:52:02,110 the inputs to the network, to all the neurons in the network, 947 00:52:02,110 --> 00:52:04,330 from the currently active neurons 948 00:52:04,330 --> 00:52:09,400 are given by the connection matrix times v. 949 00:52:09,400 --> 00:52:12,670 So we can just write that out as a sum like this. 950 00:52:12,670 --> 00:52:19,490 So you define the energy of the network as the dot product-- 951 00:52:19,490 --> 00:52:21,830 basically, the amount of overlap-- 952 00:52:21,830 --> 00:52:26,480 between the current state of the network 953 00:52:26,480 --> 00:52:31,200 and the inputs to all of the neurons 954 00:52:31,200 --> 00:52:35,270 that drive the activity in the next step, OK? 955 00:52:35,270 --> 00:52:37,610 And the energy is minus that overlap, OK? 956 00:52:37,610 --> 00:52:42,110 So what that means is if the network is in a state that 957 00:52:42,110 --> 00:52:47,450 has a big overlap with the pattern of inputs to all 958 00:52:47,450 --> 00:52:50,300 the other neurons, then the energy will 959 00:52:50,300 --> 00:52:51,860 be very negative, right? 960 00:52:51,860 --> 00:52:55,940 And remember, the system likes to evolve toward low energies. 961 00:52:55,940 --> 00:52:58,490 In physics, you have a ball on a hill. 962 00:52:58,490 --> 00:53:04,150 It rolls downhill, right, to lower gravitational energies. 963 00:53:04,150 --> 00:53:06,820 So you start the ball anywhere on the hill, 964 00:53:06,820 --> 00:53:08,080 and it will roll downhill. 965 00:53:08,080 --> 00:53:10,000 So these networks do the same thing. 966 00:53:10,000 --> 00:53:14,050 They evolve downward on this energy surface. 967 00:53:14,050 --> 00:53:18,280 They evolve towards states that have 968 00:53:18,280 --> 00:53:23,440 a high overlap with the inputs that drive the next state. 969 00:53:23,440 --> 00:53:24,730 Does that make sense?
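In code, this energy is one line. Here is a sketch in NumPy, using the minus 1/2 v dot k convention that comes up in a question a bit later; the 1/2 is just a convention, and any positive scale factor leaves the minima in the same places. The three-neuron pattern is the example from earlier.

    import numpy as np

    def energy(M, v):
        # E(v) = -1/2 v . (M v): minus the overlap between the current
        # state v and the inputs k = M v that drive the next state.
        return -0.5 * v @ (M @ v)

    xi = np.array([1, 1, -1])              # the stored pattern from the example
    M = np.outer(xi, xi)
    print(energy(M, xi))                   # -4.5: lowest at the stored pattern
    print(energy(M, np.array([1, 1, 1])))  # -0.5: higher away from the attractor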
970 00:53:24,730 --> 00:53:31,600 So if you're in a state where the pattern right now 971 00:53:31,600 --> 00:53:33,910 has a high overlap with what the pattern is going 972 00:53:33,910 --> 00:53:37,540 to be in the next time step, then you're in an attractor, 973 00:53:37,540 --> 00:53:38,040 right? 974 00:53:41,950 --> 00:53:45,050 OK, so it looks like that. 975 00:53:45,050 --> 00:53:48,830 So this energy is just the negative of the overlap 976 00:53:48,830 --> 00:53:51,860 of the current state of the network with the pattern 977 00:53:51,860 --> 00:53:53,240 of inputs to all the neurons. 978 00:53:53,240 --> 00:53:54,421 Yes, Rebecca? 979 00:53:54,421 --> 00:53:57,790 AUDIENCE: So [INAUDIBLE] to say [INAUDIBLE] with the weight 980 00:53:57,790 --> 00:54:00,328 matrix, since that's sort of the goal of the next time step, 981 00:54:00,328 --> 00:54:02,820 and it will evolve towards the matrix [INAUDIBLE]? 982 00:54:02,820 --> 00:54:03,570 MICHALE FEE: Yeah. 983 00:54:03,570 --> 00:54:07,470 So the only difference is that the state of the network 984 00:54:07,470 --> 00:54:09,750 is this vector, right? 985 00:54:09,750 --> 00:54:14,460 And the weight matrix tells us how that state will drive input 986 00:54:14,460 --> 00:54:16,980 into all the other neurons. 987 00:54:16,980 --> 00:54:23,050 And so if you're in a state that drives a pattern of inputs 988 00:54:23,050 --> 00:54:27,160 to all the neurons that looks exactly like the current state, 989 00:54:27,160 --> 00:54:31,190 then you're going to stay in that state, right? 990 00:54:31,190 --> 00:54:34,660 And so the energy is just defined as that dot product, 991 00:54:34,660 --> 00:54:38,140 the overlap of the current state, or the state 992 00:54:38,140 --> 00:54:40,150 that you're calculating the energy of, 993 00:54:40,150 --> 00:54:44,515 and the inputs to the network in the next time step. 994 00:54:44,515 --> 00:54:46,640 All right, so let me show you what that looks like. 995 00:54:49,180 --> 00:54:52,130 And so the energy is lowest when the current state 996 00:54:52,130 --> 00:54:54,140 has a high overlap with the synaptic drive 997 00:54:54,140 --> 00:54:55,820 to the next step. 998 00:54:55,820 --> 00:54:59,190 So let's just take a look at this particular network here. 999 00:54:59,190 --> 00:55:01,410 I've rewritten this dot product as-- 1000 00:55:01,410 --> 00:55:04,700 so k is just M times v. This dot product 1001 00:55:04,700 --> 00:55:09,830 can just be written as v transpose times Mv. 1002 00:55:09,830 --> 00:55:11,490 So that's the energy. 1003 00:55:11,490 --> 00:55:15,400 Let's take a look at this matrix, this network here-- 1004 00:55:15,400 --> 00:55:17,030 0, minus 2, minus 2, 0. 1005 00:55:17,030 --> 00:55:18,980 So it's this mutually inhibitory network. 1006 00:55:18,980 --> 00:55:21,500 You know that that inhibitory network 1007 00:55:21,500 --> 00:55:30,220 has attractors that are here at minus 1, 1 and 1, minus 1. 1008 00:55:30,220 --> 00:55:31,990 So let's actually calculate the energy. 1009 00:55:31,990 --> 00:55:35,130 So you can actually take these states-- 1010 00:55:35,130 --> 00:55:38,670 1, minus 1-- multiply it by that M, 1011 00:55:38,670 --> 00:55:41,325 and then take the dot product with 1, minus 1. 1012 00:55:41,325 --> 00:55:43,920 And do that for each one of those states 1013 00:55:43,920 --> 00:55:45,080 and write down the energy. 1014 00:55:45,080 --> 00:55:49,160 You can see that the energy here is minus 1. 1015 00:55:49,160 --> 00:55:53,100 The energy here is minus 1, and the energy here is 0.
1016 00:55:53,100 --> 00:55:57,240 So if you start the network here, at energy zero, 1017 00:55:57,240 --> 00:56:01,310 it's going to roll downhill to this state. 1018 00:56:05,920 --> 00:56:07,910 Or it can roll downhill to this state, 1019 00:56:07,910 --> 00:56:14,030 depending on the initial condition, OK? 1020 00:56:18,680 --> 00:56:24,500 So you can also think about the energy as a continuous function 1021 00:56:24,500 --> 00:56:25,580 of the firing rates. 1022 00:56:25,580 --> 00:56:28,730 You can calculate that energy, not just for these points 1023 00:56:28,730 --> 00:56:30,560 on this grid. 1024 00:56:30,560 --> 00:56:33,260 And what you see is that there's basically-- 1025 00:56:33,260 --> 00:56:35,210 in high dimensions, there are sort 1026 00:56:35,210 --> 00:56:39,710 of valleys that describe the attractor 1027 00:56:39,710 --> 00:56:43,610 basin of these different attractors, all right? 1028 00:56:43,610 --> 00:56:48,350 And if you project that energy along an axis like this, 1029 00:56:48,350 --> 00:56:52,550 you can see that you sort of-- 1030 00:56:52,550 --> 00:56:55,340 let's say, take a slice through this energy function. 1031 00:56:55,340 --> 00:56:58,580 You can see that this looks just like the energy 1032 00:56:58,580 --> 00:57:01,190 surface, the energy function, that we described before 1033 00:57:01,190 --> 00:57:06,820 for the 1D attractor, the single neuron with two attractors, 1034 00:57:06,820 --> 00:57:07,320 right? 1035 00:57:07,320 --> 00:57:10,980 This corresponds to a valley and a valley 1036 00:57:10,980 --> 00:57:13,200 and a peak between them. 1037 00:57:13,200 --> 00:57:16,470 And then the energy gets big outside of that. 1038 00:57:16,470 --> 00:57:20,115 Any questions about that? 1039 00:57:20,115 --> 00:57:21,070 Yes, [INAUDIBLE]. 1040 00:57:21,070 --> 00:57:26,680 AUDIENCE: [INAUDIBLE] vector 1/2 because-- in this case, right? 1041 00:57:26,680 --> 00:57:31,090 MICHALE FEE: That's the general definition, minus 1/2 v dot k. 1042 00:57:37,390 --> 00:57:38,770 It actually doesn't really-- 1043 00:57:38,770 --> 00:57:40,460 this 1/2 doesn't really matter. 1044 00:57:40,460 --> 00:57:45,190 It actually comes out of the derivative of something, 1045 00:57:45,190 --> 00:57:45,930 as I recall. 1046 00:57:45,930 --> 00:57:47,830 But a scaling factor doesn't matter. 1047 00:57:47,830 --> 00:57:51,550 The network always evolves toward a minimum of the energy. 1048 00:57:51,550 --> 00:57:55,060 And so this 1/2 could be anything. 1049 00:57:58,170 --> 00:58:04,680 All right, so the point is that starting the network anywhere 1050 00:58:04,680 --> 00:58:09,300 with a sensory input, the system will evolve toward the nearest 1051 00:58:09,300 --> 00:58:10,580 memory, OK? 1052 00:58:14,530 --> 00:58:15,780 And I already showed you this. 1053 00:58:15,780 --> 00:58:19,230 OK, so now, a very interesting question 1054 00:58:19,230 --> 00:58:23,500 is, how many memories can you actually store in a network? 1055 00:58:23,500 --> 00:58:28,140 And there's a very simple way of calculating the capacity 1056 00:58:28,140 --> 00:58:29,680 of the Hopfield network. 1057 00:58:29,680 --> 00:58:32,540 And I'm just going to show you the outlines of it. 1058 00:58:32,540 --> 00:58:35,220 And that actually gives us some insight 1059 00:58:35,220 --> 00:58:39,150 into what kinds of memories you can store.
1060 00:58:39,150 --> 00:58:42,630 Basically, the idea is that when you store memories 1061 00:58:42,630 --> 00:58:44,850 in a network, you want the different memories 1062 00:58:44,850 --> 00:58:48,267 to be as uncorrelated with each other as possible. 1063 00:58:48,267 --> 00:58:50,100 You don't want to try to store memories that 1064 00:58:50,100 --> 00:58:53,690 are very similar to each other. 1065 00:58:53,690 --> 00:58:57,910 And you'll see why in a second when we look at the math. 1066 00:58:57,910 --> 00:59:00,760 So let's say that we want to store multiple memories 1067 00:59:00,760 --> 00:59:02,510 in our network. 1068 00:59:02,510 --> 00:59:06,100 So instead of just storing one pattern, xi, 1069 00:59:06,100 --> 00:59:09,470 we want to store a bunch of different patterns. 1070 00:59:09,470 --> 00:59:12,530 And so let's say we're going to store p different patterns. 1071 00:59:12,530 --> 00:59:15,730 So we have an index, mu. 1072 00:59:15,730 --> 00:59:19,670 The index mu labels each of the different patterns 1073 00:59:19,670 --> 00:59:20,920 we want to store. 1074 00:59:20,920 --> 00:59:25,730 So we're going to index the patterns from zero to p minus 1. 1075 00:59:25,730 --> 00:59:28,120 So what we do, the way we do that is 1076 00:59:28,120 --> 00:59:31,330 we compute the contribution to the weight 1077 00:59:31,330 --> 00:59:34,520 matrix from each of those different patterns. 1078 00:59:34,520 --> 00:59:38,440 So we calculate a weight matrix using the outer product 1079 00:59:38,440 --> 00:59:42,440 for each of the patterns we want to store in the network, 1080 00:59:42,440 --> 00:59:43,130 all right? 1081 00:59:43,130 --> 00:59:46,420 And then we add all of those together. 1082 00:59:46,420 --> 00:59:54,280 We're going to essentially sort of average together the network 1083 00:59:54,280 --> 00:59:59,050 that we would make for each pattern separately. 1084 00:59:59,050 --> 01:00:00,480 Does that make sense? 1085 01:00:00,480 --> 01:00:05,120 So there is the equation for the weight matrix 1086 01:00:05,120 --> 01:00:10,060 that stores p different patterns in our memory, in our network. 1087 01:00:13,870 --> 01:00:16,690 And that's how we got this kind of network 1088 01:00:16,690 --> 01:00:20,220 here, where we store multiple memories, all right? 1089 01:00:22,920 --> 01:00:25,170 So let me just show you an example of what 1090 01:00:25,170 --> 01:00:26,260 happens when you do that. 1091 01:00:26,260 --> 01:00:28,260 So I found these nice videos online. 1092 01:00:28,260 --> 01:00:33,540 So here is a representation of a network that stores 1093 01:00:33,540 --> 01:00:38,310 a five by five array of pixels. 1094 01:00:38,310 --> 01:00:42,740 And this network was trained on these three different patterns. 1095 01:00:42,740 --> 01:00:45,380 And what this little demo shows is 1096 01:00:45,380 --> 01:00:48,740 that if you start the network from different configurations 1097 01:00:48,740 --> 01:00:52,010 here and then evolve the network-- you start running it. 1098 01:00:52,010 --> 01:00:55,760 That means you run the dynamic update for each neuron 1099 01:00:55,760 --> 01:00:58,010 one at a time, and you can see how 1100 01:00:58,010 --> 01:00:59,380 this system evolves over time. 1101 01:01:03,500 --> 01:01:05,630 So this is a little GUI-based thing. 1102 01:01:05,630 --> 01:01:08,460 You can flip the state and then run it.
1103 01:01:08,460 --> 01:01:13,532 And you can see that if you change those, now it-- 1104 01:01:19,940 --> 01:01:22,680 I think he was trying to make it look like that. 1105 01:01:22,680 --> 01:01:27,390 But when you run it, it actually evolved toward this one. 1106 01:01:33,750 --> 01:01:36,570 He's going to really make it look like that. 1107 01:01:36,570 --> 01:01:40,010 And you can see it evolves toward that one. 1108 01:01:40,010 --> 01:01:42,020 All right, any questions about that? 1109 01:01:42,020 --> 01:01:43,940 You can see it stored three separate memories. 1110 01:01:43,940 --> 01:01:47,300 You've given an input, and the network 1111 01:01:47,300 --> 01:01:51,540 evolves toward whatever memory was closest to the input. 1112 01:01:51,540 --> 01:01:54,140 So that's called a content-addressable memory. 1113 01:01:54,140 --> 01:01:56,660 You can actually recall a memory-- 1114 01:01:56,660 --> 01:02:00,050 not by pointing to an address, like you do in a computer, 1115 01:02:00,050 --> 01:02:02,540 but by putting in something that looks 1116 01:02:02,540 --> 01:02:04,640 a little bit like the memory. 1117 01:02:04,640 --> 01:02:09,080 And then the system evolves right to the memory 1118 01:02:09,080 --> 01:02:12,660 that was closest to the input. 1119 01:02:12,660 --> 01:02:17,010 So it's also called an auto-associative memory. 1120 01:02:17,010 --> 01:02:21,120 It automatically associates with the nearest-- 1121 01:02:21,120 --> 01:02:24,660 with a pattern that's nearest to the input. 1122 01:02:24,660 --> 01:02:26,910 So here's another example. 1123 01:02:26,910 --> 01:02:29,340 It's just kind of more of the same. 1124 01:02:29,340 --> 01:02:32,400 This is a network similar to this. 1125 01:02:32,400 --> 01:02:34,930 Instead of black and white, it's red and purple, 1126 01:02:34,930 --> 01:02:37,770 but it's got a lot more pixels. 1127 01:02:37,770 --> 01:02:41,010 And you'll see the three different images 1128 01:02:41,010 --> 01:02:44,700 that are stored in there-- 1129 01:02:44,700 --> 01:02:48,090 so a face, a world, and a penguin. 1130 01:02:48,090 --> 01:02:51,090 So then what they're doing here is they add noise. 1131 01:02:51,090 --> 01:02:53,580 And then you run the network, and it recovers 1132 01:02:53,580 --> 01:02:55,540 one of the patterns that you stored in it. 1133 01:03:00,960 --> 01:03:02,200 So here's the penguin. 1134 01:03:02,200 --> 01:03:03,750 Add noise. 1135 01:03:03,750 --> 01:03:06,140 Add a little bit of noise. 1136 01:03:06,140 --> 01:03:10,200 Here, he's coloring it in, I guess, to make it. 1137 01:03:10,200 --> 01:03:12,260 And then you run the network, and it 1138 01:03:12,260 --> 01:03:13,894 remembers the penguin. 1139 01:03:18,010 --> 01:03:19,710 OK, so that's interesting. 1140 01:03:19,710 --> 01:03:21,930 So he ran it. 1141 01:03:21,930 --> 01:03:23,540 He or she ran the network. 1142 01:03:23,540 --> 01:03:27,350 And you see that it kind of recovered a face, 1143 01:03:27,350 --> 01:03:31,160 but there's some penguin head stuck on top. 1144 01:03:31,160 --> 01:03:32,840 So what goes wrong there? 1145 01:03:32,840 --> 01:03:36,010 Something bad happened, right? 1146 01:03:36,010 --> 01:03:40,490 The network was trained with a face, a globe, and a penguin. 1147 01:03:40,490 --> 01:03:42,920 And you run it most of the time, and it works. 1148 01:03:42,920 --> 01:03:46,280 And then, suddenly, you run it, and it recovers a face 1149 01:03:46,280 --> 01:03:48,110 with a penguin head sticking out of it.
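Before unpacking what went wrong, here is a sketch of what these demos are doing, in NumPy: several random patterns stored by summing their outer products, and a noisy cue cleaned up by iterating the sign update. The sizes, noise level, and seed are illustrative; with only a few random patterns in a large network, recall typically succeeds.

    import numpy as np

    rng = np.random.default_rng(1)

    def sgn(x):
        return np.where(x > 0, 1, -1)

    N, P = 200, 3                        # neurons, stored patterns (illustrative)
    patterns = sgn(rng.standard_normal((P, N)))
    M = sum(np.outer(xi, xi) for xi in patterns)   # M = sum over mu of xi_mu xi_mu^T

    v = patterns[0].copy()               # cue: pattern 0 with 15% of signs flipped
    flip = rng.choice(N, size=30, replace=False)
    v[flip] = -v[flip]

    for _ in range(5):                   # iterate the update until it settles
        v = sgn(M @ v)
    print(np.array_equal(v, patterns[0]))  # typically True for a few random patterns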
1150 01:03:48,110 --> 01:03:50,390 What happened? 1151 01:03:50,390 --> 01:03:51,640 So we'll explain what happens. 1152 01:03:51,640 --> 01:03:55,540 What happened was that this network 1153 01:03:55,540 --> 01:03:58,660 was trained in a way that has what's 1154 01:03:58,660 --> 01:04:00,770 called a spurious attractor. 1155 01:04:00,770 --> 01:04:02,860 And that often happens when you train a network 1156 01:04:02,860 --> 01:04:05,170 with too many memories, when you exceed 1157 01:04:05,170 --> 01:04:07,810 the capacity of the network to store memories. 1158 01:04:07,810 --> 01:04:11,510 So let me show you what actually goes wrong mathematically 1159 01:04:11,510 --> 01:04:12,010 there. 1160 01:04:17,970 --> 01:04:20,580 All right, so we're going to do the same analysis 1161 01:04:20,580 --> 01:04:21,330 we did before. 1162 01:04:21,330 --> 01:04:24,030 We're going to take a matrix. 1163 01:04:24,030 --> 01:04:29,130 We're going to build a network that stores multiple memories. 1164 01:04:31,890 --> 01:04:33,887 This was the matrix to build one memory. 1165 01:04:33,887 --> 01:04:34,970 Let's see what I did here. 1166 01:04:39,670 --> 01:04:41,620 So in order for-- 1167 01:04:45,190 --> 01:04:45,820 Yeah. 1168 01:04:45,820 --> 01:04:46,320 Sorry. 1169 01:04:46,320 --> 01:04:48,630 This was the matrix for multiple memories. 1170 01:04:48,630 --> 01:04:50,130 We're summing over mu. 1171 01:04:50,130 --> 01:04:53,553 I just didn't write the mu equals 0 to p minus 1. 1172 01:04:53,553 --> 01:04:55,470 So we're going to program p different memories 1173 01:04:55,470 --> 01:04:59,700 by summing up this outer product for all the different patterns 1174 01:04:59,700 --> 01:05:02,460 that we want to store, all right? 1175 01:05:02,460 --> 01:05:06,570 We're going to ask whether one of those-- 1176 01:05:06,570 --> 01:05:11,580 under what conditions is one of those patterns, the xi 0, 1177 01:05:11,580 --> 01:05:15,930 actually a stable state of the network? 1178 01:05:15,930 --> 01:05:17,580 So we're going to build a network 1179 01:05:17,580 --> 01:05:19,860 with multiple patterns stored, and we're just 1180 01:05:19,860 --> 01:05:21,870 going to ask a simple question. 1181 01:05:21,870 --> 01:05:28,040 Under what conditions is xi 0 going to evolve to xi 0? 1182 01:05:28,040 --> 01:05:32,540 And if xi 0 evolves toward xi 0, or stays at xi 0, 1183 01:05:32,540 --> 01:05:35,060 then it's a stable point. 1184 01:05:35,060 --> 01:05:36,270 All right, so let's do that. 1185 01:05:36,270 --> 01:05:37,895 We're going to take that update equation, 1186 01:05:37,895 --> 01:05:41,240 and we're going to plug in our multiple memory weight 1187 01:05:41,240 --> 01:05:43,520 matrix, all right? 1188 01:05:43,520 --> 01:05:48,080 You can see that we can pull the xi 1189 01:05:48,080 --> 01:05:53,050 i out of this sum over j. 1190 01:05:53,050 --> 01:05:56,920 And the next step is we're going to separate this 1191 01:05:56,920 --> 01:06:02,500 into a sum over mu equals zero and a separate sum for mu 1192 01:06:02,500 --> 01:06:04,170 not equal to 0, all right? 1193 01:06:04,170 --> 01:06:08,140 So this is a sum over all the mu's, 1194 01:06:08,140 --> 01:06:10,090 but we're going to pull out the mu zero 1195 01:06:10,090 --> 01:06:13,650 term as a separate sum over j. 1196 01:06:13,650 --> 01:06:15,040 Is that clear? 1197 01:06:15,040 --> 01:06:18,010 Anyway, this is just for fun. 1198 01:06:18,010 --> 01:06:21,310 You don't have to reproduce this, so don't worry.
1199 01:06:24,270 --> 01:06:27,280 So we're going to pull out the mu equals zero term. 1200 01:06:27,280 --> 01:06:28,660 And what does that look like? 1201 01:06:28,660 --> 01:06:35,080 It's xi i0 times the sum over j of xi j0 times xi j0. 1202 01:06:35,080 --> 01:06:37,380 So what is that? 1203 01:06:37,380 --> 01:06:40,260 That's just N, right, the number of neurons. 1204 01:06:40,260 --> 01:06:43,860 We're summing over j equals 1 to N, the number of neurons. 1205 01:06:43,860 --> 01:06:45,990 I should add those limits here. 1206 01:06:45,990 --> 01:06:54,220 So you can see that that's N. So this is just the sign of N xi i0 1207 01:06:54,220 --> 01:06:57,930 plus a bunch of other stuff. 1208 01:06:57,930 --> 01:07:01,950 So you can see right away that if all of this other stuff 1209 01:07:01,950 --> 01:07:05,530 is really small, then this is a fixed point. 1210 01:07:05,530 --> 01:07:08,370 Because if all this stuff is small, 1211 01:07:08,370 --> 01:07:12,270 the system will evolve toward the sign of xi i0, 1212 01:07:12,270 --> 01:07:15,000 which is just xi i0. 1213 01:07:15,000 --> 01:07:16,710 So let's take a look at all of this stuff 1214 01:07:16,710 --> 01:07:22,970 and see what can go wrong to make this not small. 1215 01:07:22,970 --> 01:07:25,490 All right, so let's zoom in on this particular term right 1216 01:07:25,490 --> 01:07:25,990 here. 1217 01:07:25,990 --> 01:07:27,170 So what is this? 1218 01:07:27,170 --> 01:07:33,150 This is the sum over j of xi mu j times xi 0 j. 1219 01:07:33,150 --> 01:07:34,770 So what is that? 1220 01:07:34,770 --> 01:07:35,940 Anybody know what that is? 1221 01:07:39,030 --> 01:07:40,860 It's a vector operation. 1222 01:07:40,860 --> 01:07:41,870 What is that? 1223 01:07:41,870 --> 01:07:44,870 AUDIENCE: The dot product between one image 1224 01:07:44,870 --> 01:07:45,870 and image zero. 1225 01:07:45,870 --> 01:07:47,110 MICHALE FEE: Exactly. 1226 01:07:47,110 --> 01:07:50,850 It's a dot product between the image that we're asking 1227 01:07:50,850 --> 01:07:54,900 is it a stable fixed point and all the other images 1228 01:07:54,900 --> 01:07:56,830 in the network. 1229 01:07:56,830 --> 01:07:59,045 Sorry, and the mu-th image. 1230 01:08:01,760 --> 01:08:07,130 So what this is saying is that if our image is 1231 01:08:07,130 --> 01:08:10,670 orthogonal to all the other images in the network 1232 01:08:10,670 --> 01:08:14,375 that we've tried to store, then this thing is zero. 1233 01:08:21,810 --> 01:08:23,819 So this is referred to as crosstalk 1234 01:08:23,819 --> 01:08:26,430 between the stored memories. 1235 01:08:26,430 --> 01:08:30,930 So if our pattern, xi 0, is orthogonal to all 1236 01:08:30,930 --> 01:08:33,550 the other patterns, then it will be a fixed point. 1237 01:08:33,550 --> 01:08:37,500 So the capacity of the network, the crosstalk-- 1238 01:08:37,500 --> 01:08:40,290 the capacity of the network depends 1239 01:08:40,290 --> 01:08:43,470 on how much overlap there is between our stored pattern 1240 01:08:43,470 --> 01:08:46,200 and all the other patterns in the network, all right? 1241 01:08:50,757 --> 01:08:52,340 So if all the memories are orthogonal, 1242 01:08:52,340 --> 01:08:53,979 if all the patterns are orthogonal, 1243 01:08:53,979 --> 01:08:57,670 then they're all stable attractors.
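Here is a quick numerical sketch of that crosstalk term. For random plus-or-minus-one patterns, a pattern's dot product with itself is N, while its dot product with a different pattern fluctuates around zero with a typical size of only about the square root of N, which is why random patterns behave as nearly orthogonal. The sizes and seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    N, P = 1000, 10                      # neurons, patterns (illustrative)
    patterns = np.where(rng.standard_normal((P, N)) > 0, 1, -1)

    overlaps = patterns @ patterns.T     # entry (mu, nu) is xi_mu . xi_nu
    print(overlaps[0, 0])                # N = 1000 on the diagonal
    print(overlaps[0, 1:])               # off-diagonal: small, order sqrt(N)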
1244 01:08:57,670 --> 01:09:01,510 But if one of those memories, xi 1-- let's take xi 1-- 1245 01:09:01,510 --> 01:09:07,000 is close to xi 0, then xi 0 dot xi 1-- 1246 01:09:07,000 --> 01:09:09,189 the two patterns are very similar-- 1247 01:09:09,189 --> 01:09:13,220 then the dot product is going to be close to N, right? 1248 01:09:13,220 --> 01:09:16,200 And when you plug that in, if that's N, 1249 01:09:16,200 --> 01:09:22,390 then you can see that this becomes xi 1 i, right? 1250 01:09:22,390 --> 01:09:27,850 So what happens is that these other memories that 1251 01:09:27,850 --> 01:09:30,234 are similar to our memorized pattern-- 1252 01:09:33,010 --> 01:09:36,708 then when you sum that, when you compute that sum, 1253 01:09:36,708 --> 01:09:38,649 some of these terms get big enough so 1254 01:09:38,649 --> 01:09:44,560 that the memory in the next step is not that stored memory. 1255 01:09:44,560 --> 01:09:47,479 It's a combination. 1256 01:09:47,479 --> 01:09:48,399 All right? 1257 01:09:48,399 --> 01:09:52,319 So what happens is-- so that's what sets the capacity 1258 01:09:52,319 --> 01:09:53,149 of the network. 1259 01:09:53,149 --> 01:09:57,180 So you can't actually choose all your memories to be orthogonal. 1260 01:09:57,180 --> 01:10:00,720 But a pretty good way of making memories nearly orthogonal 1261 01:10:00,720 --> 01:10:03,750 is to store them as random patterns. 1262 01:10:03,750 --> 01:10:08,580 So a lot of the thinking that goes 1263 01:10:08,580 --> 01:10:10,950 into how you would build a network that 1264 01:10:10,950 --> 01:10:14,730 stores a lot of patterns is to take your memories 1265 01:10:14,730 --> 01:10:17,970 and sort of convert them in a way that makes them maximally 1266 01:10:17,970 --> 01:10:20,190 orthogonal to each other. 1267 01:10:20,190 --> 01:10:22,710 You can use things like lateral inhibition 1268 01:10:22,710 --> 01:10:26,800 to orthogonalize different inputs. 1269 01:10:26,800 --> 01:10:31,060 So once you make your patterns sort of noisy, 1270 01:10:31,060 --> 01:10:32,710 then it turns out you can actually 1271 01:10:32,710 --> 01:10:36,070 calculate that if the values of xi 1272 01:10:36,070 --> 01:10:38,200 sort of look like random numbers, 1273 01:10:38,200 --> 01:10:41,380 that you can store up to about 15% 1274 01:10:41,380 --> 01:10:44,930 of the number of neurons worth of memories in your network. 1275 01:10:44,930 --> 01:10:48,500 So if I have 100 neurons in my network, 1276 01:10:48,500 --> 01:10:53,240 I should be able to store about 15 different states 1277 01:10:53,240 --> 01:10:54,830 in that network before they start 1278 01:10:54,830 --> 01:10:59,210 to interfere with each other, before you have a sufficiently 1279 01:10:59,210 --> 01:11:02,360 high probability that two of those memories 1280 01:11:02,360 --> 01:11:04,410 are too close to each other. 1281 01:11:04,410 --> 01:11:07,160 And as soon as that happens, then you 1282 01:11:07,160 --> 01:11:10,430 start getting crosstalk between those memories that 1283 01:11:10,430 --> 01:11:12,800 causes the state of the system to evolve 1284 01:11:12,800 --> 01:11:19,580 in a way that doesn't recall one of your stored memories, 1285 01:11:19,580 --> 01:11:20,680 all right?
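That roughly-15% figure (the standard result for random patterns is about 0.14 N) can be checked empirically with a sketch like this, which stores p random patterns and counts how many survive one update step unchanged. The network size and the values of p are illustrative, and this simple version keeps the self-connections in M, so the exact counts will vary from run to run.

    import numpy as np

    rng = np.random.default_rng(3)

    def sgn(x):
        return np.where(x > 0, 1, -1)

    N = 500
    for p in (25, 50, 75, 100):          # p/N from 0.05 up to 0.20
        patterns = sgn(rng.standard_normal((p, N)))
        M = patterns.T @ patterns        # sum of outer products of all p patterns
        stable = sum(np.array_equal(sgn(M @ xi), xi) for xi in patterns)
        print(p, stable)                 # fewer patterns stay exactly fixed as p/N grows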
1286 01:11:20,680 --> 01:11:24,570 And what that looks like in the energy landscape 1287 01:11:24,570 --> 01:11:31,260 is when you build a network with, let's say, five memories, 1288 01:11:31,260 --> 01:11:36,220 there will be five minima in the network that sort of have 1289 01:11:36,220 --> 01:11:41,220 equal low values of energy. 1290 01:11:41,220 --> 01:11:44,570 But when you start sticking too many memories in your network, 1291 01:11:44,570 --> 01:11:47,150 you end up with what are called spurious attractors, sort 1292 01:11:47,150 --> 01:11:54,242 of local minima that aren't at the-- 1293 01:11:54,242 --> 01:11:56,930 that don't correspond to one of the stored memories. 1294 01:11:56,930 --> 01:12:01,280 And so as the system evolves, it can be going downhill 1295 01:12:01,280 --> 01:12:03,590 and get stuck in one of those minima that 1296 01:12:03,590 --> 01:12:09,020 look like a combination of two of the stored memories. 1297 01:12:09,020 --> 01:12:11,568 And that's what went wrong here with the guy with the penguin 1298 01:12:11,568 --> 01:12:12,610 sticking out of his head. 1299 01:12:18,360 --> 01:12:18,940 Who knows? 1300 01:12:18,940 --> 01:12:21,107 Maybe that's what happens when you look at something 1301 01:12:21,107 --> 01:12:24,273 and you're confused about what you're seeing. 1302 01:12:24,273 --> 01:12:26,190 We don't know if that's actually what happens, 1303 01:12:26,190 --> 01:12:29,220 but it would be an interesting thing to test. 1304 01:12:31,770 --> 01:12:33,050 Any questions? 1305 01:12:33,050 --> 01:12:35,780 All right, so that's-- 1306 01:12:35,780 --> 01:12:38,820 so you can see that these are long-term memories. 1307 01:12:38,820 --> 01:12:42,410 These don't depend on activity in the network to be stored, right? 1308 01:12:42,410 --> 01:12:45,770 Those are programmed into the synaptic connections 1309 01:12:45,770 --> 01:12:47,280 between the neurons. 1310 01:12:47,280 --> 01:12:50,040 So you can shut off all the activity. 1311 01:12:50,040 --> 01:12:54,740 And if you just put in a pattern of input that 1312 01:12:54,740 --> 01:12:56,540 reminds you of something, the network 1313 01:12:56,540 --> 01:13:00,310 will recover the full memory for you.