1
00:00:16,313 --> 00:00:16,980
MICHALE FEE: OK.
2
00:00:16,980 --> 00:00:18,730
All right, let's go
ahead and get started.
3
00:00:18,730 --> 00:00:21,690
OK, so we're going
to continue talking
4
00:00:21,690 --> 00:00:25,380
about the topic of
neural networks.
5
00:00:25,380 --> 00:00:28,620
Last time, we introduced
a new framework
6
00:00:28,620 --> 00:00:33,330
for thinking about neural
network interactions,
7
00:00:33,330 --> 00:00:37,740
using a rate model to describe
the interactions of neurons
8
00:00:37,740 --> 00:00:41,370
and develop a mathematical
framework for how
9
00:00:41,370 --> 00:00:43,080
to combine
collections of neurons
10
00:00:43,080 --> 00:00:46,300
to study their behavior.
11
00:00:46,300 --> 00:00:50,760
So, last time, we introduced
the notion of a perceptron
12
00:00:50,760 --> 00:00:54,180
as a way of building
a neural network that
13
00:00:54,180 --> 00:00:57,510
can classify its inputs.
14
00:00:57,510 --> 00:01:03,300
And we started talking about the
notion of a perceptron learning
15
00:01:03,300 --> 00:01:06,480
rule, and we're going
to flesh that idea out
16
00:01:06,480 --> 00:01:08,580
in more detail today.
17
00:01:08,580 --> 00:01:12,450
We're going to then talk about
the idea of using networks
18
00:01:12,450 --> 00:01:17,250
to perform logic with neurons.
19
00:01:17,250 --> 00:01:19,800
We're going to talk about the
idea of linear separability
20
00:01:19,800 --> 00:01:21,480
and invariance.
21
00:01:21,480 --> 00:01:24,240
Then we're going to
introduce more complex
22
00:01:24,240 --> 00:01:26,190
feed-forward networks,
where instead
23
00:01:26,190 --> 00:01:28,260
of having a single
output neuron,
24
00:01:28,260 --> 00:01:32,040
we have multiple output neurons.
25
00:01:32,040 --> 00:01:37,800
Then we're going to turn to
a more fully developed view
26
00:01:37,800 --> 00:01:40,890
of the math that we use to
describe neural networks,
27
00:01:40,890 --> 00:01:45,450
and matrix operations
become extremely important
28
00:01:45,450 --> 00:01:50,330
in neural network theory.
29
00:01:50,330 --> 00:01:51,960
And then, finally,
we're going to turn
30
00:01:51,960 --> 00:01:55,110
to some of the kinds
of transformations that
31
00:01:55,110 --> 00:01:58,290
are performed by
matrix multiplication
32
00:01:58,290 --> 00:02:03,080
and by the kinds of-- by
feed-forward neural networks.
33
00:02:03,080 --> 00:02:08,160
OK, so we've been considering
a kind of neural network called
34
00:02:08,160 --> 00:02:12,065
a rate model that uses firing
rates rather than spike trains.
35
00:02:12,065 --> 00:02:13,440
So we introduced
the idea that we
36
00:02:13,440 --> 00:02:16,560
have an output neuron
with firing rate
37
00:02:16,560 --> 00:02:19,830
v that receives input
from an input neuron that
38
00:02:19,830 --> 00:02:21,530
has firing rate u.
39
00:02:21,530 --> 00:02:24,270
The input neuron synapses
onto the output neuron
40
00:02:24,270 --> 00:02:26,490
with a synapse of weight w.
41
00:02:26,490 --> 00:02:29,010
And we described
how we can think
42
00:02:29,010 --> 00:02:34,110
of the input neuron producing a
synaptic input into the output
43
00:02:34,110 --> 00:02:39,600
neuron that has a
magnitude of the firing
44
00:02:39,600 --> 00:02:42,350
rate times the strength of
the synaptic connection.
45
00:02:42,350 --> 00:02:48,550
So the input to the output
neuron here is w times u.
46
00:02:48,550 --> 00:02:53,050
And then we talked about how we
can convert that input current,
47
00:02:53,050 --> 00:02:55,330
let's say, into
our output neuron
48
00:02:55,330 --> 00:02:59,380
into a firing rate of the output
neuron through some function
49
00:02:59,380 --> 00:03:05,050
f, which is what's called the
F-I curve of the neuron that
50
00:03:05,050 --> 00:03:08,920
relates the input to the
firing rate of the neuron.
51
00:03:08,920 --> 00:03:11,260
And we talked about
several different kinds
52
00:03:11,260 --> 00:03:15,850
of F-I, firing rate versus input,
functions that can be useful.
53
00:03:15,850 --> 00:03:20,950
We then extended our network
from a single input neuron
54
00:03:20,950 --> 00:03:22,960
synapsing onto a
single output neuron
55
00:03:22,960 --> 00:03:26,290
by having multiple
input neurons.
56
00:03:26,290 --> 00:03:29,680
Again, the output neuron
has a firing rate,
57
00:03:29,680 --> 00:03:34,090
and our input neurons have a
vector of firing rates now--
58
00:03:34,090 --> 00:03:37,800
u1, u2, u3, u4, and so on--
59
00:03:37,800 --> 00:03:42,940
that we can combine
together into a vector, u.
60
00:03:42,940 --> 00:03:47,180
Each one of those input neurons
has a synaptic strength w
61
00:03:47,180 --> 00:03:48,470
onto our output neuron.
62
00:03:48,470 --> 00:03:51,580
So we have a vector
of synaptic strengths.
63
00:03:51,580 --> 00:03:56,590
And now we can write down the
input current to our output
64
00:03:56,590 --> 00:04:00,100
neuron as a sum of the
contributions from each
65
00:04:00,100 --> 00:04:07,150
of those input neurons-- so w1,
u1 plus w2, u2, plus w3, u3,
66
00:04:07,150 --> 00:04:08,980
and so on.
67
00:04:08,980 --> 00:04:12,100
So we can now write
the input current
68
00:04:12,100 --> 00:04:16,810
to our output neuron as
a sum of contributions
69
00:04:16,810 --> 00:04:18,850
that we can then write
as a dot product--
70
00:04:18,850 --> 00:04:21,540
w dot u.
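[As a minimal sketch of the summation just described: the total input to the output neuron is the dot product of the weight vector and the input rate vector. The particular firing rates and weights below are illustrative values, not numbers from the lecture.]

```python
import numpy as np

# Input firing rates u1, u2, u3 and synaptic weights w1, w2, w3
# (illustrative values).
u = np.array([1.0, 2.0, 0.5])
w = np.array([0.2, 0.1, 0.4])

# Input current to the output neuron: w1*u1 + w2*u2 + w3*u3 = w . u
total_input = np.dot(w, u)
print(total_input)  # 0.2*1.0 + 0.1*2.0 + 0.4*0.5 = 0.6
```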
71
00:04:21,540 --> 00:04:22,930
OK, any questions about that?
72
00:04:27,480 --> 00:04:30,570
And so, in general, we have
the firing rate of our output
73
00:04:30,570 --> 00:04:32,970
neuron is just
this F-I function,
74
00:04:32,970 --> 00:04:37,500
this input-output function
of our output neuron acting
75
00:04:37,500 --> 00:04:41,023
on the total input,
which is w dot u.
76
00:04:41,023 --> 00:04:42,690
And then we talked
about different kinds
77
00:04:42,690 --> 00:04:46,770
of functions that are
useful computationally
78
00:04:46,770 --> 00:04:47,820
for this function f.
79
00:04:47,820 --> 00:04:51,060
So in the context of the
integrate and fire neuron,
80
00:04:51,060 --> 00:04:56,440
we talked about F-I curves that
are zero below some threshold
81
00:04:56,440 --> 00:05:01,350
and then are linear above
that threshold current.
82
00:05:01,350 --> 00:05:05,640
We talked last time about
a binary threshold neuron
83
00:05:05,640 --> 00:05:08,140
that has zero firing
rate below some threshold
84
00:05:08,140 --> 00:05:12,390
and then steps up abruptly to
a constant output firing rate
85
00:05:12,390 --> 00:05:14,280
one.
86
00:05:14,280 --> 00:05:16,560
And then we also introduced,
last time, the notion
87
00:05:16,560 --> 00:05:19,050
of a linear neuron,
whose firing rate is
88
00:05:19,050 --> 00:05:21,600
just proportional
to the input current
89
00:05:21,600 --> 00:05:24,300
and has positive and
negative firing rates.
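[The three F-I curves described here can be sketched as simple functions; the thresholds and gains below are made-up values for illustration, not parameters from the lecture.]

```python
import numpy as np

def rectified_linear(I, theta=1.0, gain=1.0):
    # Zero below the threshold current, linear above it
    # (like the integrate-and-fire F-I curve).
    return gain * np.maximum(I - theta, 0.0)

def binary_threshold(I, theta=1.0):
    # Steps abruptly from 0 to 1 at the threshold.
    return np.where(I > theta, 1.0, 0.0)

def linear(I, gain=1.0):
    # Proportional to the input; allows "negative firing rates,"
    # biophysically implausible but mathematically convenient.
    return gain * I

I = np.array([-1.0, 0.5, 2.0])
print(rectified_linear(I))   # [0.  0.  1. ]
print(binary_threshold(I))   # [0.  0.  1. ]
print(linear(I))             # [-1.  0.5  2. ]
```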
90
00:05:24,300 --> 00:05:26,450
And we talked about the
idea that although it's
91
00:05:26,450 --> 00:05:28,860
biophysically implausible
to have neurons
92
00:05:28,860 --> 00:05:31,650
that have negative
firing rates, that this
93
00:05:31,650 --> 00:05:35,040
is a particularly useful
simplification of neurons.
94
00:05:35,040 --> 00:05:39,990
Because we can just
use linear algebra
95
00:05:39,990 --> 00:05:44,440
to describe the properties of
networks of linear neurons.
96
00:05:44,440 --> 00:05:46,980
And we can do some
really interesting things
97
00:05:46,980 --> 00:05:52,270
with that kind of
mathematical simplification.
98
00:05:52,270 --> 00:05:54,870
We're going to get to
some of that today.
99
00:05:54,870 --> 00:05:57,420
And that allows
you to really build
100
00:05:57,420 --> 00:06:02,750
an intuition for what
neural networks can do.
101
00:06:02,750 --> 00:06:08,570
OK, so let's come back to what
a perceptron is and introduce
102
00:06:08,570 --> 00:06:11,820
this perceptron learning rule.
103
00:06:11,820 --> 00:06:14,690
So we talked about the idea
that a perceptron carries out
104
00:06:14,690 --> 00:06:17,510
a classification
of its inputs that
105
00:06:17,510 --> 00:06:18,860
represent different features.
106
00:06:18,860 --> 00:06:22,580
So we talked about classifying
animals into dogs and non-dogs
107
00:06:22,580 --> 00:06:27,120
based on two
features of animals.
108
00:06:27,120 --> 00:06:30,110
We talked about
the fact that you
109
00:06:30,110 --> 00:06:34,160
can't make that classification
between dogs and non-dogs
110
00:06:34,160 --> 00:06:36,350
just on the basis of
one of those features,
111
00:06:36,350 --> 00:06:40,580
because these two categories
overlap in this feature
112
00:06:40,580 --> 00:06:42,060
and in this feature.
113
00:06:42,060 --> 00:06:44,960
And so in order to properly
separate those categories,
114
00:06:44,960 --> 00:06:47,780
you need a decision
boundary that's
115
00:06:47,780 --> 00:06:52,280
actually a combination
of those two features.
116
00:06:52,280 --> 00:06:54,290
And we talked about
how you can implement
117
00:06:54,290 --> 00:06:57,790
that using a simple
network, called
118
00:06:57,790 --> 00:07:02,570
a perceptron, that has an output
neuron and two input neurons.
119
00:07:02,570 --> 00:07:06,320
Each one of those input neurons
represents the magnitude
120
00:07:06,320 --> 00:07:10,070
of those two different
features for each object
121
00:07:10,070 --> 00:07:13,220
that you're trying to classify.
122
00:07:13,220 --> 00:07:19,580
So u1 here and u2 are the
dimensions on which we're
123
00:07:19,580 --> 00:07:24,100
performing this classification.
124
00:07:24,100 --> 00:07:28,840
And so we talked about the fact
that that decision boundary
125
00:07:28,840 --> 00:07:31,990
between those two
classifications
126
00:07:31,990 --> 00:07:35,470
is determined by
this weight matrix w.
127
00:07:35,470 --> 00:07:37,810
And then we used a
binary threshold neuron
128
00:07:37,810 --> 00:07:39,700
for making the actual decision.
129
00:07:39,700 --> 00:07:42,370
Binary threshold neurons are
great for making decisions,
130
00:07:42,370 --> 00:07:46,540
because unlike a linear
neuron-- so a linear neuron just
131
00:07:46,540 --> 00:07:48,850
responds more if
its input is larger,
132
00:07:48,850 --> 00:07:51,940
and it responds less if
its input is smaller.
133
00:07:51,940 --> 00:07:57,220
Binary threshold neurons
have a very clear threshold
134
00:07:57,220 --> 00:07:59,380
below which the
neuron doesn't spike
135
00:07:59,380 --> 00:08:01,480
and above which the
neuron does spike.
136
00:08:01,480 --> 00:08:04,300
So, in this case, this network,
this output neuron here,
137
00:08:04,300 --> 00:08:07,420
will fire, will have
a firing rate of one,
138
00:08:07,420 --> 00:08:11,530
for any input that's on this
side of the decision boundary
139
00:08:11,530 --> 00:08:13,510
and will have a
firing rate of zero
140
00:08:13,510 --> 00:08:16,940
for any input that's on this
side of the decision boundary,
141
00:08:16,940 --> 00:08:19,570
OK?
142
00:08:19,570 --> 00:08:24,560
All right, so we talked about
how we can, in two dimensions,
143
00:08:24,560 --> 00:08:28,940
just write down a decision
boundary that will separate,
144
00:08:28,940 --> 00:08:32,870
let's say, green objects
from red objects.
145
00:08:32,870 --> 00:08:36,409
So you can see that
if you sat down
146
00:08:36,409 --> 00:08:39,770
and you looked at this drawing
of green dots and red dots,
147
00:08:39,770 --> 00:08:43,309
that it would be very simple
to just look at that picture
148
00:08:43,309 --> 00:08:46,010
and see that if you put
a decision boundary right
149
00:08:46,010 --> 00:08:49,910
there, that you would be able
to separate the green dots
150
00:08:49,910 --> 00:08:51,350
from the red dots.
151
00:08:51,350 --> 00:08:54,470
How would you actually
calculate the weight vector
152
00:08:54,470 --> 00:08:57,030
that that corresponds
to in a perceptron?
153
00:08:57,030 --> 00:08:59,100
Well, it's very simple.
154
00:08:59,100 --> 00:09:02,300
You can just look at where
that decision boundary crosses
155
00:09:02,300 --> 00:09:04,220
the axes--
156
00:09:04,220 --> 00:09:07,190
so you can see here, that
decision boundary crosses
157
00:09:07,190 --> 00:09:13,080
the u1 axis at point A, crosses
the u2 axis at, I should say,
158
00:09:13,080 --> 00:09:17,840
a value of B. And then we can
use those numbers to actually
159
00:09:17,840 --> 00:09:19,100
calculate the w.
160
00:09:19,100 --> 00:09:21,950
So, remember, u is
the input space.
161
00:09:21,950 --> 00:09:24,230
w is a weight vector
that we're trying
162
00:09:24,230 --> 00:09:27,020
to calculate in order
to place the decision
163
00:09:27,020 --> 00:09:28,070
boundary at that point.
164
00:09:28,070 --> 00:09:32,380
Is that clear what
we're trying to do here?
165
00:09:32,380 --> 00:09:35,220
OK, so we can calculate
that weight vector.
166
00:09:35,220 --> 00:09:37,710
We assume that theta
is just some number.
167
00:09:37,710 --> 00:09:39,840
Let's just call it one.
168
00:09:39,840 --> 00:09:44,760
We have an equation for a
line-- w dot u equals theta.
169
00:09:44,760 --> 00:09:47,910
That's the equation for
that decision boundary.
170
00:09:47,910 --> 00:09:52,080
We have two knowns, the two
points on the decision boundary
171
00:09:52,080 --> 00:09:53,960
that we can just
read off by eye.
172
00:09:53,960 --> 00:09:58,020
And we have two unknowns-- the
synaptic weights, w1 and w2.
173
00:09:58,020 --> 00:10:00,510
And so we have two equations--
174
00:10:00,510 --> 00:10:06,020
ua dot w equals theta,
ub dot w equals theta.
175
00:10:06,020 --> 00:10:08,400
And we can just
solve for w1 and w2,
176
00:10:08,400 --> 00:10:10,470
and that's what you got, OK?
177
00:10:10,470 --> 00:10:13,560
So the weight vector that gives
you that decision boundary
178
00:10:13,560 --> 00:10:17,040
is 1 over a and 1 over b, OK?
179
00:10:17,040 --> 00:10:18,480
Those are the two weights.
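[The two-equation solve just described can be checked numerically. The intercept values a and b below are hypothetical, not the ones in the figure; with theta set to one, the weights come out to 1/a and 1/b.]

```python
import numpy as np

# Decision boundary crosses the u1 axis at (a, 0) and the u2 axis
# at (0, b); hypothetical intercepts for illustration.
a, b, theta = 2.0, 4.0, 1.0

# Two equations: w . (a, 0) = theta and w . (0, b) = theta,
# two unknowns w1 and w2.
w = np.array([theta / a, theta / b])   # (1/a, 1/b) when theta = 1
print(w)  # [0.5  0.25]

# Check: both boundary points satisfy w . u = theta
assert np.isclose(np.dot(w, [a, 0.0]), theta)
assert np.isclose(np.dot(w, [0.0, b]), theta)
```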
180
00:10:18,480 --> 00:10:21,700
Any questions about that?
181
00:10:21,700 --> 00:10:23,460
OK.
182
00:10:23,460 --> 00:10:27,630
So in two dimensions, that's
very easy to do, right?
183
00:10:27,630 --> 00:10:31,350
You can just look at
that cloud of points,
184
00:10:31,350 --> 00:10:34,590
decide where to draw a line
that best separates the two
185
00:10:34,590 --> 00:10:37,230
categories that you're
interested in separating.
186
00:10:37,230 --> 00:10:40,870
But in higher dimensions,
that's really hard.
187
00:10:40,870 --> 00:10:44,250
So in high dimensions,
for example,
188
00:10:44,250 --> 00:10:47,720
we're trying to separate
images, for example.
189
00:10:47,720 --> 00:10:49,980
So we can have a bunch
of images of dogs,
190
00:10:49,980 --> 00:10:51,870
a bunch of images of cats.
191
00:10:51,870 --> 00:10:54,030
Each pixel in that
image corresponds
192
00:10:54,030 --> 00:10:56,910
to a different input to
our classification unit.
193
00:10:56,910 --> 00:11:00,960
And now how do you decide
what all of those weights
194
00:11:00,960 --> 00:11:03,180
should be from all of
those different pixels
195
00:11:03,180 --> 00:11:08,760
onto our output neuron that
separates images of one class
196
00:11:08,760 --> 00:11:10,720
from images of another class?
197
00:11:10,720 --> 00:11:14,640
So there's just no way to do
that by eye in high dimensions.
198
00:11:14,640 --> 00:11:17,460
So you need an
algorithm that helps
199
00:11:17,460 --> 00:11:20,130
you choose that set of
weights that allows you
200
00:11:20,130 --> 00:11:22,840
to separate different classes--
201
00:11:22,840 --> 00:11:25,740
you know, a bunch of images
of one class from a bunch
202
00:11:25,740 --> 00:11:28,500
of images of another class.
203
00:11:28,500 --> 00:11:33,540
And so we're going to
introduce a method called
204
00:11:33,540 --> 00:11:40,710
the perceptron learning rule
that is a category of learning
205
00:11:40,710 --> 00:11:47,910
rules called supervised learning
rules that allow you to take
206
00:11:47,910 --> 00:11:51,660
a bunch of objects that
you know-- so if you
207
00:11:51,660 --> 00:11:53,160
have a bunch of
pictures of dogs,
208
00:11:53,160 --> 00:11:54,385
you know that they're dogs.
209
00:11:54,385 --> 00:11:57,010
If you have a bunch of pictures
of cats, you know they're cats.
210
00:11:57,010 --> 00:11:58,920
So you label those images.
211
00:11:58,920 --> 00:12:03,780
You feed those inputs, those
images, into your network,
212
00:12:03,780 --> 00:12:06,870
and you tell the network
what the answer was.
213
00:12:06,870 --> 00:12:09,420
And through an
iterative process,
214
00:12:09,420 --> 00:12:13,410
it finds all of the weights that
optimally separate those two
215
00:12:13,410 --> 00:12:14,740
different categories.
216
00:12:14,740 --> 00:12:16,800
So that's called the
perceptron learning rule.
217
00:12:16,800 --> 00:12:19,240
So let me just set up
how that actually works.
218
00:12:19,240 --> 00:12:22,690
So you have a bunch of
observations of the input.
219
00:12:22,690 --> 00:12:25,960
So in this case, I'm drawing
these in two dimensions,
220
00:12:25,960 --> 00:12:28,560
but you should think about each
one of these dots as being,
221
00:12:28,560 --> 00:12:32,520
let's say, an image of a
dog in very high dimensions,
222
00:12:32,520 --> 00:12:37,920
where instead of just u1 and
u2, you have u1 through u1000,
223
00:12:37,920 --> 00:12:41,280
where each one of those is
the value of a different pixel
224
00:12:41,280 --> 00:12:44,190
in your image.
225
00:12:44,190 --> 00:12:46,170
So you have a bunch of images.
226
00:12:46,170 --> 00:12:50,220
Each one of those corresponds
to an image of a dog.
227
00:12:50,220 --> 00:12:53,610
Each one of those corresponds
to an image of a cat.
228
00:12:53,610 --> 00:12:56,280
And we have a whole bunch
of different observations
229
00:12:56,280 --> 00:12:59,610
or images of those
different categories.
230
00:12:59,610 --> 00:13:00,720
Any questions about that?
231
00:13:03,800 --> 00:13:06,840
All right, so we have n
of those observations.
232
00:13:06,840 --> 00:13:08,880
And for each one of
those observations,
233
00:13:08,880 --> 00:13:12,735
we say that the
input is equal to one
234
00:13:12,735 --> 00:13:15,930
of those observations for one
iteration of this learning
235
00:13:15,930 --> 00:13:17,410
process, OK?
236
00:13:17,410 --> 00:13:19,860
And so with each
observation, we're
237
00:13:19,860 --> 00:13:21,810
told whether this
input corresponds
238
00:13:21,810 --> 00:13:25,740
to one category or another,
so a dog or a non-dog.
239
00:13:25,740 --> 00:13:27,960
And our output, we're asking--
240
00:13:27,960 --> 00:13:30,240
we want to choose
this set of weights
241
00:13:30,240 --> 00:13:32,640
such that the output
of our network
242
00:13:32,640 --> 00:13:37,680
is equal to some known value.
243
00:13:37,680 --> 00:13:43,410
So t sub i, where if it's a dog,
then the answer is one for yes.
244
00:13:43,410 --> 00:13:48,450
If it's a non-dog, the answer
is zero for no, that's not a dog.
245
00:13:48,450 --> 00:13:52,050
And we have n of those answers.
246
00:13:52,050 --> 00:13:56,760
We have n images and labels
that tell us what category
247
00:13:56,760 --> 00:13:59,400
that image belongs to.
248
00:13:59,400 --> 00:14:01,380
So for all of
these, t equals one.
249
00:14:01,380 --> 00:14:03,300
For all of these, t equals zero.
250
00:14:03,300 --> 00:14:05,400
And we want to find
a set of weights
251
00:14:05,400 --> 00:14:10,020
such that when we take the dot
product of that weight vector
252
00:14:10,020 --> 00:14:17,970
into each one of those
observations minus theta
253
00:14:17,970 --> 00:14:23,340
that we get an answer
that is equal to t
254
00:14:23,340 --> 00:14:25,830
for each observation.
255
00:14:25,830 --> 00:14:28,360
Does that make sense?
256
00:14:28,360 --> 00:14:31,240
So how do we do that?
257
00:14:31,240 --> 00:14:37,240
All right, so each observation,
we have two things--
258
00:14:37,240 --> 00:14:41,450
the input and the
desired output.
259
00:14:41,450 --> 00:14:43,150
And that gives us
information that we
260
00:14:43,150 --> 00:14:45,920
can use to construct
this weight vector.
261
00:14:45,920 --> 00:14:48,110
So, again, that's called
supervised learning.
262
00:14:48,110 --> 00:14:52,300
And we're going to use an
update rule, or a learning rule,
263
00:14:52,300 --> 00:14:54,490
that allows us to
change the weight
264
00:14:54,490 --> 00:14:58,180
vector as a result
of each estimate,
265
00:14:58,180 --> 00:15:01,030
depending on whether we got
the answer right or not.
266
00:15:01,030 --> 00:15:02,370
So how do we do this?
267
00:15:02,370 --> 00:15:03,912
What we're going to
do is we're going
268
00:15:03,912 --> 00:15:08,110
to start with a random set
of weights, w1 and w2, OK?
269
00:15:08,110 --> 00:15:11,580
And we're going to
put in an input.
270
00:15:11,580 --> 00:15:13,255
So there's a space of inputs.
271
00:15:13,255 --> 00:15:15,130
We're going to start
with some random weight,
272
00:15:15,130 --> 00:15:18,230
and I started with some random
vector in this direction.
273
00:15:18,230 --> 00:15:21,920
You can see that that gives you
a classification boundary here.
274
00:15:21,920 --> 00:15:24,340
And you can see that that
classification boundary is not
275
00:15:24,340 --> 00:15:27,290
very good for separating the
green dots from the red dots.
276
00:15:27,290 --> 00:15:27,790
Why?
277
00:15:27,790 --> 00:15:31,060
Because it will assign
a one to everything
278
00:15:31,060 --> 00:15:33,580
on this side of that
decision boundary and a zero
279
00:15:33,580 --> 00:15:35,103
to everything on that side.
280
00:15:35,103 --> 00:15:36,520
But you can see
that that does not
281
00:15:36,520 --> 00:15:39,250
correspond to the
assignment of green and red
282
00:15:39,250 --> 00:15:41,200
to each of those dots, OK?
283
00:15:41,200 --> 00:15:47,523
So how do we update that w in
order to get the right answer?
284
00:15:47,523 --> 00:15:48,940
So what we're going
to do is we're
285
00:15:48,940 --> 00:15:53,710
going to put in one of these
inputs on each iteration
286
00:15:53,710 --> 00:15:57,520
and ask whether the network
got the answer right or not.
287
00:15:57,520 --> 00:16:02,610
So we're going to put
in one of those inputs.
288
00:16:02,610 --> 00:16:05,140
So let's pick that
input right there.
289
00:16:05,140 --> 00:16:07,190
We're going to put
that into our network.
290
00:16:07,190 --> 00:16:09,730
And we see that the answer
we get from the network
291
00:16:09,730 --> 00:16:14,770
is one, because it's on the
positive side of the decision
292
00:16:14,770 --> 00:16:15,560
boundary.
293
00:16:15,560 --> 00:16:19,060
And so one was the right
answer in this case.
294
00:16:19,060 --> 00:16:19,840
So what do we do?
295
00:16:19,840 --> 00:16:20,890
We don't do anything.
296
00:16:20,890 --> 00:16:25,270
We say the change in weight is
going to be zero if we already
297
00:16:25,270 --> 00:16:26,940
get the right answer.
298
00:16:26,940 --> 00:16:29,560
So if we got lucky and
our initial weight vector
299
00:16:29,560 --> 00:16:32,260
was in the right direction,
so our perceptron
300
00:16:32,260 --> 00:16:34,398
already classified
the answer, then
301
00:16:34,398 --> 00:16:36,190
the weight vector is
never going to change,
302
00:16:36,190 --> 00:16:39,400
because it was already
the right answer.
303
00:16:39,400 --> 00:16:41,690
OK, so let's put it
in another input--
304
00:16:41,690 --> 00:16:42,580
a red input.
305
00:16:42,580 --> 00:16:45,970
You can see that the
correct answer is a zero.
306
00:16:45,970 --> 00:16:47,950
The network gave us
a zero, because it's
307
00:16:47,950 --> 00:16:53,380
on the negative side of the
weight vector of the decision
308
00:16:53,380 --> 00:16:54,380
boundary.
309
00:16:54,380 --> 00:16:56,530
And so, again, delta w is zero.
310
00:16:56,530 --> 00:16:58,780
But let's put in
another input now such
311
00:16:58,780 --> 00:17:01,420
that we get the wrong answer.
312
00:17:01,420 --> 00:17:03,580
So let's put in this
input right here.
313
00:17:03,580 --> 00:17:06,339
So you can see that the answer
here, the correct answer
314
00:17:06,339 --> 00:17:12,339
is one, but the network is
going to give us a zero.
315
00:17:12,339 --> 00:17:16,470
So what do we do to
update that weight vector?
316
00:17:16,470 --> 00:17:19,329
So if the output is not
equal to the correct answer,
317
00:17:19,329 --> 00:17:20,150
then we're wrong.
318
00:17:20,150 --> 00:17:22,000
So now we update w.
319
00:17:22,000 --> 00:17:26,140
And the perceptron learning
rule is very simple.
320
00:17:26,140 --> 00:17:30,770
We introduce a change in
w that looks like this.
321
00:17:30,770 --> 00:17:35,620
It's a little change, where
eta is the learning rate.
322
00:17:35,620 --> 00:17:39,250
It's generally going
to be smaller than one.
323
00:17:39,250 --> 00:17:43,510
So we're going to put in
a small change in w that's
324
00:17:43,510 --> 00:17:47,440
in the direction of the
input that was wrong
325
00:17:47,440 --> 00:17:51,580
if the correct answer is a one.
326
00:17:51,580 --> 00:17:53,800
We're going to
make a small change
327
00:17:53,800 --> 00:17:57,910
to w in the opposite
direction of that input
328
00:17:57,910 --> 00:18:00,940
if the correct answer was zero.
329
00:18:00,940 --> 00:18:02,120
Does that make sense?
330
00:18:02,120 --> 00:18:06,430
So we're going to
change w in a way that
331
00:18:06,430 --> 00:18:11,930
depends on what the
input was and what
332
00:18:11,930 --> 00:18:13,550
the correct answer was.
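[As a hedged sketch, the update rule just described can be written out in a few lines: on a wrong answer, nudge w toward the input when the label is one, and away from it when the label is zero. The toy two-cluster data, the threshold, and the learning rate below are all illustrative assumptions, not the lecture's example.]

```python
import numpy as np

# Toy labeled data: "green" points (target t = 1) and "red" points
# (target t = 0) in two dimensions; clusters are assumptions.
rng = np.random.default_rng(0)
greens = rng.normal([2.0, 2.0], 0.2, size=(20, 2))
reds = rng.normal([0.4, 0.4], 0.2, size=(20, 2))
u = np.vstack([greens, reds])
t = np.concatenate([np.ones(20), np.zeros(20)])

w = rng.normal(size=2)   # start from a random weight vector
theta = 1.0              # fixed threshold
eta = 0.1                # learning rate, smaller than one

for _ in range(100):     # iterate over the labeled observations
    for ui, ti in zip(u, t):
        # Binary threshold output of the perceptron.
        v = 1.0 if np.dot(w, ui) - theta > 0 else 0.0
        if v != ti:
            # Wrong answer: move w toward the input if t = 1,
            # in the opposite direction if t = 0.
            w += eta * ui if ti == 1 else -eta * ui

# After training, count correctly classified observations.
correct = sum((1.0 if np.dot(w, ui) - theta > 0 else 0.0) == ti
              for ui, ti in zip(u, t))
print(correct, "of", len(u))
```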
333
00:18:16,970 --> 00:18:18,200
So let's walk through this.
334
00:18:18,200 --> 00:18:21,200
So we put it in an input here.
335
00:18:21,200 --> 00:18:25,130
The correct answer is a one,
and we got the answer wrong.
336
00:18:25,130 --> 00:18:28,400
The network gave us a zero, but
the correct answer is a one.
337
00:18:28,400 --> 00:18:31,880
So we're in this region here.
338
00:18:31,880 --> 00:18:35,090
The answer was incorrect,
so we're going to update w.
339
00:18:35,090 --> 00:18:38,300
The correct answer was a one,
so we're going to change delta--
340
00:18:38,300 --> 00:18:42,760
we're going to change w in
the direction of that input.
341
00:18:42,760 --> 00:18:43,760
So that input is there.
342
00:18:43,760 --> 00:18:50,530
So we're going to add a little
bit to w in this direction.
343
00:18:50,530 --> 00:18:53,970
So if we add that little
bit of vector to the w,
344
00:18:53,970 --> 00:18:58,280
it's going to move the w vector
in this direction, right?
345
00:18:58,280 --> 00:18:59,590
So let's do that.
346
00:18:59,590 --> 00:19:02,160
So there's our new w.
347
00:19:02,160 --> 00:19:05,310
Our new w is the
old plus delta w,
348
00:19:05,310 --> 00:19:10,200
which is in the direction
of this incorrectly
349
00:19:10,200 --> 00:19:11,880
classified input.
350
00:19:11,880 --> 00:19:16,470
So there's our new decision
boundary, all right?
351
00:19:16,470 --> 00:19:18,340
And let's put in another input--
352
00:19:18,340 --> 00:19:20,490
let's say this one right here.
353
00:19:20,490 --> 00:19:23,610
You can see that this input is
also incorrectly classified,
354
00:19:23,610 --> 00:19:25,530
because the correct
answer is a zero.
355
00:19:25,530 --> 00:19:28,170
It's a red dot.
356
00:19:28,170 --> 00:19:30,800
But the input is
on the positive side
357
00:19:30,800 --> 00:19:32,310
of the decision boundary.
358
00:19:32,310 --> 00:19:34,980
So the network
classifies it as a one.
359
00:19:34,980 --> 00:19:35,480
OK, good.
360
00:19:35,480 --> 00:19:39,050
So the network classified it
as a one and the correct answer
361
00:19:39,050 --> 00:19:40,580
was a zero, so we were wrong.
362
00:19:40,580 --> 00:19:42,650
So we're going to
update w, and we're
363
00:19:42,650 --> 00:19:47,060
going to update it in the
opposite direction of the input
364
00:19:47,060 --> 00:19:49,880
if the correct answer was
zero, which is the case.
365
00:19:49,880 --> 00:19:53,360
So we're going to update w.
366
00:19:53,360 --> 00:19:56,000
And that's the input xi.
367
00:19:56,000 --> 00:19:59,310
Minus xi is in this direction.
368
00:19:59,310 --> 00:20:02,540
So we're going to update
w in that direction.
369
00:20:02,540 --> 00:20:06,530
So we're going to add those
two vectors to get our new w.
370
00:20:06,530 --> 00:20:09,430
And when we do that,
that's what we get.
371
00:20:09,430 --> 00:20:10,730
There's our new w.
372
00:20:10,730 --> 00:20:12,360
There's our new
decision boundary.
373
00:20:12,360 --> 00:20:15,200
And you can see that that
decision boundary is now
374
00:20:15,200 --> 00:20:22,160
perfectly oriented to separate
the red and the green dots.
375
00:20:22,160 --> 00:20:26,060
So that's Rosenblatt's
perceptron learning rule.
376
00:20:26,060 --> 00:20:27,156
Yes, Rebecca?
377
00:20:27,156 --> 00:20:29,100
AUDIENCE: How do you
change the learning rate?
378
00:20:29,100 --> 00:20:30,308
Because what if it's too big?
379
00:20:30,308 --> 00:20:33,067
You'll sort of get not
helpful [INAUDIBLE].
380
00:20:33,067 --> 00:20:34,400
MICHALE FEE: Yeah, that's right.
381
00:20:34,400 --> 00:20:36,080
So if the learning
rate were too big,
382
00:20:36,080 --> 00:20:38,460
you could see this
first correction.
383
00:20:38,460 --> 00:20:41,930
So let's say that we corrected
w but made a correction that
384
00:20:41,930 --> 00:20:44,160
was too far in this direction.
385
00:20:44,160 --> 00:20:48,350
So now the new w
would point up here.
386
00:20:48,350 --> 00:20:50,640
And that would give us,
again, the wrong answer.
387
00:20:50,640 --> 00:20:53,180
What happens, generally,
is that if your learning
388
00:20:53,180 --> 00:20:59,810
rate is too high, then your
weight vector bounces around.
389
00:20:59,810 --> 00:21:01,790
It oscillates around.
390
00:21:01,790 --> 00:21:04,130
So it'll jump too far
this way, and then
391
00:21:04,130 --> 00:21:06,530
it'll get an error
over here, and it'll
392
00:21:06,530 --> 00:21:07,670
jump too far that way.
393
00:21:07,670 --> 00:21:09,337
And then you'll get
an error over there,
394
00:21:09,337 --> 00:21:11,330
and it'll just keep
bouncing back and forth.
395
00:21:11,330 --> 00:21:13,460
So you generally
choose learning rates
396
00:21:13,460 --> 00:21:16,190
that-- the process of
choosing learning rates
397
00:21:16,190 --> 00:21:18,500
can be a little
tricky. Basically,
398
00:21:18,500 --> 00:21:21,920
the answer is start small and
increase it until it breaks.
399
00:21:26,780 --> 00:21:28,210
OK, any questions about that?
400
00:21:31,500 --> 00:21:36,430
So you can see it's a
very simple algorithm that
401
00:21:36,430 --> 00:21:40,750
provides a way of changing w
that is guaranteed to converge
402
00:21:40,750 --> 00:21:45,400
toward the best answer
in separating these two
403
00:21:45,400 --> 00:21:46,360
classes of inputs.
404
00:21:52,270 --> 00:21:55,780
All right, so let's go
a little bit further
405
00:21:55,780 --> 00:21:59,770
into single layer
binary networks
406
00:21:59,770 --> 00:22:02,350
and see what they can do.
407
00:22:02,350 --> 00:22:06,100
So these kinds of networks
are very good for actually
408
00:22:06,100 --> 00:22:08,090
implementing logic operations.
409
00:22:08,090 --> 00:22:10,990
So you can see that-- let's say
that we have a perceptron that
410
00:22:10,990 --> 00:22:12,110
looks like this.
411
00:22:12,110 --> 00:22:17,210
Let's give it a threshold
of 0.5 and give it
412
00:22:17,210 --> 00:22:20,870
a weight vector that's 1 and 1.
413
00:22:20,870 --> 00:22:24,710
So you can see that
this perceptron
414
00:22:24,710 --> 00:22:26,740
gives an answer of zero.
415
00:22:26,740 --> 00:22:29,000
The output neuron
has zero firing rate
416
00:22:29,000 --> 00:22:32,320
for an input that's zero.
417
00:22:32,320 --> 00:22:38,010
But any input that's on the
other side of the decision
418
00:22:38,010 --> 00:22:41,640
boundary produces an
output firing rate of one.
419
00:22:41,640 --> 00:22:50,250
What that means is that if
the input a, or u1, is a 1,
420
00:22:50,250 --> 00:22:54,330
0, then the output
neuron will fire.
421
00:22:54,330 --> 00:22:57,720
If the input is 0, 1, the
output neuron will fire.
422
00:22:57,720 --> 00:23:01,200
And if the input is 1, 1,
the output neuron will fire.
423
00:23:01,200 --> 00:23:07,610
So, basically, any input
above some threshold
424
00:23:07,610 --> 00:23:09,320
will make the
output neuron fire.
425
00:23:09,320 --> 00:23:13,600
So this perceptron
implements an OR gate.
426
00:23:13,600 --> 00:23:18,080
If it's input a or input
b, the output neuron
427
00:23:18,080 --> 00:23:22,330
spikes, as long as those inputs
are above some threshold value.
428
00:23:22,330 --> 00:23:25,280
So that's very much
like a logical OR gate.
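The OR-gate perceptron just described, with weight vector (1, 1) and threshold 0.5, can be checked directly on all four binary inputs. A minimal sketch, assuming a simple binary threshold unit:

```python
import numpy as np

def perceptron(u, w, theta):
    """Binary threshold unit: fires (1) when w . u exceeds theta."""
    return 1 if np.dot(w, u) > theta else 0

w = np.array([1.0, 1.0])  # weight vector from the lecture
theta = 0.5               # threshold below the single-input level

# All four binary inputs (a, b):
for u in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(u, perceptron(np.array(u), w, theta))
# (0, 0) gives 0 and every other input fires: a logical OR
```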
429
00:23:28,130 --> 00:23:30,200
Now let's see if we can
implement an AND gate.
430
00:23:30,200 --> 00:23:33,340
So it turns out that
implementing an AND gate
431
00:23:33,340 --> 00:23:35,380
is almost exactly
like an OR gate.
432
00:23:35,380 --> 00:23:40,420
We just need-- what would
we change about this network
433
00:23:40,420 --> 00:23:42,182
to implement an AND gate?
434
00:23:42,182 --> 00:23:43,600
AUDIENCE: A larger [INAUDIBLE].
435
00:23:43,600 --> 00:23:44,642
MICHALE FEE: What's that?
436
00:23:44,642 --> 00:23:45,760
AUDIENCE: A larger theta?
437
00:23:45,760 --> 00:23:47,290
MICHALE FEE: Yeah,
a larger theta.
438
00:23:47,290 --> 00:23:52,670
So all we have to do is
move this line up to here.
439
00:23:52,670 --> 00:23:55,250
And now one of
those inputs is not
440
00:23:55,250 --> 00:23:57,830
enough to make the
output neuron fire.
441
00:23:57,830 --> 00:24:00,620
The other input is not enough
to make the output neuron fire.
442
00:24:00,620 --> 00:24:02,510
Only when you have both.
443
00:24:02,510 --> 00:24:04,520
So that implements an AND gate.
444
00:24:04,520 --> 00:24:09,075
We just increase the
threshold a little bit.
445
00:24:09,075 --> 00:24:09,950
Does that make sense?
446
00:24:09,950 --> 00:24:12,890
So we just increase the
threshold here to 1.5.
447
00:24:12,890 --> 00:24:17,870
And now when either input
is on at a value of one,
448
00:24:17,870 --> 00:24:20,840
that's not enough to make
the output neuron fire.
449
00:24:20,840 --> 00:24:22,670
If this input's on,
it's not enough.
450
00:24:22,670 --> 00:24:25,790
If that output is
on, it's not enough.
451
00:24:25,790 --> 00:24:29,270
Only when both inputs are
on do you get enough input
452
00:24:29,270 --> 00:24:33,010
to this output neuron to make
it have a non-zero firing rate,
453
00:24:33,010 --> 00:24:37,190
to get it above threshold.
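And the AND gate follows from the same unit by raising the threshold to 1.5, exactly as described; a minimal sketch with the same assumed binary threshold unit:

```python
import numpy as np

def perceptron(u, w, theta):
    """Binary threshold unit: fires (1) when w . u exceeds theta."""
    return 1 if np.dot(w, u) > theta else 0

w = np.array([1.0, 1.0])
theta = 1.5  # raised threshold: one input alone is no longer enough
outputs = [perceptron(np.array(u), w, theta)
           for u in [(0, 0), (1, 0), (0, 1), (1, 1)]]
print(outputs)  # [0, 0, 0, 1] -- a logical AND
```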
454
00:24:37,190 --> 00:24:42,080
Now, there's another very common
logic operation that cannot be
455
00:24:42,080 --> 00:24:47,010
solved by a simple perceptron.
456
00:24:47,010 --> 00:24:51,680
That's called an
exclusive OR, where
457
00:24:51,680 --> 00:24:55,100
this neuron, this
network, we want
458
00:24:55,100 --> 00:25:05,890
it to fire only if input a is on
or input b is on, but not both.
459
00:25:05,890 --> 00:25:08,830
Why is it that that
can't be solved
460
00:25:08,830 --> 00:25:12,010
by the kind of perceptron
that we've been describing?
461
00:25:12,010 --> 00:25:14,830
Anybody have some
intuition about that?
462
00:25:20,022 --> 00:25:23,240
AUDIENCE: I mean, it's
obviously [INAUDIBLE] separable.
463
00:25:23,240 --> 00:25:24,680
MICHALE FEE: Yeah, that's right.
464
00:25:24,680 --> 00:25:27,320
The keyword there is separable.
465
00:25:27,320 --> 00:25:33,210
If you look at this set of
dots, there's no single line,
466
00:25:33,210 --> 00:25:38,060
there's no single boundary
that separates all the red dots
467
00:25:38,060 --> 00:25:40,940
from all the green dots, OK?
468
00:25:40,940 --> 00:25:44,380
And so that set of inputs
is called non-separable.
469
00:25:44,380 --> 00:25:52,700
And sets of inputs that are not
separable cannot be classified
470
00:25:52,700 --> 00:25:58,160
correctly by a simple perceptron
of the type we've been talking
471
00:25:58,160 --> 00:25:59,340
about.
472
00:25:59,340 --> 00:26:00,840
So how do you
solve that problem?
473
00:26:00,840 --> 00:26:06,132
So this is a set of inputs
that's non-separable.
474
00:26:06,132 --> 00:26:08,090
You can see that you can
solve this problem now
475
00:26:08,090 --> 00:26:11,310
if you have two
separate perceptrons.
476
00:26:11,310 --> 00:26:12,420
So watch this.
477
00:26:12,420 --> 00:26:15,410
We can build one
perceptron that fires,
478
00:26:15,410 --> 00:26:21,590
that has a positive output
when this input is on.
479
00:26:21,590 --> 00:26:24,170
We can have a separate
perceptron that is active
480
00:26:24,170 --> 00:26:29,300
when that input is on.
481
00:26:29,300 --> 00:26:32,270
And then what would we do?
482
00:26:32,270 --> 00:26:34,040
If we had one
neuron that's active
483
00:26:34,040 --> 00:26:35,990
if that input is
on, and another neuron
484
00:26:35,990 --> 00:26:37,760
that's active when
that input is on?
485
00:26:40,610 --> 00:26:43,260
We would OR them
together, that's right.
486
00:26:43,260 --> 00:26:47,010
So this is what's known as
a multi-layer perceptron.
487
00:26:47,010 --> 00:26:50,040
We have two inputs, one
that represents activity
488
00:26:50,040 --> 00:26:53,980
in a, another that
represents activity in b.
489
00:26:53,980 --> 00:26:57,840
And we have one neuron
in what's called
490
00:26:57,840 --> 00:27:00,840
the intermediate layer
of our perceptron
491
00:27:00,840 --> 00:27:04,930
that has a weight
vector of 1 minus 1.
492
00:27:04,930 --> 00:27:09,270
What that means is this neuron
will be active if input a is
493
00:27:09,270 --> 00:27:14,750
on but not input b.
494
00:27:14,750 --> 00:27:16,880
This one will be active.
495
00:27:16,880 --> 00:27:20,576
This neuron has a different
weight vector-- minus 1, 1.
496
00:27:20,576 --> 00:27:27,770
This neuron will be active if
input b is on but not input a.
497
00:27:30,512 --> 00:27:34,120
And the output neuron
implements an OR operation
498
00:27:34,120 --> 00:27:39,010
that will be active when this
intermediate neuron is on
499
00:27:39,010 --> 00:27:42,820
or that intermediate
neuron is on, OK?
500
00:27:42,820 --> 00:27:47,220
And so that network altogether
implements this exclusive OR
501
00:27:47,220 --> 00:27:48,550
function.
502
00:27:48,550 --> 00:27:50,030
Does that make sense?
503
00:27:50,030 --> 00:27:51,120
Any questions about that?
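The whole exclusive-OR network just described — two intermediate units with weight vectors (1, -1) and (-1, 1), then an OR unit on their outputs — can be sketched as follows. The thresholds used here (0 for the intermediate units, 0.5 for the OR unit) are assumptions consistent with binary inputs, not values read off the slide:

```python
import numpy as np

def step(x):
    """Binary threshold: 1 where input exceeds 0, else 0."""
    return (x > 0).astype(int)

def xor_net(u):
    # Intermediate layer: row (1, -1) fires for a-but-not-b,
    # row (-1, 1) fires for b-but-not-a
    W_hidden = np.array([[1, -1],
                         [-1, 1]])
    h = step(W_hidden @ u)
    # Output layer: OR the two intermediate units (threshold 0.5)
    return int(np.dot(np.array([1, 1]), h) > 0.5)

for u in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(u, xor_net(np.array(u)))
# fires for (1, 0) and (0, 1) only: the exclusive OR
```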
504
00:27:56,690 --> 00:27:59,030
So this problem
of separability is
505
00:27:59,030 --> 00:28:05,820
extremely important in
classifying inputs in general.
506
00:28:05,820 --> 00:28:11,420
So if you think about
classifying an image,
507
00:28:11,420 --> 00:28:14,840
like a number or
a letter, you can
508
00:28:14,840 --> 00:28:21,430
see that in high-dimensional
space, images
509
00:28:21,430 --> 00:28:28,590
that are all threes,
let's say, are all
510
00:28:28,590 --> 00:28:30,030
very similar to each other.
511
00:28:30,030 --> 00:28:34,000
But they're actually not
separable in this linear space.
512
00:28:34,000 --> 00:28:36,900
And that's because in the
high dimensional space
513
00:28:36,900 --> 00:28:40,920
they exist on what's
called a manifold
514
00:28:40,920 --> 00:28:43,930
in this high-dimensional
space, OK?
515
00:28:43,930 --> 00:28:48,180
They're like all lined
up on some sheet, OK?
516
00:28:48,180 --> 00:28:51,540
So this is an
example of rotations,
517
00:28:51,540 --> 00:28:54,930
and you can see that all these
different threes kind of sit
518
00:28:54,930 --> 00:28:59,160
along a manifold in this
high-dimensional space that
519
00:28:59,160 --> 00:29:01,605
are separate from all
the other numbers.
520
00:29:06,280 --> 00:29:08,310
So all those numbers
exist on what's
521
00:29:08,310 --> 00:29:13,110
called an invariant
transformation, OK?
522
00:29:13,110 --> 00:29:16,600
Now, how would we
separate those images
523
00:29:16,600 --> 00:29:22,060
of threes from all the
other numbers or letters?
524
00:29:22,060 --> 00:29:23,570
How would we do that?
525
00:29:23,570 --> 00:29:30,035
Well, we could imagine building
a multi-layer perceptron that--
526
00:29:30,035 --> 00:29:31,410
so here, I'm
showing that there's
527
00:29:31,410 --> 00:29:35,040
no single line that separates
the threes on this manifold
528
00:29:35,040 --> 00:29:38,130
from all the other
digits over here.
529
00:29:38,130 --> 00:29:40,650
We can solve that
problem by implementing
530
00:29:40,650 --> 00:29:45,090
a multi-layer perceptron in which
one of those perceptrons
531
00:29:45,090 --> 00:29:49,140
detects these objects,
another perceptron detects
532
00:29:49,140 --> 00:29:53,400
those objects, and then we
can OR those all together.
533
00:29:53,400 --> 00:29:58,380
So that's a kind of
network that can now
534
00:29:58,380 --> 00:30:03,990
detect all of these threes,
and separate them from non-threes.
535
00:30:03,990 --> 00:30:06,240
Does that make sense?
536
00:30:06,240 --> 00:30:10,520
So we can think of objects that
we recognize, like this three
537
00:30:10,520 --> 00:30:12,980
that we recognize, even
though it has different--
538
00:30:12,980 --> 00:30:15,110
we can recognize it
with different rotations
539
00:30:15,110 --> 00:30:20,730
or transformations
or scale changes.
540
00:30:20,730 --> 00:30:23,750
You can also think of the
problem of separating images
541
00:30:23,750 --> 00:30:28,250
from dogs and cats as
also solving this problem,
542
00:30:28,250 --> 00:30:32,450
that the space of
dogs, of dog images,
543
00:30:32,450 --> 00:30:36,680
somehow lives on a manifold
in the high dimensional space
544
00:30:36,680 --> 00:30:39,260
of inputs that we
can distinguish
545
00:30:39,260 --> 00:30:43,070
from the set of
images of cats that's
546
00:30:43,070 --> 00:30:48,570
some other manifold in this
high-dimensional space.
547
00:30:48,570 --> 00:30:53,790
So it turns out that you need
more than just a single layer
548
00:30:53,790 --> 00:30:54,450
perceptron.
549
00:30:54,450 --> 00:30:57,900
You need more than just
a two-layer perceptron.
550
00:30:57,900 --> 00:30:59,820
In general, the
kinds of networks
551
00:30:59,820 --> 00:31:02,790
that are good for separating
different kinds of images,
552
00:31:02,790 --> 00:31:06,240
like dogs and cats and
cars and houses and faces,
553
00:31:06,240 --> 00:31:07,890
look more like this.
554
00:31:07,890 --> 00:31:11,250
So this is work from
Jim DiCarlo's lab,
555
00:31:11,250 --> 00:31:16,770
where they found evidence that
networks in the brain that do
556
00:31:16,770 --> 00:31:18,720
image classification--
for example,
557
00:31:18,720 --> 00:31:21,520
in the visual pathway--
558
00:31:21,520 --> 00:31:25,690
look a lot like very deep
neural networks, where
559
00:31:25,690 --> 00:31:31,420
you have the retina on the
left side here sending inputs
560
00:31:31,420 --> 00:31:33,395
to another layer
in the thalamus,
561
00:31:33,395 --> 00:31:40,300
sending inputs to v1, to v2,
to v4, and so on, up to IT.
562
00:31:40,300 --> 00:31:43,480
And that we can think
of this as being,
563
00:31:43,480 --> 00:31:48,100
essentially, many stacked
layers of perceptrons
564
00:31:48,100 --> 00:31:52,150
that sort of unravel
these manifolds
565
00:31:52,150 --> 00:31:54,550
in this high-dimensional
space to allow
566
00:31:54,550 --> 00:31:59,380
neurons here at the very
end to separate dogs
567
00:31:59,380 --> 00:32:02,065
from cats from
buildings from faces.
568
00:32:04,720 --> 00:32:06,640
And there are
learning rules that
569
00:32:06,640 --> 00:32:09,310
can be used to train
networks like this
570
00:32:09,310 --> 00:32:14,440
by putting in a bunch of
different images of people
571
00:32:14,440 --> 00:32:16,150
and other different
categories that you
572
00:32:16,150 --> 00:32:17,650
might want to separate.
573
00:32:17,650 --> 00:32:19,720
And then each one
of those images
574
00:32:19,720 --> 00:32:23,230
has a label, just like our
perceptron learning rule.
575
00:32:23,230 --> 00:32:27,010
And we can use the image
and the correct label--
576
00:32:27,010 --> 00:32:32,640
face or dog-- and
train that network
577
00:32:32,640 --> 00:32:38,560
by projecting that information
into these intermediate layers
578
00:32:38,560 --> 00:32:41,380
to train that network
to properly classify
579
00:32:41,380 --> 00:32:43,390
those different stimuli, OK?
580
00:32:43,390 --> 00:32:47,770
This is, basically,
the kind of technology
581
00:32:47,770 --> 00:32:51,830
that's currently
being used to train--
582
00:32:51,830 --> 00:32:53,470
this is being used in AI.
583
00:32:53,470 --> 00:32:57,880
It's being used to
train driverless cars.
584
00:32:57,880 --> 00:33:02,350
All kinds of
technological advances
585
00:33:02,350 --> 00:33:06,018
are based on this kind
of technology here.
586
00:33:06,018 --> 00:33:07,060
Any questions about that?
587
00:33:07,060 --> 00:33:08,054
Aditi?
588
00:33:08,054 --> 00:33:10,540
AUDIENCE: So in
actual neurons, I
589
00:33:10,540 --> 00:33:12,550
assume it's not linear, right?
590
00:33:12,550 --> 00:33:14,230
MICHALE FEE: Yes.
591
00:33:14,230 --> 00:33:17,560
These are all nonlinear neurons.
592
00:33:17,560 --> 00:33:19,960
They're more like these
binary threshold units
593
00:33:19,960 --> 00:33:21,628
than they are like
linear neurons.
594
00:33:21,628 --> 00:33:22,170
That's right.
595
00:33:22,170 --> 00:33:25,795
AUDIENCE: But then do
you there's, like--
596
00:33:25,795 --> 00:33:28,372
because right now, I
imagine that models we make
597
00:33:28,372 --> 00:33:30,482
have to have way more
perceptron units.
598
00:33:30,482 --> 00:33:31,190
MICHALE FEE: Yes.
599
00:33:31,190 --> 00:33:34,475
AUDIENCE: We use our
simplified [INAUDIBLE]..
600
00:33:34,475 --> 00:33:35,850
But then our brain
is sometimes--
601
00:33:35,850 --> 00:33:38,610
I mean, it's at, like,
a much faster level,
602
00:33:38,610 --> 00:33:41,090
like way faster, right?
603
00:33:41,090 --> 00:33:46,000
So you think it'd be like--
if we examine what functions
604
00:33:46,000 --> 00:33:50,320
neurons might be using, in a
way that would let us reduce
605
00:33:50,320 --> 00:33:51,760
the number of units needed?
606
00:33:51,760 --> 00:33:53,584
Because right now, for
example, [INAUDIBLE]
607
00:33:53,584 --> 00:33:55,380
be a bunch of lines.
608
00:33:55,380 --> 00:33:58,690
But maybe in the brain, there's
some other function it's using,
609
00:33:58,690 --> 00:34:00,340
which is smoother.
610
00:34:00,340 --> 00:34:02,580
MICHALE FEE: Yeah.
611
00:34:02,580 --> 00:34:04,330
OK, so let me just
make sure I understand.
612
00:34:04,330 --> 00:34:07,540
You're not talking about the
F-I curve of the neurons?
613
00:34:07,540 --> 00:34:09,540
Is that correct?
614
00:34:09,540 --> 00:34:12,100
You're talking about the
way that you figure out
615
00:34:12,100 --> 00:34:13,514
these weights.
616
00:34:13,514 --> 00:34:14,889
Is that what you're
asking about?
617
00:34:14,889 --> 00:34:15,880
AUDIENCE: No.
618
00:34:15,880 --> 00:34:20,034
I'm asking if we use a
more accurate F-I curve,
619
00:34:20,034 --> 00:34:21,657
we'll need less units.
620
00:34:21,657 --> 00:34:23,449
MICHALE FEE: OK, so
that's a good question.
621
00:34:23,449 --> 00:34:26,230
I don't actually know the
answer to the question
622
00:34:26,230 --> 00:34:29,350
of how the specific
choice of F-I curve
623
00:34:29,350 --> 00:34:31,659
affects the performance of this.
624
00:34:31,659 --> 00:34:35,380
The big problem that people
are trying to figure out
625
00:34:35,380 --> 00:34:39,489
in terms of how
these are trained
626
00:34:39,489 --> 00:34:42,250
is the challenge that in
order to train these networks,
627
00:34:42,250 --> 00:34:47,420
you actually need thousands
and thousands, maybe millions,
628
00:34:47,420 --> 00:34:54,139
of examples of different objects
here and the answer here.
629
00:34:54,139 --> 00:34:56,510
So you have to put
in many thousands
630
00:34:56,510 --> 00:35:00,620
of example images and
the answer in order
631
00:35:00,620 --> 00:35:02,540
to train these networks.
632
00:35:02,540 --> 00:35:06,080
And that's not the way
people actually learn.
633
00:35:06,080 --> 00:35:09,530
We don't walk around the
world when we're one-year-old
634
00:35:09,530 --> 00:35:12,550
and our mother saying,
dog, cat, person, house.
635
00:35:12,550 --> 00:35:16,130
You know, in
order to give a person as many
636
00:35:16,130 --> 00:35:19,070
labeled examples as you
need to give these networks,
637
00:35:19,070 --> 00:35:23,270
you would just be doing nothing,
but your parents would be
638
00:35:23,270 --> 00:35:27,770
pointing things out to you and
telling you one-word answers
639
00:35:27,770 --> 00:35:28,970
of what those are.
640
00:35:28,970 --> 00:35:32,300
Instead, what happens is
we just observe the world
641
00:35:32,300 --> 00:35:34,970
and figure out
kind of categories
642
00:35:34,970 --> 00:35:38,030
based on other sorts of learning
rules that are unsupervised.
643
00:35:38,030 --> 00:35:40,610
We figure out, oh, that's a kind
of thing, and then mom says,
644
00:35:40,610 --> 00:35:42,140
that's a dog.
645
00:35:42,140 --> 00:35:45,110
And then we know that
that category is a dog.
646
00:35:45,110 --> 00:35:47,510
And we sometimes
make mistakes, right?
647
00:35:47,510 --> 00:35:52,820
Like a kid might look
at a bear and say, dog.
648
00:35:52,820 --> 00:35:55,840
And then dad says, no,
no, that's not a dog, son.
649
00:35:59,930 --> 00:36:04,610
So the learning by which
people train their networks
650
00:36:04,610 --> 00:36:06,560
to do classification
of inputs is
651
00:36:06,560 --> 00:36:10,020
quite different from the way
these deep neural networks
652
00:36:10,020 --> 00:36:10,520
work.
653
00:36:10,520 --> 00:36:15,340
And that's a very important
and active area of research.
654
00:36:15,340 --> 00:36:15,840
Yes?
655
00:36:15,840 --> 00:36:19,330
AUDIENCE: Is the fact that
[INAUDIBLE] use unsupervised
656
00:36:19,330 --> 00:36:22,690
learning, as well,
to train a computer
657
00:36:22,690 --> 00:36:25,970
to recognize an image
of a turtle as a gun,
658
00:36:25,970 --> 00:36:28,040
but humans can't do
that [INAUDIBLE]..
659
00:36:28,040 --> 00:36:29,737
MICHALE FEE: Recognize
a turtle if what?
660
00:36:29,737 --> 00:36:32,112
AUDIENCE: Like I saw this
thing where it was like at MIT,
661
00:36:32,112 --> 00:36:33,910
they used an AI.
662
00:36:33,910 --> 00:36:35,810
They manipulated
pixels in images
663
00:36:35,810 --> 00:36:38,128
and convinced the computer
that it was something
664
00:36:38,128 --> 00:36:39,170
that it was not actually.
665
00:36:39,170 --> 00:36:40,160
MICHALE FEE: I see.
666
00:36:40,160 --> 00:36:40,430
Yeah.
667
00:36:40,430 --> 00:36:41,885
AUDIENCE: So like you would
see a picture of a turtle,
668
00:36:41,885 --> 00:36:43,510
but the computer
would get that picture
669
00:36:43,510 --> 00:36:45,200
and say it was,
like, a machine gun.
670
00:36:45,200 --> 00:36:47,660
MICHALE FEE: Just by
manipulating a few pixels
671
00:36:47,660 --> 00:36:49,397
and kind of screwing
with its mind.
672
00:36:49,397 --> 00:36:49,980
AUDIENCE: Yes.
673
00:36:49,980 --> 00:36:50,990
So it's [INAUDIBLE].
674
00:36:54,350 --> 00:36:55,160
MICHALE FEE: Yeah.
675
00:36:55,160 --> 00:36:57,722
Well, people can be tricked
by different things.
676
00:37:01,700 --> 00:37:05,490
The answer is, yes,
it's related to that.
677
00:37:05,490 --> 00:37:08,090
The problem is after
you do this training,
678
00:37:08,090 --> 00:37:09,890
we actually don't
really understand
679
00:37:09,890 --> 00:37:14,090
what's going on in the
guts of this network.
680
00:37:14,090 --> 00:37:16,640
It's very hard to look at
the inside of this network
681
00:37:16,640 --> 00:37:22,090
after it's trained and
understand what it's doing.
682
00:37:22,090 --> 00:37:25,180
And so we don't
know the answer why
683
00:37:25,180 --> 00:37:28,570
it is that you can fool
one of these networks
684
00:37:28,570 --> 00:37:30,550
by changing a few pixels.
685
00:37:30,550 --> 00:37:33,385
Something goes wrong in here,
and we don't know what it is.
686
00:37:33,385 --> 00:37:35,920
It may very well have to do
with the way it's trained,
687
00:37:35,920 --> 00:37:41,830
rather than building categories
in an unsupervised way, which
688
00:37:41,830 --> 00:37:43,940
could be much more
generalizable.
689
00:37:43,940 --> 00:37:46,048
So good question.
690
00:37:46,048 --> 00:37:47,340
I don't really know the answer.
691
00:37:50,330 --> 00:37:50,830
Yes?
692
00:37:50,830 --> 00:37:52,372
AUDIENCE: Sorry,
can you explain what
693
00:37:52,372 --> 00:37:56,280
you mean [INAUDIBLE] the
neural network needs an answer?
694
00:37:56,280 --> 00:38:00,310
They're not categorized and
then tell the user dogs?
695
00:38:00,310 --> 00:38:02,420
MICHALE FEE: Yeah,
so no, in order
696
00:38:02,420 --> 00:38:05,390
to train one of these networks,
you have to give it a data set,
697
00:38:05,390 --> 00:38:07,640
a labeled data set.
698
00:38:07,640 --> 00:38:11,270
So a set of images that
already has the answer
699
00:38:11,270 --> 00:38:15,252
that was labeled by a person.
700
00:38:15,252 --> 00:38:16,710
AUDIENCE: So you
can't just give it
701
00:38:16,710 --> 00:38:19,046
a set of photos of
puppies and snakes
702
00:38:19,046 --> 00:38:21,320
and it'll categorize
them into two groups?
703
00:38:21,320 --> 00:38:23,195
MICHALE FEE: No, nobody
knows how to do that.
704
00:38:25,890 --> 00:38:31,220
People are working on that,
but it's not known yet.
705
00:38:31,220 --> 00:38:32,010
Yes, Jasmine?
706
00:38:34,640 --> 00:38:41,080
AUDIENCE: [INAUDIBLE]
but I see [INAUDIBLE] I
707
00:38:41,080 --> 00:38:44,310
can't separate them and like
adding an additional feature
708
00:38:44,310 --> 00:38:47,874
to raise it to a higher
dimensional space, where
709
00:38:47,874 --> 00:38:50,203
it's separable?
710
00:38:50,203 --> 00:38:52,120
MICHALE FEE: Sorry, I
didn't quite understand.
711
00:38:52,120 --> 00:38:53,806
Can you say it again?
712
00:38:53,806 --> 00:38:56,221
AUDIENCE: I think I
remember reading somewhere
713
00:38:56,221 --> 00:39:02,182
about how when the scenes
are nonlinearly separable--
714
00:39:02,182 --> 00:39:02,890
MICHALE FEE: Yes.
715
00:39:02,890 --> 00:39:05,720
AUDIENCE: --you can add in
another feature to [INAUDIBLE]..
716
00:39:05,720 --> 00:39:06,720
MICHALE FEE: Yeah, yeah.
717
00:39:06,720 --> 00:39:09,090
So let me show you
an example of that.
718
00:39:09,090 --> 00:39:11,850
So coming back to
the exclusive OR.
719
00:39:11,850 --> 00:39:14,130
So one thing that
you can do, you
720
00:39:14,130 --> 00:39:18,570
can see that the reason this is
linearly inseparable-- it's not
721
00:39:18,570 --> 00:39:20,970
linearly separable-- is
because all these points are
722
00:39:20,970 --> 00:39:23,040
in a plane.
723
00:39:23,040 --> 00:39:26,620
So there's no line
that separates them.
724
00:39:26,620 --> 00:39:29,250
But one way, one sort
of trick you can do,
725
00:39:29,250 --> 00:39:30,980
is to add noise to this.
726
00:39:30,980 --> 00:39:33,930
So that now, some of
these points move.
727
00:39:33,930 --> 00:39:36,040
You can add another dimension.
728
00:39:36,040 --> 00:39:38,440
So now let's say
that we add noise,
729
00:39:38,440 --> 00:39:41,790
and we just, by chance, happen
to move the green dots this way
730
00:39:41,790 --> 00:39:44,610
and the red dots,
well, that way.
731
00:39:44,610 --> 00:39:47,400
And now there's a plane that
will separate the red dots
732
00:39:47,400 --> 00:39:49,260
from the green dots.
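That dimension-adding trick can be made concrete. In this sketch, the "noise" is replaced by a deterministic nudge that happens to push the green XOR points up and the red points down in a third coordinate — the lucky case just described; the nudge size and the separating plane z = 0 are illustrative assumptions:

```python
import numpy as np

# XOR points in the plane: green (output 1) vs. red (output 0)
green = np.array([[1.0, 0.0], [0.0, 1.0]])
red = np.array([[0.0, 0.0], [1.0, 1.0]])

# Add a third coordinate; suppose the jitter happens to push the
# green dots up and the red dots down in the new dimension
green3 = np.hstack([green, +0.1 * np.ones((2, 1))])
red3 = np.hstack([red, -0.1 * np.ones((2, 1))])

# Now the plane z = 0 (normal vector w = (0, 0, 1)) separates them
w = np.array([0.0, 0.0, 1.0])
print(green3 @ w > 0)  # [ True  True]
print(red3 @ w > 0)    # [False False]
```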
733
00:39:49,260 --> 00:39:55,170
So that's advanced
beyond the scope of what
734
00:39:55,170 --> 00:39:56,320
we're talking about here.
735
00:39:56,320 --> 00:39:57,870
But yes, there are
tricks that you
736
00:39:57,870 --> 00:40:02,070
can play to get around
this exclusive OR
737
00:40:02,070 --> 00:40:06,570
problem, this linear
separability problem, OK?
738
00:40:06,570 --> 00:40:08,940
All right, great question.
739
00:40:08,940 --> 00:40:12,660
All right, let's push on.
740
00:40:12,660 --> 00:40:18,000
So let's talk about
more general two-layer
741
00:40:18,000 --> 00:40:20,730
feed-forward networks.
742
00:40:20,730 --> 00:40:25,800
So this is referred to as a
two-layer network-- an input
743
00:40:25,800 --> 00:40:28,240
layer and an output layer.
744
00:40:28,240 --> 00:40:31,070
And in this case, we had
a single input neuron
745
00:40:31,070 --> 00:40:32,690
and a single output neuron.
746
00:40:32,690 --> 00:40:36,780
We generalized that to having
multiple input neurons and one
747
00:40:36,780 --> 00:40:37,470
output neuron.
748
00:40:37,470 --> 00:40:39,450
We saw that we can write
down the input current
749
00:40:39,450 --> 00:40:43,500
to this output neuron as
w, the vector of weights,
750
00:40:43,500 --> 00:40:46,080
dotted into the vector
of input firing rates
751
00:40:46,080 --> 00:40:49,310
to give us an expression for
the firing rate of the output
752
00:40:49,310 --> 00:40:50,310
neuron.
753
00:40:50,310 --> 00:40:52,080
And now we can
generalize that further
754
00:40:52,080 --> 00:40:54,520
to the case of multiple
output neurons.
755
00:40:54,520 --> 00:40:57,420
So we have multiple input
neurons, multiple output
756
00:40:57,420 --> 00:40:59,040
neurons.
757
00:40:59,040 --> 00:41:00,510
You can see that
we have a vector
758
00:41:00,510 --> 00:41:02,910
of firing rates of
the input neurons
759
00:41:02,910 --> 00:41:07,100
and a vector of firing
rates of the output neurons.
760
00:41:07,100 --> 00:41:10,043
So we used to just have one
of these output neurons,
761
00:41:10,043 --> 00:41:11,710
and now we've got a
whole bunch of them.
762
00:41:11,710 --> 00:41:14,520
And so we have to write
down a vector of firing rates
763
00:41:14,520 --> 00:41:16,210
in the output layer.
764
00:41:16,210 --> 00:41:19,560
And now we can write down
the firing rate of our output
765
00:41:19,560 --> 00:41:20,590
neurons as follows.
766
00:41:20,590 --> 00:41:22,410
So the firing rate
of this neuron
767
00:41:22,410 --> 00:41:28,170
here is going to be a
dot product of the vector
768
00:41:28,170 --> 00:41:31,110
of weights onto it.
769
00:41:31,110 --> 00:41:33,060
So the firing rate
of output neuron one
770
00:41:33,060 --> 00:41:39,180
is the vector of weights onto
that first output neuron dotted
771
00:41:39,180 --> 00:41:43,200
into the vector of
input firing rates.
772
00:41:43,200 --> 00:41:46,380
And the same for the
next output neuron.
773
00:41:46,380 --> 00:41:47,940
The firing rate of
output neuron two
774
00:41:47,940 --> 00:41:52,350
is dot product of the weights
onto that output neuron two
775
00:41:52,350 --> 00:41:56,040
and onto the vector
of input firing rates.
776
00:41:56,040 --> 00:41:57,900
Same for neuron three.
777
00:41:57,900 --> 00:42:00,500
And we can write
that down as follows.
778
00:42:00,500 --> 00:42:03,600
So the a-th output--
the firing rate
779
00:42:03,600 --> 00:42:06,150
of the a-th output
neuron is the weight vector
780
00:42:06,150 --> 00:42:09,390
onto the a-th output neuron
dotted into the input firing
781
00:42:09,390 --> 00:42:10,530
rate vector, OK?
782
00:42:10,530 --> 00:42:12,690
And we can write
that down as follows,
783
00:42:12,690 --> 00:42:15,810
where we've now introduced
a new thing here,
784
00:42:15,810 --> 00:42:20,780
which is a matrix of weights.
785
00:42:20,780 --> 00:42:23,300
So it's called
the weight matrix.
786
00:42:23,300 --> 00:42:26,600
And it essentially
is a matrix of all
787
00:42:26,600 --> 00:42:32,900
of these synaptic weights, from
the input layer onto the output
788
00:42:32,900 --> 00:42:33,540
layer.
789
00:42:33,540 --> 00:42:36,830
And now if we had
a linear neuron,
790
00:42:36,830 --> 00:42:40,900
we can write down the firing
rate of the output neuron.
791
00:42:40,900 --> 00:42:45,560
The firing rate vector
of output neuron
792
00:42:45,560 --> 00:42:52,610
is just this weight matrix times
the vector of input firing rates.
793
00:42:52,610 --> 00:42:56,240
So now, we've
rewritten this problem
794
00:42:56,240 --> 00:42:59,870
of finding the vector
of output firing rates
795
00:42:59,870 --> 00:43:02,650
as a matrix multiplication.
796
00:43:02,650 --> 00:43:05,490
And we're going to spend
some time talking about what
797
00:43:05,490 --> 00:43:09,030
that means and what that does.
798
00:43:09,030 --> 00:43:12,590
So our feed-forward
network implements a matrix
799
00:43:12,590 --> 00:43:13,970
multiplication.
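That statement — a feed-forward network of linear neurons implements a matrix multiplication — can be checked with a small example. The weight values here are made up for illustration, not taken from the slide; the indexing convention matches the lecture (rows are postsynaptic output neurons, columns are presynaptic input neurons):

```python
import numpy as np

# Weight matrix: W[a, b] = weight from input neuron b onto output
# neuron a (rows = postsynaptic, columns = presynaptic)
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 2.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])  # illustrative values

u = np.array([1.0, 2.0, 3.0, 4.0])   # input firing-rate vector

v = W @ u                            # output rates for linear neurons
print(v)                             # [ 2.  5.  6. 10.]

# Each output rate is the dot product of one row of W with u:
print(np.dot(W[0], u) == v[0])       # True
```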
800
00:43:13,970 --> 00:43:16,790
All right, so let's take
a closer look at what
801
00:43:16,790 --> 00:43:20,780
this weight matrix looks like.
802
00:43:20,780 --> 00:43:26,340
So we have a weight matrix w sub
a comma b that looks like this.
803
00:43:26,340 --> 00:43:29,360
So we have four input neurons
and four output neurons.
804
00:43:29,360 --> 00:43:34,670
We have a weight for each input
neuron onto each output neuron.
805
00:43:34,670 --> 00:43:40,280
The columns here correspond
to different input neurons.
806
00:43:40,280 --> 00:43:42,900
The rows correspond to
different output neurons.
807
00:43:42,900 --> 00:43:46,550
Remember, for a
matrix, the elements
808
00:43:46,550 --> 00:43:54,713
are listed as w sub a, b,
where a is the output neuron.
809
00:43:54,713 --> 00:43:55,630
b is the input neuron.
810
00:43:55,630 --> 00:44:01,760
And so it's w postsynaptic,
presynaptic-- post, pre.
811
00:44:01,760 --> 00:44:04,010
Rows, columns.
812
00:44:04,010 --> 00:44:07,400
So the rows are the
different output neurons.
813
00:44:07,400 --> 00:44:09,485
The columns are the
different input neurons.
814
00:44:12,210 --> 00:44:15,980
So it can be a little
tricky to remember.
815
00:44:15,980 --> 00:44:21,030
I just remember that it's rows--
816
00:44:21,030 --> 00:44:23,890
a matrix is labeled
by rows and columns.
817
00:44:23,890 --> 00:44:28,000
And weight matrices are
postsynaptic, presynaptic--
818
00:44:28,000 --> 00:44:28,660
post, pre.
819
00:44:31,370 --> 00:44:35,160
AUDIENCE: [INAUDIBLE]
comment of [INAUDIBLE]??
820
00:44:35,160 --> 00:44:37,410
MICHALE FEE: I think
that's standard.
821
00:44:37,410 --> 00:44:41,050
I'm pretty sure
that's very standard.
822
00:44:41,050 --> 00:44:43,880
If you find any
exceptions, let me know.
823
00:44:43,880 --> 00:44:49,710
OK, we can think of
each row of this matrix
824
00:44:49,710 --> 00:44:53,510
as being the vector of weights
onto one output neuron.
825
00:44:56,890 --> 00:45:01,960
That row is a vector of weights
onto that output neuron--
826
00:45:01,960 --> 00:45:05,123
that row, that output neuron;
that row, that output neuron.
827
00:45:05,123 --> 00:45:06,040
Does that makes sense?
828
00:45:09,590 --> 00:45:13,350
All right, so let's flesh out
this matrix multiplication.
829
00:45:13,350 --> 00:45:15,838
The vector of
output firing rates,
830
00:45:15,838 --> 00:45:17,880
we're going to write it
as a column vector, where
831
00:45:17,880 --> 00:45:20,670
the first number is
this firing rate.
832
00:45:20,670 --> 00:45:22,440
That number is that firing rate.
833
00:45:22,440 --> 00:45:25,560
That number represents
that firing rate, OK?
834
00:45:25,560 --> 00:45:27,390
That's equal to
this weight matrix
835
00:45:27,390 --> 00:45:31,850
times the vector of
input firing rates,
836
00:45:31,850 --> 00:45:36,040
again, written as
a column vector.
837
00:45:36,040 --> 00:45:40,320
And in order to calculate the
firing rate of the first output
838
00:45:40,320 --> 00:45:44,610
neuron, we take the dot product
of the first row of the weight
839
00:45:44,610 --> 00:45:53,020
matrix and the column vector
of input firing rates.
840
00:45:53,020 --> 00:45:59,070
And that gives us this
first firing rate, OK?
841
00:45:59,070 --> 00:46:00,630
To get the second
firing rate, we
842
00:46:00,630 --> 00:46:03,870
take the dot product of
the second row of weights
843
00:46:03,870 --> 00:46:06,570
with the vector of firing
rates, and that gives us
844
00:46:06,570 --> 00:46:10,050
this second firing rate.
845
00:46:10,050 --> 00:46:11,310
Any questions about that?
846
00:46:11,310 --> 00:46:16,740
Just a brief reminder of
matrix multiplication.
847
00:46:16,740 --> 00:46:19,281
All right, no questions?
848
00:46:19,281 --> 00:46:26,910
All right, so let's take
a step back and go quickly
849
00:46:26,910 --> 00:46:30,300
through some basic
matrix algebra.
850
00:46:30,300 --> 00:46:32,670
I know most of you have
probably seen this,
851
00:46:32,670 --> 00:46:35,970
but many haven't, so we're
just going to go through it.
852
00:46:35,970 --> 00:46:40,110
All right, so just
as vectors are--
853
00:46:40,110 --> 00:46:42,570
you can think of them as
a collection of numbers
854
00:46:42,570 --> 00:46:44,190
that you write down.
855
00:46:44,190 --> 00:46:47,970
So let's say that you are
making a measurement of two
856
00:46:47,970 --> 00:46:48,850
different things--
857
00:46:48,850 --> 00:46:52,740
let's say temperature
and humidity.
858
00:46:52,740 --> 00:46:55,980
So you can write down a vector
that represents those two
859
00:46:55,980 --> 00:46:57,160
quantities.
860
00:46:57,160 --> 00:47:00,550
So matrices you can think of
as collections of vectors.
861
00:47:00,550 --> 00:47:03,870
So let's say we take
those two measurements
862
00:47:03,870 --> 00:47:05,980
at different times, at
three different times.
863
00:47:05,980 --> 00:47:11,910
So now we have a vector one, a
vector two, and a vector three
864
00:47:11,910 --> 00:47:14,760
that measure those two
quantities at three
865
00:47:14,760 --> 00:47:16,620
different times, all right?
866
00:47:16,620 --> 00:47:19,350
So we can now write all
of those measurements
867
00:47:19,350 --> 00:47:22,860
down as a matrix,
where we collect
868
00:47:22,860 --> 00:47:27,900
each one of those vectors
as a column in our matrix,
869
00:47:27,900 --> 00:47:28,900
like that.
870
00:47:28,900 --> 00:47:32,070
Any questions about that?
871
00:47:32,070 --> 00:47:37,170
And there's a bit of MATLAB
code that calculates this matrix
872
00:47:37,170 --> 00:47:40,180
by writing three
different column vectors
873
00:47:40,180 --> 00:47:42,030
and then concatenating
them into a matrix.
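[The MATLAB snippet on the slide is not reproduced in the transcript; an equivalent sketch in Python/NumPy, with made-up temperature and humidity values, would be:]

```python
import numpy as np

# Hypothetical (temperature, humidity) measurements at three different times.
x1 = np.array([[20.0], [45.0]])   # column vector at time 1
x2 = np.array([[22.0], [50.0]])   # time 2
x3 = np.array([[19.0], [55.0]])   # time 3

# Concatenate the three column vectors into a 2 x 3 matrix.
X = np.hstack([x1, x2, x3])
print(X.shape)   # (2, 3): two measurements, three times
print(X[0])      # first row: temperature as a function of time
```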
874
00:47:45,130 --> 00:47:47,930
All right, and you can
see that in this matrix,
875
00:47:47,930 --> 00:47:52,070
the columns are just
the original vectors,
876
00:47:52,070 --> 00:47:53,990
and the rows are--
877
00:47:53,990 --> 00:47:56,480
you can think of
those as a time series
878
00:47:56,480 --> 00:47:59,010
of our first measurement,
let's say temperature.
879
00:47:59,010 --> 00:48:01,610
So that's temperature
as a function of time.
880
00:48:01,610 --> 00:48:08,005
This is temperature and
humidity at one time.
881
00:48:08,005 --> 00:48:08,880
Does that make sense?
882
00:48:11,480 --> 00:48:14,180
All right, so, again, we
can write down this matrix.
883
00:48:14,180 --> 00:48:16,370
Remember, this is
the first measurement
884
00:48:16,370 --> 00:48:18,980
at time two, the first
measurement at time three.
885
00:48:18,980 --> 00:48:21,650
We have two rows
and three columns.
886
00:48:21,650 --> 00:48:23,270
We can also write
down what's known
887
00:48:23,270 --> 00:48:27,080
as the transpose of a matrix
that just flips the rows
888
00:48:27,080 --> 00:48:27,660
and columns.
889
00:48:27,660 --> 00:48:30,290
So we can write
transpose, which is
890
00:48:30,290 --> 00:48:33,860
indicated by this
superscript capital T.
891
00:48:33,860 --> 00:48:36,140
And here, we're just flipping
the rows and columns.
892
00:48:36,140 --> 00:48:41,510
So the first row of this
matrix becomes the first column
893
00:48:41,510 --> 00:48:43,220
of the transposed matrix.
894
00:48:43,220 --> 00:48:47,450
So we have three
rows and two columns.
895
00:48:47,450 --> 00:48:49,140
A symmetric matrix--
896
00:48:49,140 --> 00:48:50,940
I'm just defining
some terms now.
897
00:48:50,940 --> 00:48:54,360
A symmetric matrix
is a matrix where
898
00:48:54,360 --> 00:48:58,650
the off-diagonal elements--
so let me just define,
899
00:48:58,650 --> 00:49:01,800
that's the diagonal,
the matrix diagonal.
900
00:49:01,800 --> 00:49:04,650
And a symmetric matrix
has the property
901
00:49:04,650 --> 00:49:08,130
that the off-diagonal
elements mirror across the diagonal.
902
00:49:08,130 --> 00:49:11,040
And a symmetric matrix
has the property
903
00:49:11,040 --> 00:49:14,970
that the transpose of that
matrix is equal to the matrix,
904
00:49:14,970 --> 00:49:15,600
OK?
905
00:49:15,600 --> 00:49:18,930
That is only
possible, of course,
906
00:49:18,930 --> 00:49:23,017
if the matrix has the same
number of rows and columns,
907
00:49:23,017 --> 00:49:24,600
if it's what's called
a square matrix.
908
00:49:28,990 --> 00:49:31,030
Let me just remind
you, in general
909
00:49:31,030 --> 00:49:33,290
about matrix multiplication.
910
00:49:33,290 --> 00:49:36,820
We can write down the
product of two matrices.
911
00:49:36,820 --> 00:49:40,090
And we do that multiplication
by taking the dot product
912
00:49:40,090 --> 00:49:44,590
of each row in the first
matrix with each column
913
00:49:44,590 --> 00:49:46,000
in the second matrix.
914
00:49:46,000 --> 00:49:49,930
So here's the product of
matrix A and matrix B.
915
00:49:49,930 --> 00:49:52,660
So there's the product.
916
00:49:52,660 --> 00:49:56,020
If this matrix, if
matrix A, is an m by k--
917
00:49:56,020 --> 00:49:59,090
m rows by k columns--
918
00:49:59,090 --> 00:50:05,090
and matrix B has k
rows by n columns,
919
00:50:05,090 --> 00:50:09,020
then the product of
those two matrices
920
00:50:09,020 --> 00:50:14,180
will have m rows
and n columns.
921
00:50:14,180 --> 00:50:17,000
And you can see that in order
for matrix multiplication
922
00:50:17,000 --> 00:50:23,510
to work, the number of
columns of the first matrix
923
00:50:23,510 --> 00:50:25,970
equal the number of rows
in the second matrix.
924
00:50:25,970 --> 00:50:30,890
You can see that this k has to
be the same for both matrices.
925
00:50:30,890 --> 00:50:34,120
Does that make sense?
926
00:50:34,120 --> 00:50:37,300
So, again, in order to compute
this element right here,
927
00:50:37,300 --> 00:50:40,675
we take the dot product
of the first row of A
928
00:50:40,675 --> 00:50:46,450
and the first column of B.
That's just 1 times 4, is 4.
929
00:50:46,450 --> 00:50:49,370
Plus negative 2
times 7 is minus 14.
930
00:50:49,370 --> 00:50:51,490
Plus 0 times minus 1 is 0.
931
00:50:51,490 --> 00:50:53,800
Add those up and
you get minus 10.
932
00:50:53,800 --> 00:50:55,090
So you get this number.
933
00:50:55,090 --> 00:50:57,040
You multiply this
row dot product
934
00:50:57,040 --> 00:50:58,990
this row with this
column and so on.
935
00:51:02,710 --> 00:51:06,310
Notice, A times B is
not equal to B times A.
936
00:51:06,310 --> 00:51:11,470
In fact, in cases of rectangular
matrices, matrices that aren't
937
00:51:11,470 --> 00:51:15,160
square, you often can't
even do the
938
00:51:15,160 --> 00:51:18,760
multiplication
in a different order.
939
00:51:18,760 --> 00:51:22,420
Mathematically, it
doesn't make sense.
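[The worked entry above, 1·4 + (−2)·7 + 0·(−1) = −10, can be verified in a NumPy sketch. Only the first row of A and the first column of B come from the slide; the remaining entries are made up:]

```python
import numpy as np

# First row of A and first column of B from the slide; other entries made up.
A = np.array([[1, -2, 0],
              [3,  1, 2]])        # m x k = 2 x 3
B = np.array([[ 4, 1],
              [ 7, 0],
              [-1, 2]])           # k x n = 3 x 2

C = A @ B                         # product is m x n = 2 x 2
# Entry (0, 0): 1*4 + (-2)*7 + 0*(-1) = 4 - 14 + 0 = -10
print(C[0, 0])                    # -10

# Multiplication is not commutative: here B @ A even has a different shape.
print((B @ A).shape)              # (3, 3)
```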
940
00:51:22,420 --> 00:51:27,100
So let's say that we
have a matrix of vectors,
941
00:51:27,100 --> 00:51:29,020
and we want to take
the dot product
942
00:51:29,020 --> 00:51:35,420
of each one of those vectors
x with some other vector v. So
943
00:51:35,420 --> 00:51:36,720
let's just write that down.
944
00:51:36,720 --> 00:51:40,410
The way to do that is
to say the answer here,
945
00:51:40,410 --> 00:51:44,130
the dot product of each
one of those column vectors
946
00:51:44,130 --> 00:51:46,730
in our matrix with
this other vector
947
00:51:46,730 --> 00:51:49,580
v we do by taking
the transpose of v,
948
00:51:49,580 --> 00:51:53,100
which takes a column vector
and turns it into a row vector.
949
00:51:53,100 --> 00:51:56,660
And we can now multiply
that by our data matrix x
950
00:51:56,660 --> 00:52:01,700
by taking the dot product
of v with that column of x.
951
00:52:01,700 --> 00:52:05,100
And that gives us a matrix.
952
00:52:05,100 --> 00:52:09,750
So this vector here, v
transpose, is a one by two matrix.
953
00:52:09,750 --> 00:52:11,450
This is a two by three matrix.
954
00:52:11,450 --> 00:52:16,010
The product of those is
a one by three matrix.
955
00:52:16,010 --> 00:52:18,790
Any questions about that?
956
00:52:18,790 --> 00:52:19,480
OK.
957
00:52:19,480 --> 00:52:21,860
We can do this a different way.
958
00:52:21,860 --> 00:52:25,420
Notice that the result
of this multiplication
959
00:52:25,420 --> 00:52:27,578
here is a row vector, y.
960
00:52:27,578 --> 00:52:28,870
We can do this a different way.
961
00:52:28,870 --> 00:52:30,740
We can take dot product.
962
00:52:30,740 --> 00:52:35,350
We can also compute this
as y equals x transpose v.
963
00:52:35,350 --> 00:52:37,360
So here, we've taken the
transpose of the data
964
00:52:37,360 --> 00:52:40,790
matrix times this
column vector v.
965
00:52:40,790 --> 00:52:43,850
And again, we take the
dot product of this,
966
00:52:43,850 --> 00:52:45,650
this with this,
and that with that.
967
00:52:45,650 --> 00:52:47,860
And now we get a
column vector that
968
00:52:47,860 --> 00:52:50,650
has the same entries
that we had over here.
969
00:52:53,980 --> 00:52:57,440
All right, so I'm just
showing you different ways
970
00:52:57,440 --> 00:53:00,920
that you can manipulate
a vector in a matrix
971
00:53:00,920 --> 00:53:08,120
to compute the dot product
of elements of vectors
972
00:53:08,120 --> 00:53:11,870
within a data matrix
and other vectors
973
00:53:11,870 --> 00:53:13,490
that you're interested in.
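[Both ways of computing the dot products just described, v transpose times X or X transpose times v, can be sketched in NumPy with made-up data:]

```python
import numpy as np

# Data matrix X: each COLUMN is one measurement vector (numbers made up).
X = np.array([[20.0, 22.0, 19.0],
              [45.0, 50.0, 55.0]])
v = np.array([[1.0],
              [2.0]])               # a 2 x 1 column vector

# Row-vector form: v^T X is (1x2)(2x3) = 1x3.
y_row = v.T @ X

# Column-vector form: X^T v is (3x2)(2x1) = 3x1, with the same entries.
y_col = X.T @ v

print(y_row.shape, y_col.shape)     # (1, 3) (3, 1)
print(np.allclose(y_row, y_col.T))  # True: same numbers either way
```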
974
00:53:16,720 --> 00:53:19,160
All right, identity matrix.
975
00:53:19,160 --> 00:53:21,580
So when you're multiplying
numbers together,
976
00:53:21,580 --> 00:53:24,370
the number one has
the special property
977
00:53:24,370 --> 00:53:27,910
that you can multiply
any real number by one
978
00:53:27,910 --> 00:53:29,320
and get the same number back.
979
00:53:33,930 --> 00:53:39,030
You have the same kind
of element in matrices.
980
00:53:39,030 --> 00:53:42,530
So is there a matrix that when
multiplied by A gives you A?
981
00:53:42,530 --> 00:53:43,530
And the answer is yes.
982
00:53:43,530 --> 00:53:45,640
It's called the identity matrix.
983
00:53:45,640 --> 00:53:49,230
So it's given by the
symbol I, usually.
984
00:53:49,230 --> 00:53:54,540
A times I equals A. What
does that matrix look like?
985
00:53:54,540 --> 00:53:56,950
Again, the identity
matrix looks like this.
986
00:53:56,950 --> 00:54:01,320
It's a square matrix that
has ones along the diagonal
987
00:54:01,320 --> 00:54:02,970
and zero everywhere else.
988
00:54:05,580 --> 00:54:09,180
So you can see here that if
you take an arbitrary vector x,
989
00:54:09,180 --> 00:54:12,900
multiplied by the
identity matrix,
990
00:54:12,900 --> 00:54:18,630
you can see that this product
is x1, x2 dotted into 1,
991
00:54:18,630 --> 00:54:21,030
0, which gives you x1.
992
00:54:21,030 --> 00:54:25,230
x1, x2 dotted into
0, 1, gives you x2.
993
00:54:25,230 --> 00:54:29,560
And so the answer looks
like that, which is just x.
994
00:54:29,560 --> 00:54:32,450
So the identity matrix
times an arbitrary vector x
995
00:54:32,450 --> 00:54:35,420
gives you x back.
996
00:54:35,420 --> 00:54:40,560
Another very useful
application of linear algebra,
997
00:54:40,560 --> 00:54:43,720
linear algebra tools, is to
solve systems of equations.
998
00:54:43,720 --> 00:54:46,240
So let me show you
what that looks like.
999
00:54:46,240 --> 00:54:52,230
So let's say we want to solve
a simple equation, ax equals c.
1000
00:54:52,230 --> 00:54:54,720
So, in this case, how
do you solve for x?
1001
00:54:54,720 --> 00:54:57,600
Well, you're just going to
divide both sides by a, right?
1002
00:54:57,600 --> 00:54:59,640
So if you divide
both sides by a,
1003
00:54:59,640 --> 00:55:04,020
you get that x equals
1 over a times c.
1004
00:55:04,020 --> 00:55:07,980
So it turns out that there
is a matrix equivalent
1005
00:55:07,980 --> 00:55:11,800
of that, that allows you to
solve systems of equations.
1006
00:55:11,800 --> 00:55:14,610
So if you have a
pair of equations--
1007
00:55:14,610 --> 00:55:18,570
x minus 2y equals 3 and
3x plus y equals 5--
1008
00:55:18,570 --> 00:55:21,360
you can write this down
as a matrix equation,
1009
00:55:21,360 --> 00:55:23,910
where you have a
matrix 1, minus 2,
1010
00:55:23,910 --> 00:55:26,960
3, 1, which correspond to
the coefficients of x and y
1011
00:55:26,960 --> 00:55:28,500
in these equations.
1012
00:55:28,500 --> 00:55:36,120
Times a vector xy is equal
to 3, 5, another vector 3, 5.
1013
00:55:36,120 --> 00:55:40,570
So you can write this
down as ax equals c--
1014
00:55:40,570 --> 00:55:42,420
that's kind of nice--
1015
00:55:42,420 --> 00:55:46,440
where this matrix A is
given by these coefficients
1016
00:55:46,440 --> 00:55:49,650
and this vector c is
given by these terms
1017
00:55:49,650 --> 00:55:53,620
on this side of the equation, on
the right side of the equation.
1018
00:55:53,620 --> 00:55:55,990
Now, how do we solve this?
1019
00:55:55,990 --> 00:56:02,510
Well, can we just divide both
sides of that matrix equation,
1020
00:56:02,510 --> 00:56:04,670
that vector equation, by a?
1021
00:56:04,670 --> 00:56:08,450
So division is not really
defined for matrices,
1022
00:56:08,450 --> 00:56:10,460
but we can use another trick.
1023
00:56:10,460 --> 00:56:12,800
We can multiply both
sides of this equation
1024
00:56:12,800 --> 00:56:17,590
by something that
makes the a go away.
1025
00:56:17,590 --> 00:56:22,760
And so that magical thing
is called the inverse of A.
1026
00:56:22,760 --> 00:56:24,890
So we take the
inverse of matrix A,
1027
00:56:24,890 --> 00:56:28,420
denoted by A with this
superscript minus 1.
1028
00:56:28,420 --> 00:56:31,890
And that's the standard notation
for identifying the inverse.
1029
00:56:31,890 --> 00:56:34,220
It has the property
that A inverse times
1030
00:56:34,220 --> 00:56:37,840
A equals the identity matrix.
1031
00:56:37,840 --> 00:56:39,780
So you can sort of
think about this
1032
00:56:39,780 --> 00:56:45,090
as A inverse equals the identity matrix
over A. Anyway, don't really
1033
00:56:45,090 --> 00:56:47,580
think of it like that.
1034
00:56:47,580 --> 00:56:51,270
So to solve this system
of equations ax equals c,
1035
00:56:51,270 --> 00:56:56,420
we multiply both sides
by that A inverse matrix.
1036
00:56:56,420 --> 00:56:58,130
And so that looks like this--
1037
00:56:58,130 --> 00:57:03,240
A inverse A times x
equals A inverse c.
1038
00:57:03,240 --> 00:57:05,790
A inverse A is just what?
1039
00:57:05,790 --> 00:57:10,920
The identity matrix times
x equals A inverse c.
1040
00:57:10,920 --> 00:57:14,100
And we just saw before that
identity matrix times x
1041
00:57:14,100 --> 00:57:15,930
is just x.
1042
00:57:15,930 --> 00:57:18,240
All right, so
there's the solution
1043
00:57:18,240 --> 00:57:24,140
to this system of equations.
1044
00:57:24,140 --> 00:57:25,640
All right, any
questions about that?
1045
00:57:30,220 --> 00:57:33,000
So how do you find the
inverse of a matrix?
1046
00:57:33,000 --> 00:57:34,650
What is this A inverse?
1047
00:57:34,650 --> 00:57:37,900
How do you get it in real life?
1048
00:57:37,900 --> 00:57:40,590
So in real life, what
you usually do is
1049
00:57:40,590 --> 00:57:44,250
you would just use the matrix
inverse function in MATLAB.
1050
00:57:44,250 --> 00:57:47,520
Because for any matrices
other than a two-by-two,
1051
00:57:47,520 --> 00:57:50,160
it's really annoying to
get a matrix inverse.
1052
00:57:50,160 --> 00:57:52,800
But for a two-by-two matrix,
it's actually pretty easy.
1053
00:57:52,800 --> 00:57:56,340
You can almost just get the
answer by looking at the matrix
1054
00:57:56,340 --> 00:57:58,110
and writing down the inverse.
1055
00:57:58,110 --> 00:57:59,530
It looks like this.
1056
00:57:59,530 --> 00:58:03,360
The inverse of a two-by-two
square matrix is just given
1057
00:58:03,360 --> 00:58:06,970
by a slight reordering
of the coefficients,
1058
00:58:06,970 --> 00:58:09,600
of the entries of that matrix,
divided by what's called
1059
00:58:09,600 --> 00:58:14,100
the determinant of A. So
what you do is you flip--
1060
00:58:14,100 --> 00:58:18,090
in a two-by-two matrix,
you flip the A and the D,
1061
00:58:18,090 --> 00:58:24,990
and then you multiply the
off-diagonal elements by minus 1.
1062
00:58:24,990 --> 00:58:26,640
Now, what is this determinant?
1063
00:58:26,640 --> 00:58:33,060
The determinant is given by
a times d minus b times c.
1064
00:58:33,060 --> 00:58:35,530
And you can prove
that that actually
1065
00:58:35,530 --> 00:58:39,940
is the inverse, because if we
take this and multiply it by A,
1066
00:58:39,940 --> 00:58:43,450
what you find when you multiply
that out is that that's just
1067
00:58:43,450 --> 00:58:48,370
equal to the identity matrix.
1068
00:58:48,370 --> 00:58:52,060
So a matrix has an
inverse if and only
1069
00:58:52,060 --> 00:58:55,360
if the determinant
is not equal to zero.
1070
00:58:55,360 --> 00:58:57,220
If the determinant
is equal to zero,
1071
00:58:57,220 --> 00:58:59,260
you can see that
this thing blows up,
1072
00:58:59,260 --> 00:59:02,250
and there's no inverse.
1073
00:59:02,250 --> 00:59:04,510
We're going to spend
a little bit of time
1074
00:59:04,510 --> 00:59:07,630
later talking about what
that means when a matrix has
1075
00:59:07,630 --> 00:59:11,110
an inverse and what the
determinant actually
1076
00:59:11,110 --> 00:59:18,920
corresponds to in a matrix
multiplication context.
1077
00:59:18,920 --> 00:59:20,870
If the determinant
is equal to zero,
1078
00:59:20,870 --> 00:59:24,260
we say that that
matrix is singular.
1079
00:59:24,260 --> 00:59:27,710
And in that case, you can't
actually find an inverse,
1080
00:59:27,710 --> 00:59:32,240
and you can't solve this
equation right here,
1081
00:59:32,240 --> 00:59:33,950
this system of equations.
1082
00:59:38,720 --> 00:59:42,600
All right, so let's actually
go through this example.
1083
00:59:42,600 --> 00:59:45,530
So here's our
equation, ax equals c.
1084
00:59:45,530 --> 00:59:47,780
We're going to use the
same matrix we had before
1085
00:59:47,780 --> 00:59:50,210
and the same c.
1086
00:59:50,210 --> 00:59:52,910
The determinant is
just the product
1087
00:59:52,910 --> 00:59:56,420
of those minus the product of
those, so 1 minus negative 6.
1088
00:59:56,420 --> 00:59:58,550
So the determinant is 7.
1089
00:59:58,550 --> 01:00:01,410
So there is an inverse
of this matrix.
1090
01:00:01,410 --> 01:00:03,810
And we can just write
that down as follows.
1091
01:00:03,810 --> 01:00:05,990
Again, we've flipped those
two and multiplied those
1092
01:00:05,990 --> 01:00:07,850
by minus 1.
1093
01:00:07,850 --> 01:00:13,550
So we can solve for x just by
taking that inverse times c,
1094
01:00:13,550 --> 01:00:15,920
A inverse times c.
1095
01:00:15,920 --> 01:00:17,840
And if you multiply
that out, you
1096
01:00:17,840 --> 01:00:19,418
see that there's the solution.
1097
01:00:19,418 --> 01:00:20,210
It's just a vector.
1098
01:00:24,680 --> 01:00:26,110
That's it.
1099
01:00:26,110 --> 01:00:31,400
That's how you solve a system
of equations, all right?
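[The whole worked example can be checked in a few lines of NumPy, building the two-by-two inverse by hand exactly as described above:]

```python
import numpy as np

# x - 2y = 3 and 3x + y = 5, written as A x = c.
A = np.array([[1.0, -2.0],
              [3.0,  1.0]])
c = np.array([3.0, 5.0])

# 2x2 inverse by hand: swap the diagonal entries, negate the off-diagonal
# entries, and divide by the determinant det = a*d - b*c.
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]     # 1*1 - (-2)*3 = 7
A_inv = np.array([[ A[1, 1], -A[0, 1]],
                  [-A[1, 0],  A[0, 0]]]) / det

x = A_inv @ c
print(det)   # 7.0
print(x)     # [13/7, -4/7]

# Same answer from the library solver.
print(np.allclose(x, np.linalg.solve(A, c)))    # True
```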
1100
01:00:31,400 --> 01:00:33,970
Any questions about that?
1101
01:00:33,970 --> 01:00:43,590
So this process of solving
systems of equations
1102
01:00:43,590 --> 01:00:49,250
and using matrices
and their inverses
1103
01:00:49,250 --> 01:00:53,840
to solve systems of equations
is a very important concept
1104
01:00:53,840 --> 01:00:55,820
that we're going to use
over and over again.
1105
01:00:58,910 --> 01:01:01,040
All right, let's
turn to the topic
1106
01:01:01,040 --> 01:01:03,660
of matrix transformations.
1107
01:01:03,660 --> 01:01:06,710
All right, so you can see
from this problem of solving
1108
01:01:06,710 --> 01:01:12,100
this system of equations that
that matrix A transformed
1109
01:01:12,100 --> 01:01:15,050
a vector x into a vector c, OK?
1110
01:01:15,050 --> 01:01:21,290
So we have this vector x, which
was the vector 13/7, minus 4/7.
1111
01:01:21,290 --> 01:01:26,940
When we multiplied that by
A, we got another vector, c.
1112
01:01:30,730 --> 01:01:34,960
And the matrix A inverse
transforms this vector
1113
01:01:34,960 --> 01:01:38,320
c back into vector x, right?
1114
01:01:38,320 --> 01:01:44,170
So we can take that vector
c, multiply it by A inverse,
1115
01:01:44,170 --> 01:01:46,420
and get back to x.
1116
01:01:46,420 --> 01:01:49,340
Does that make sense?
1117
01:01:49,340 --> 01:01:56,480
So, in general, a
matrix A maps a set
1118
01:01:56,480 --> 01:01:59,630
of vectors in this whole space.
1119
01:01:59,630 --> 01:02:01,730
So if you have a
two-by-two matrix,
1120
01:02:01,730 --> 01:02:08,620
it maps a set of vectors
in R2 onto a different set
1121
01:02:08,620 --> 01:02:10,540
of vectors in R2.
1122
01:02:10,540 --> 01:02:12,820
So you can take
any vector here--
1123
01:02:12,820 --> 01:02:16,360
a vector from the
origin into here--
1124
01:02:16,360 --> 01:02:18,460
multiply that vector
by A, and it gives you
1125
01:02:18,460 --> 01:02:20,800
a different vector.
1126
01:02:20,800 --> 01:02:23,220
And if you multiply that
other vector by A inverse,
1127
01:02:23,220 --> 01:02:27,990
you go back to the
original vector.
1128
01:02:27,990 --> 01:02:31,860
So this matrix A
implements some kind
1129
01:02:31,860 --> 01:02:36,560
of transformation on this
space of real numbers
1130
01:02:36,560 --> 01:02:42,120
into a different space
of real numbers, OK?
1131
01:02:42,120 --> 01:02:46,120
And you can only do this
inverse if the determinant of A
1132
01:02:46,120 --> 01:02:47,250
is not equal to zero.
1133
01:02:51,060 --> 01:02:55,260
So I just want to show you
what different kinds of matrix
1134
01:02:55,260 --> 01:02:56,560
transformations look like.
1135
01:03:00,980 --> 01:03:04,810
So let's start with the
simplest matrix transformation--
1136
01:03:04,810 --> 01:03:06,260
the identity matrix.
1137
01:03:06,260 --> 01:03:09,130
So if we take a
vector x, multiply it
1138
01:03:09,130 --> 01:03:12,710
by the identity matrix,
you get another vector y,
1139
01:03:12,710 --> 01:03:15,350
which is equal to x.
1140
01:03:15,350 --> 01:03:18,650
So what we're going to do is
we're going to kind of riff off
1141
01:03:18,650 --> 01:03:21,980
of a theme here, and
we're going to take
1142
01:03:21,980 --> 01:03:26,400
slight perturbations
of the identity matrix
1143
01:03:26,400 --> 01:03:30,990
and see what that new matrix
does to a set of input vectors,
1144
01:03:30,990 --> 01:03:31,490
OK?
1145
01:03:31,490 --> 01:03:33,407
So let me show you how
we're going to do that.
1146
01:03:33,407 --> 01:03:37,050
We're going to take the
identity matrix 1, 0, 0, 1.
1147
01:03:37,050 --> 01:03:39,020
And we're going to add
a little perturbation
1148
01:03:39,020 --> 01:03:40,085
to the diagonal elements.
1149
01:03:43,900 --> 01:03:47,700
And we're going to see what that
does to a set of input vectors.
1150
01:03:47,700 --> 01:03:49,810
So let me show you
what we're doing here.
1151
01:03:49,810 --> 01:03:51,540
We have each one
of these red dots.
1152
01:03:51,540 --> 01:03:58,410
So what I did was I generated
a bunch of random numbers
1153
01:03:58,410 --> 01:03:59,430
in a 2D space.
1154
01:03:59,430 --> 01:04:01,230
So this is a 2D space.
1155
01:04:01,230 --> 01:04:03,330
And I just randomly
selected a bunch
1156
01:04:03,330 --> 01:04:07,320
of numbers, a bunch of
points on that plane.
1157
01:04:07,320 --> 01:04:11,140
And each one of those
is an input vector x.
1158
01:04:11,140 --> 01:04:13,360
And then I multiplied
that vector
1159
01:04:13,360 --> 01:04:18,100
times this slightly
perturbed identity matrix.
1160
01:04:22,030 --> 01:04:24,270
And then I get a bunch
of output vectors y.
1161
01:04:24,270 --> 01:04:26,850
Input vectors x
are the red dots.
1162
01:04:26,850 --> 01:04:31,800
The output vectors y are the
other end of this blue line.
1163
01:04:31,800 --> 01:04:32,860
Does that make sense?
1164
01:04:32,860 --> 01:04:39,600
So for every vector x,
multiplying it by this matrix
1165
01:04:39,600 --> 01:04:43,630
gives me another vector
that's over here.
1166
01:04:43,630 --> 01:04:44,930
Does that make sense?
1167
01:04:44,930 --> 01:04:49,150
So you can see that
what this matrix does
1168
01:04:49,150 --> 01:04:52,600
is it takes this space,
this cloud of points,
1169
01:04:52,600 --> 01:04:56,900
and stretches them
equally in all directions.
1170
01:04:56,900 --> 01:05:00,760
So it takes any vector
and just makes it longer,
1171
01:05:00,760 --> 01:05:02,200
stretches it out.
1172
01:05:02,200 --> 01:05:04,240
No matter which
direction it's pointing,
1173
01:05:04,240 --> 01:05:06,210
it just makes that
vector slightly longer.
1174
01:05:09,510 --> 01:05:11,070
And here's that
little bit of code
1175
01:05:11,070 --> 01:05:17,670
that I used to
generate those vectors.
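[The MATLAB demo itself is not reproduced in the transcript; a NumPy sketch of the same idea — a cloud of random points stretched equally in all directions by a perturbed identity matrix — might look like this:]

```python
import numpy as np

rng = np.random.default_rng(0)

# A cloud of random 2D input vectors, one per column (exact numbers arbitrary).
X = rng.uniform(-1, 1, size=(2, 200))

delta = 0.3
A = (1 + delta) * np.eye(2)       # identity plus a perturbation on the diagonal

Y = A @ X                         # transform every input vector at once

# Every vector keeps its direction but gets longer by the factor (1 + delta).
lengths_in = np.linalg.norm(X, axis=0)
lengths_out = np.linalg.norm(Y, axis=0)
print(np.allclose(lengths_out, (1 + delta) * lengths_in))   # True
```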
1176
01:05:17,670 --> 01:05:19,310
OK, so let's take
another example.
1177
01:05:19,310 --> 01:05:21,640
Let's say that we take
the identity matrix
1178
01:05:21,640 --> 01:05:26,020
and we just add a little
perturbation to one element
1179
01:05:26,020 --> 01:05:29,290
of the identity matrix, OK?
1180
01:05:29,290 --> 01:05:30,580
So what does that do?
1181
01:05:30,580 --> 01:05:37,400
It stretches the vectors
out in the x direction,
1182
01:05:37,400 --> 01:05:40,540
but it doesn't do anything
to the y direction.
1183
01:05:40,540 --> 01:05:45,200
So for a vector with a
component in the x direction,
1184
01:05:45,200 --> 01:05:51,250
the x component gets increased
by a factor of 1 plus delta.
1185
01:05:51,250 --> 01:05:55,390
The components of each of these
vectors in the y direction
1186
01:05:55,390 --> 01:05:57,720
don't change, all right?
1187
01:05:57,720 --> 01:05:59,670
So we're going to take
this cloud of points,
1188
01:05:59,670 --> 01:06:02,610
and we're going to stretch
it in the x direction.
1189
01:06:02,610 --> 01:06:05,540
What about this matrix here?
1190
01:06:05,540 --> 01:06:06,843
What's that going to do?
1191
01:06:06,843 --> 01:06:08,510
AUDIENCE: Stretch it
in the y direction.
1192
01:06:08,510 --> 01:06:09,260
MICHALE FEE: Good.
1193
01:06:09,260 --> 01:06:12,346
It's going to stretch it
out in the y direction.
1194
01:06:12,346 --> 01:06:13,392
Good.
1195
01:06:13,392 --> 01:06:14,350
So that's kind of cute.
1196
01:06:19,000 --> 01:06:22,480
And you can see that this
earlier matrix that we looked
1197
01:06:22,480 --> 01:06:27,560
at right here stretches
in the x direction
1198
01:06:27,560 --> 01:06:29,270
and stretches in
the y direction.
1199
01:06:29,270 --> 01:06:32,960
And that's why that
cloud of vectors
1200
01:06:32,960 --> 01:06:35,750
just stretched out
equally in all directions.
1201
01:06:40,340 --> 01:06:42,580
How about this?
1202
01:06:42,580 --> 01:06:44,404
What is that going to do?
1203
01:06:44,404 --> 01:06:46,864
AUDIENCE: It would stretch in
the x direction and compress
1204
01:06:46,864 --> 01:06:47,850
in the y direction
1205
01:06:47,850 --> 01:06:49,410
MICHALE FEE: Right.
1206
01:06:49,410 --> 01:06:52,500
This perturbation here
is making this component,
1207
01:06:52,500 --> 01:06:54,990
the x component larger.
1208
01:06:54,990 --> 01:06:58,860
This perturbation here--
and delta here is small.
1209
01:06:58,860 --> 01:07:00,100
It's less than one.
1210
01:07:00,100 --> 01:07:03,930
Here, it's making the
y component smaller.
1211
01:07:03,930 --> 01:07:06,600
And so what that looks like
is the y component of each one
1212
01:07:06,600 --> 01:07:08,530
of these vectors gets smaller.
1213
01:07:08,530 --> 01:07:10,740
The x component gets larger.
1214
01:07:10,740 --> 01:07:13,830
And so we're squeezing
in one direction
1215
01:07:13,830 --> 01:07:17,740
and stretching in
the other direction.
1216
01:07:17,740 --> 01:07:22,040
Imagine we took
a block of sponge
1217
01:07:22,040 --> 01:07:23,735
and we grabbed it
and stretched it out,
1218
01:07:23,735 --> 01:07:25,235
and it gets skinny
in this direction
1219
01:07:25,235 --> 01:07:28,700
and stretches out
in that direction.
1220
01:07:28,700 --> 01:07:30,050
All right, that's kind of cool.
1221
01:07:32,750 --> 01:07:36,060
What is this going to do?
1222
01:07:36,060 --> 01:07:38,910
Here, I'm not making a
small perturbation of this,
1223
01:07:38,910 --> 01:07:42,880
but I'm flipping the
sign of one of those.
1224
01:07:42,880 --> 01:07:43,870
What happens there?
1225
01:07:43,870 --> 01:07:44,970
What is that going to do?
1226
01:07:48,470 --> 01:07:50,400
AUDIENCE: [INAUDIBLE]
1227
01:07:50,400 --> 01:07:51,330
MICHALE FEE: Good.
1228
01:07:51,330 --> 01:07:54,240
What do we call that?
1229
01:07:54,240 --> 01:07:57,190
There's a term for it.
1230
01:07:57,190 --> 01:08:02,400
What do you-- yeah, it's
called a mirror reflection.
1231
01:08:02,400 --> 01:08:07,340
So every point that's on
this side of the origin
1232
01:08:07,340 --> 01:08:10,370
gets reflected over to
this side of the origin.
1233
01:08:10,370 --> 01:08:12,020
And every point
that's over here--
1234
01:08:12,020 --> 01:08:13,490
sorry, of this axis.
1235
01:08:13,490 --> 01:08:15,980
Every point that's on
this side of the y-axis
1236
01:08:15,980 --> 01:08:19,740
gets reflected
over to this side.
1237
01:08:19,740 --> 01:08:23,410
So that's called a
mirror reflection.
1238
01:08:23,410 --> 01:08:24,518
What is this?
1239
01:08:24,518 --> 01:08:25,560
What is that going to do?
1240
01:08:35,430 --> 01:08:35,930
Abiba?
1241
01:08:35,930 --> 01:08:38,567
AUDIENCE: Reflect
it [INAUDIBLE]..
1242
01:08:38,567 --> 01:08:39,359
MICHALE FEE: Right.
1243
01:08:39,359 --> 01:08:43,399
It's going to reflect it
through the origin, like this.
1244
01:08:43,399 --> 01:08:46,229
So every point that's over
here, on one side of the origin,
1245
01:08:46,229 --> 01:08:50,270
is going to reflect
through to the other side.
1246
01:08:50,270 --> 01:08:52,450
That's pretty neat.
1247
01:08:52,450 --> 01:08:54,660
Inversion of the origin.
1248
01:08:54,660 --> 01:08:56,870
OK?
1249
01:08:56,870 --> 01:08:59,460
So we have symmetric
perturbations
1250
01:08:59,460 --> 01:09:04,300
in the x and y components
of the identity matrix.
1251
01:09:04,300 --> 01:09:10,200
We have a stretch transformation
that stretches along one axis,
1252
01:09:10,200 --> 01:09:12,149
but not the other.
1253
01:09:12,149 --> 01:09:17,130
Stretch around the other axis,
the y-axis, but not the x-axis.
1254
01:09:17,130 --> 01:09:21,120
Stretch along x and
compression along y.
1255
01:09:21,120 --> 01:09:24,990
Mirror reflection
through the y-axis.
1256
01:09:24,990 --> 01:09:27,870
Inversion through the origin.
1257
01:09:27,870 --> 01:09:31,740
These are examples of
diagonal matrices, OK?
1258
01:09:31,740 --> 01:09:34,180
So the only thing
we've done so far--
1259
01:09:34,180 --> 01:09:36,779
we've gotten all these
really cool transformations,
1260
01:09:36,779 --> 01:09:38,970
but the only thing
we've done so far
1261
01:09:38,970 --> 01:09:40,905
are change these two
diagonal elements.
1262
01:09:43,779 --> 01:09:46,510
So there's a lot
more crazy stuff
1263
01:09:46,510 --> 01:09:51,310
to happen if we start messing
with the other components.
1264
01:09:51,310 --> 01:09:55,540
Oh, and I should mention
that we can invert
1265
01:09:55,540 --> 01:10:01,060
any one of these transformations
that we just did by finding
1266
01:10:01,060 --> 01:10:03,020
the inverse of this matrix.
1267
01:10:03,020 --> 01:10:06,805
The inverse of a diagonal matrix
is very simple to calculate.
1268
01:10:06,805 --> 01:10:10,015
It's just one over
those diagonal elements.
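[A NumPy sketch of the diagonal transformations just listed, and of the easy diagonal inverse:]

```python
import numpy as np

x = np.array([2.0, 3.0])           # an arbitrary test vector

stretch_x = np.diag([1.5, 1.0])    # stretch along x only
mirror    = np.diag([-1.0, 1.0])   # mirror reflection through the y-axis
invert    = np.diag([-1.0, -1.0])  # inversion through the origin

print(stretch_x @ x)               # x component stretched, y unchanged
print(mirror @ x)                  # x component flipped in sign
print(invert @ x)                  # both components flipped

# The inverse of a diagonal matrix is just one over each diagonal element.
D = np.diag([1.5, 0.5])
D_inv = np.diag(1.0 / np.diag(D))
print(np.allclose(D_inv @ D, np.eye(2)))   # True
```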
1269
01:10:13,470 --> 01:10:14,580
All right, how about this?
1270
01:10:17,868 --> 01:10:18,910
What is that going to do?
1271
01:10:18,910 --> 01:10:19,802
Anybody?
1272
01:10:28,970 --> 01:10:30,980
When you take a vector
and you multiply it
1273
01:10:30,980 --> 01:10:33,290
by that, what's going to happen?
1274
01:10:33,290 --> 01:10:36,800
This part is going to give
you the original vector back.
1275
01:10:36,800 --> 01:10:41,330
This part is going to take a
little bit of the y component
1276
01:10:41,330 --> 01:10:45,630
and add it to the x component.
1277
01:10:45,630 --> 01:10:47,450
So what does that do?
1278
01:10:47,450 --> 01:10:50,340
That produces what's
known as a shear.
1279
01:10:50,340 --> 01:10:53,340
So points up here,
we're going to take
1280
01:10:53,340 --> 01:10:57,700
a little bit of the y component
and add it to the x component.
1281
01:10:57,700 --> 01:11:00,300
So if something has
a big y component,
1282
01:11:00,300 --> 01:11:04,242
it's going to be shifted in x.
1283
01:11:04,242 --> 01:11:06,710
If something has a
negative y component,
1284
01:11:06,710 --> 01:11:08,670
it's going to shift
this way in x.
1285
01:11:08,670 --> 01:11:10,440
If something has a
positive y component,
1286
01:11:10,440 --> 01:11:12,500
it's going to shift
this way in x.
1287
01:11:12,500 --> 01:11:16,050
And it's going to produce
what's called a shear.
1288
01:11:16,050 --> 01:11:20,100
So we're pushing
these points this way,
1289
01:11:20,100 --> 01:11:21,630
pushing those points this way.
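A minimal sketch of that shear, with a made-up shear amount delta in the top-right entry (the part that adds a fraction of the y component to x):

```python
import numpy as np

delta = 0.5  # shear amount, chosen just for illustration
S = np.array([[1.0, delta],
              [0.0, 1.0]])

v = np.array([1.0, 2.0])
print(S @ v)  # [2. 2.]  -> x picks up delta * y; y is unchanged
```

A point with negative y would shift the other way in x, exactly the pushing in opposite directions described above.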
1290
01:11:25,230 --> 01:11:29,760
Shear is very important in
things like the flow of liquid.
1291
01:11:29,760 --> 01:11:32,700
So when you have liquid
flowing over a surface,
1292
01:11:32,700 --> 01:11:37,620
you have frictional
forces on the liquid down here
1293
01:11:37,620 --> 01:11:39,750
that prevent it from moving.
1294
01:11:39,750 --> 01:11:42,550
Liquid up here
moves more quickly,
1295
01:11:42,550 --> 01:11:48,250
and it produces a shear in the
pattern of velocity profiles.
1296
01:11:48,250 --> 01:11:50,560
OK, that's pretty cool.
1297
01:11:50,560 --> 01:11:52,150
What about this?
1298
01:11:56,520 --> 01:11:58,750
It's going to just
produce a shear
1299
01:11:58,750 --> 01:12:00,380
along the other direction.
1300
01:12:00,380 --> 01:12:01,300
That's right.
1301
01:12:01,300 --> 01:12:03,250
So now components that have a--
1302
01:12:03,250 --> 01:12:07,960
vectors that have a
large x component acquire
1303
01:12:07,960 --> 01:12:10,900
a negative projection in y.
1304
01:12:17,160 --> 01:12:19,920
OK, what does this look like?
1305
01:12:19,920 --> 01:12:20,800
It's pretty cool.
1306
01:12:30,600 --> 01:12:36,680
We're going to get some
shear in this direction,
1307
01:12:36,680 --> 01:12:39,860
get some shear in
this direction.
1308
01:12:39,860 --> 01:12:42,137
What's it going to do?
1309
01:12:42,137 --> 01:12:46,630
AUDIENCE: [INAUDIBLE]
1310
01:12:46,630 --> 01:12:47,420
MICHALE FEE: Good.
1311
01:12:47,420 --> 01:12:48,950
Good guess.
1312
01:12:48,950 --> 01:12:52,840
That's exactly right,
produces a rotation.
1313
01:12:52,840 --> 01:12:55,290
Not exactly a rotation,
but very close.
1314
01:13:01,470 --> 01:13:04,980
So that's how you actually
produce a rotation.
1315
01:13:04,980 --> 01:13:10,000
So notice, for small angles
theta, these are close to one,
1316
01:13:10,000 --> 01:13:13,140
so it's close to
an identity matrix.
1317
01:13:13,140 --> 01:13:17,090
These are close to zero,
but this is negative
1318
01:13:17,090 --> 01:13:20,970
and this is positive,
or the other way around.
1319
01:13:20,970 --> 01:13:27,560
So if we have diagonals close
to one and the off-diagonals one
1320
01:13:27,560 --> 01:13:31,640
positive and one negative,
then that produces a rotation.
1321
01:13:31,640 --> 01:13:33,760
That, formally, is
a rotation matrix.
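The small-angle claim can be checked numerically: for small theta, cosine is close to one and sine is close to theta, so the rotation matrix is close to the identity plus two small, opposite-signed off-diagonal shears. The angle below is an arbitrary small value:

```python
import numpy as np

theta = 0.05  # radians; a small angle chosen for illustration
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Identity plus opposite-signed off-diagonal shears of size theta.
approx = np.array([[1.0, -theta],
                   [theta, 1.0]])

print(np.max(np.abs(R - approx)))  # tiny: the two matrices nearly agree
```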
1322
01:13:33,760 --> 01:13:34,560
Yes?
1323
01:13:34,560 --> 01:13:36,580
AUDIENCE: On the
previous slide, is there
1324
01:13:36,580 --> 01:13:39,858
a reason you chose to represent
the delta on the x-axis as
1325
01:13:39,858 --> 01:13:40,860
negative?
1326
01:13:40,860 --> 01:13:41,780
MICHALE FEE: No.
1327
01:13:41,780 --> 01:13:42,720
It goes either way.
1328
01:13:42,720 --> 01:13:45,600
So if you have a rotation
angle that's positive,
1329
01:13:45,600 --> 01:13:48,590
then this is negative
and this is positive.
1330
01:13:48,590 --> 01:13:50,840
If your rotation angle
is the other sign,
1331
01:13:50,840 --> 01:13:55,520
then this is positive
and this is negative.
1332
01:13:55,520 --> 01:14:00,260
So, for example, if we want to
produce a 45-degree rotation,
1333
01:14:00,260 --> 01:14:04,820
then we have 1, 1, minus 1, 1.
1334
01:14:04,820 --> 01:14:07,040
And of course, all those
things have a square root
1335
01:14:07,040 --> 01:14:10,003
of 2, 1 over square
root of 2, in them.
1336
01:14:10,003 --> 01:14:11,170
And so that looks like this.
1337
01:14:11,170 --> 01:14:14,180
So if you have, let's say,
theta equals 10 degrees,
1338
01:14:14,180 --> 01:14:17,960
we can produce a 10-degree
rotation of all the vectors.
1339
01:14:17,960 --> 01:14:20,180
If theta is 25
degrees, you can see
1340
01:14:20,180 --> 01:14:23,220
that the rotation is further.
1341
01:14:23,220 --> 01:14:25,560
Theta 45, that's
this case right here.
1342
01:14:25,560 --> 01:14:28,560
You can see that you get a
45-degree rotation of all
1343
01:14:28,560 --> 01:14:31,440
of those vectors
around the origin.
1344
01:14:31,440 --> 01:14:37,850
And if theta is 90 degrees,
you can see that, OK?
1345
01:14:37,850 --> 01:14:38,660
Pretty cool, right?
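The rotations just shown can be reproduced with a small helper; the function name is arbitrary, and positive theta rotates counterclockwise as stated above:

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix; positive theta rotates counterclockwise."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# The 45-degree case: every entry is 1/sqrt(2), up to sign.
R = rotation(np.pi / 4)

v = np.array([1.0, 0.0])
print(R @ v)  # approximately [0.707 0.707]: the x unit vector rotated 45 degrees
```

Swapping in 10, 25, or 90 degrees reproduces the other examples from the slide.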
1346
01:14:42,700 --> 01:14:46,880
OK, what is the inverse
of this rotation matrix?
1347
01:14:46,880 --> 01:14:50,620
So if we have a
rotation-- oh, and I just
1348
01:14:50,620 --> 01:14:53,140
want to point out
one more thing.
1349
01:14:53,140 --> 01:14:55,780
In this formulation of
the rotation matrix,
1350
01:14:55,780 --> 01:15:00,970
positive angles correspond
to rotating counterclockwise.
1351
01:15:03,560 --> 01:15:07,640
Negative angles
correspond to rotation
1352
01:15:07,640 --> 01:15:09,920
in the clockwise direction, OK?
1353
01:15:09,920 --> 01:15:11,660
So there's a big hint.
1354
01:15:11,660 --> 01:15:17,230
What is the inverse of
our rotation matrix?
1355
01:15:17,230 --> 01:15:22,940
If we have a rotation
of 10 degrees this way,
1356
01:15:22,940 --> 01:15:24,960
what is the inverse of that?
1357
01:15:24,960 --> 01:15:26,738
AUDIENCE: [INAUDIBLE]
1358
01:15:26,738 --> 01:15:27,530
MICHALE FEE: Right.
1359
01:15:27,530 --> 01:15:28,910
AUDIENCE: [INAUDIBLE]
1360
01:15:28,910 --> 01:15:30,290
MICHALE FEE: That's right.
1361
01:15:30,290 --> 01:15:35,870
Remember, matrix multiplication
implements a transformation.
1362
01:15:35,870 --> 01:15:38,450
The inverse of
that transformation
1363
01:15:38,450 --> 01:15:41,420
just takes you back
where you were.
1364
01:15:41,420 --> 01:15:44,810
So if you have a rotation
matrix that you implemented
1365
01:15:44,810 --> 01:15:47,750
a 20-degree rotation
in the plus direction,
1366
01:15:47,750 --> 01:15:51,710
then the inverse of that
is a 20-degree rotation
1367
01:15:51,710 --> 01:15:53,120
in the minus direction.
1368
01:15:53,120 --> 01:15:55,000
So the inverse of
this matrix you
1369
01:15:55,000 --> 01:15:58,830
can get just by putting in
a minus sign into the theta.
1370
01:15:58,830 --> 01:16:01,580
And you can see that
cosine of minus theta
1371
01:16:01,580 --> 01:16:03,200
is just cosine of theta.
1372
01:16:03,200 --> 01:16:06,500
But sine of minus theta
is negative sine of theta.
1373
01:16:09,770 --> 01:16:13,100
So the inverse of this
matrix is just this.
1374
01:16:13,100 --> 01:16:15,450
You change the sign
of those diagonals,
1375
01:16:15,450 --> 01:16:19,620
which just makes the shear go in
the opposite direction, right?
1376
01:16:23,400 --> 01:16:26,680
OK, so a rotation
by angle plus theta
1377
01:16:26,680 --> 01:16:29,590
followed by a rotation
of angle minus theta
1378
01:16:29,590 --> 01:16:31,300
puts everything
back where it was.
1379
01:16:31,300 --> 01:16:37,590
So rotation matrix phi of
minus theta times phi of theta
1380
01:16:37,590 --> 01:16:39,370
is equal to the identity matrix.
1381
01:16:39,370 --> 01:16:41,790
So those two are
inverses of each other.
1382
01:16:44,410 --> 01:16:47,860
And the inverse of a--
notice that the inverse
1383
01:16:47,860 --> 01:16:51,850
of this rotation matrix
is also just the transpose
1384
01:16:51,850 --> 01:16:52,930
of the rotation matrix.
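Both facts, that rotating by minus theta undoes rotating by plus theta, and that this inverse is just the transpose, can be verified directly (the 20-degree angle is the example used above):

```python
import numpy as np

theta = np.radians(20)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s],
              [s,  c]])

# Rotation by -theta: the signs of the off-diagonal terms flip.
R_minus = np.array([[ c, s],
                    [-s, c]])

print(np.allclose(R_minus @ R, np.eye(2)))  # True: back where we started
print(np.allclose(R_minus, R.T))            # True: inverse equals transpose
```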
1385
01:16:56,550 --> 01:16:58,190
All right, so what
you can see is
1386
01:16:58,190 --> 01:17:03,170
that these different
cool transformations
1387
01:17:03,170 --> 01:17:07,490
that these matrix
multiplications can do
1388
01:17:07,490 --> 01:17:11,870
are just examples of what our
feed-forward network can do.
1389
01:17:11,870 --> 01:17:13,460
Because the feed-forward
network
1390
01:17:13,460 --> 01:17:16,380
just implements
matrix multiplication.
1391
01:17:16,380 --> 01:17:18,950
So this feed-forward
network takes
1392
01:17:18,950 --> 01:17:21,890
a set of vectors, a
set of input vectors,
1393
01:17:21,890 --> 01:17:26,060
and transforms them into a set
of output vectors, all right?
1394
01:17:26,060 --> 01:17:29,510
And you can understand what
that transformation does just
1395
01:17:29,510 --> 01:17:32,060
by understanding the different
kinds of transformations
1396
01:17:32,060 --> 01:17:37,550
you can get from
matrix multiplication.
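A minimal sketch of that idea, assuming the linear rate model from the lecture; the weight values and input vectors below are made up for illustration:

```python
import numpy as np

# Weight matrix of a linear feed-forward network: entry (i, j) is the
# connection strength from input neuron j to output neuron i.
W = np.array([[0.5, 0.2],
              [0.1, 0.9]])

def feedforward(W, inputs):
    """Transform a set of input vectors (one per column) by the weights."""
    return W @ inputs

# Three input vectors, stacked as columns.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

Y = feedforward(W, X)
print(Y.shape)  # (2, 3): one output vector per input vector
```

Choosing W to be diagonal, a shear, or a rotation matrix makes the network perform exactly the transformations discussed above.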
1397
01:17:37,550 --> 01:17:40,390
All right, we'll
continue next time.