WEBVTT

00:00:00.060 --> 00:00:02.500
The following content is
provided under a Creative

00:00:02.500 --> 00:00:04.019
Commons license.

00:00:04.019 --> 00:00:06.360
Your support will help
MIT OpenCourseWare

00:00:06.360 --> 00:00:10.730
continue to offer high quality
educational resources for free.

00:00:10.730 --> 00:00:13.340
To make a donation or
view additional materials

00:00:13.340 --> 00:00:17.229
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.229 --> 00:00:17.854
at ocw.mit.edu.

00:00:21.340 --> 00:00:23.080
PROFESSOR: Today
we are introducing

00:00:23.080 --> 00:00:25.330
an exciting new pledge in 6.034.

00:00:25.330 --> 00:00:29.280
Anyone who has already looked at
any of the neural net problems

00:00:29.280 --> 00:00:31.860
will have easily been able to
see that even though Patrick

00:00:31.860 --> 00:00:34.300
only has them back
up to 2006 now,

00:00:34.300 --> 00:00:39.500
there's still-- well out of
four tests, perhaps two or three

00:00:39.500 --> 00:00:42.140
different ways that the
neural nets were drawn.

00:00:42.140 --> 00:00:44.960
Our exciting new pledge is
we're going to draw them

00:00:44.960 --> 00:00:47.110
in a particular way this year.

00:00:47.110 --> 00:00:52.350
And I will show you which
way, assuming that this works.

00:00:52.350 --> 00:00:53.820
Yes.

00:00:53.820 --> 00:00:57.580
We are going to draw them
like the one on the right.

00:00:57.580 --> 00:01:01.050
The one on the left is the
same as the one on the right.

00:01:01.050 --> 00:01:03.460
At first, not having
had to explain

00:01:03.460 --> 00:01:05.209
the difference between
the two of them,

00:01:05.209 --> 00:01:07.310
you might think you want
the one on the left.

00:01:07.310 --> 00:01:09.590
But you really want
the one on the right,

00:01:09.590 --> 00:01:11.590
and I'll explain why.

00:01:11.590 --> 00:01:15.810
The 2007 quiz was drawn
roughly similarly to this.

00:01:15.810 --> 00:01:19.860
Although if you somehow wind up
in tutorial or somewhere else

00:01:19.860 --> 00:01:21.930
doing one of the older
quizzes, a lot of them

00:01:21.930 --> 00:01:24.040
were drawn exactly like this.

00:01:24.040 --> 00:01:27.650
In this representation, one
thing I really don't like,

00:01:27.650 --> 00:01:30.900
is that the inputs are
called x's, and the outputs

00:01:30.900 --> 00:01:37.330
are called y's, but there's
two x's, so the inputs are not

00:01:37.330 --> 00:01:40.690
x and y, and then they often
correspond to axes of a graph,

00:01:40.690 --> 00:01:43.720
and then people get confused.

00:01:43.720 --> 00:01:46.980
Additional issues
that many people have

00:01:46.980 --> 00:01:50.880
are the fact that the summation
and the multiplication

00:01:50.880 --> 00:01:52.190
with the weight is implied.

00:01:52.190 --> 00:01:54.780
The weights are written on the
edges, where outputs and inputs

00:01:54.780 --> 00:01:58.390
go, and the summation of
the two inputs into the node

00:01:58.390 --> 00:02:00.540
is also implied.

00:02:00.540 --> 00:02:02.970
But take a look here.

00:02:02.970 --> 00:02:05.560
This is the same net.

00:02:05.560 --> 00:02:17.500
These w's here would be
the w's that are written

00:02:17.500 --> 00:02:20.610
onto these lines are here.

00:02:20.610 --> 00:02:22.450
Actually the better
way to draw it

00:02:22.450 --> 00:02:32.380
would be like so,
since each of these

00:02:32.380 --> 00:02:36.760
can have their own w,
which is different.

00:02:40.730 --> 00:02:44.020
So each of the w's
that are down here,

00:02:44.020 --> 00:02:46.250
are being explicitly
set to a multiplier.

00:02:46.250 --> 00:02:48.470
Whereas here, you
just had to remember

00:02:48.470 --> 00:02:51.700
to multiply the weight by
the input that was coming by.

00:02:51.700 --> 00:02:54.260
Here you see an input,
comes to a multiplier,

00:02:54.260 --> 00:02:59.180
you multiply by the weight,
then once you multiplied

00:02:59.180 --> 00:03:02.410
all the inputs by the weight,
then you send them to a sum,

00:03:02.410 --> 00:03:04.870
so the sigma is just a
sum, you sum them, add them

00:03:04.870 --> 00:03:09.910
all together, send the result of
that into the sigmoid function,

00:03:09.910 --> 00:03:14.330
our old buddy, 1 over 1 plus
e to the negative whatever

00:03:14.330 --> 00:03:18.900
our input was, with a
weight for an offset,

00:03:18.900 --> 00:03:20.740
and then we send
the result of that

00:03:20.740 --> 00:03:26.520
into more multipliers with
more weights, more sums, more

00:03:26.520 --> 00:03:28.360
sigmoids.

00:03:28.360 --> 00:03:33.340
So this is what it's going
to look like on the quiz.

00:03:33.340 --> 00:03:41.610
And this is a conversion
guide from version 0.9 data

00:03:41.610 --> 00:03:42.940
into version 1.0.

00:03:42.940 --> 00:03:46.060
So if you see
something that looks

00:03:46.060 --> 00:03:49.610
like this, on one of the old
quizzes that you're doing,

00:03:49.610 --> 00:03:52.980
see if you can convert it,
and then solve the problem.

00:03:52.980 --> 00:03:54.735
Chances are if you
can convert it,

00:03:54.735 --> 00:03:56.110
you're probably
going to do fine.

00:04:01.970 --> 00:04:04.550
We'll start off not only
with this conversion guide,

00:04:04.550 --> 00:04:17.839
but also-- I'll leave
that up here-- also

00:04:17.839 --> 00:04:20.310
I'm going to work out the
formulas for you guys one more

00:04:20.310 --> 00:04:21.230
time.

00:04:21.230 --> 00:04:25.500
These are all the
formulae that you're

00:04:25.500 --> 00:04:27.100
going to need on the quiz.

00:04:27.100 --> 00:04:29.700
And then we're
going to decide what

00:04:29.700 --> 00:04:34.480
will change in the formulae,
if, and this is a very

00:04:34.480 --> 00:04:38.230
likely if, there seems to
be a good number of times

00:04:38.230 --> 00:04:41.820
that this happens, is that
the sigmoid function in those

00:04:41.820 --> 00:04:44.530
neurons out there
was ever changed

00:04:44.530 --> 00:04:46.050
into some other
kind of function.

00:04:46.050 --> 00:04:46.550
Hint.

00:04:46.550 --> 00:04:49.460
It's changed into a plus
already in the problem

00:04:49.460 --> 00:04:50.800
we're about to do.

00:04:50.800 --> 00:04:54.420
People change it all the time
into some bizarro function.

00:04:54.420 --> 00:04:57.680
I've seen arc tangent, I think.

00:04:57.680 --> 00:04:59.910
So here we go.

00:04:59.910 --> 00:05:01.160
Let's look at the front of it.

00:05:01.160 --> 00:05:02.220
First of all, sigmoid.

00:05:02.220 --> 00:05:06.910
Well our old buddy, sigmoid,
I just said it a moment ago,

00:05:06.910 --> 00:05:10.275
sigmoid is 1 over 1
plus e to the minus x.

00:05:18.330 --> 00:05:34.940
Also, fun fact about sigmoid,
the derivative of sigmoid,

00:05:34.940 --> 00:05:41.210
is itself-- the
derivative of sigmoid

00:05:41.210 --> 00:05:47.250
is-- let's say that the
sigmoid-- we'll just

00:05:47.250 --> 00:05:53.001
turn sigmoid into
like the letter say y.

00:05:53.001 --> 00:05:54.670
Y is the result, right?

00:05:54.670 --> 00:06:02.650
So if you say y equals 1 over
1 plus e to the negative x,

00:06:02.650 --> 00:06:09.480
then the derivative of
sigmoid is y times 1 minus y.

00:06:13.330 --> 00:06:15.740
You can also write out
the whole nasty thing,

00:06:15.740 --> 00:06:19.380
it's 1 over 1 plus e to the
negative x times 1 minus 1

00:06:19.380 --> 00:06:21.780
over 1 plus e to negative x.
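
NOTE
A minimal sketch (not from the lecture) that checks the identity just stated, assuming the standard logistic sigmoid. The function names are illustrative only.
```python
import math
def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))
def sigmoid_derivative(x: float) -> float:
    """Derivative written in terms of the output y: y * (1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)
# Compare against a central finite difference at an arbitrary point.
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(abs(numeric - sigmoid_derivative(x)) < 1e-8)
```
The point of writing the derivative as y * (1 - y) is that it needs only the node's output, which is already known after the forward pass.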

00:06:21.780 --> 00:06:23.820
So that nice property
of sigmoid is

00:06:23.820 --> 00:06:28.160
going to be important for
us in the very near future,

00:06:28.160 --> 00:06:31.900
and that future begins now.

00:06:31.900 --> 00:06:33.790
So now the performance function.

00:06:33.790 --> 00:06:38.785
This is a function we use
to tell neural nets when

00:06:38.785 --> 00:06:42.700
they inevitably act up and
give us really crappy results.

00:06:42.700 --> 00:06:47.510
At first we tell them
just how wrong they are,

00:06:47.510 --> 00:06:49.430
with our performance function.

00:06:49.430 --> 00:06:53.060
The performance function can
be any sane function

00:06:53.060 --> 00:07:01.980
that gives you a better score,
where better can be decided

00:07:01.980 --> 00:07:05.991
as lower or higher, if you feel
like, that gives you a better

00:07:05.991 --> 00:07:08.240
score, if your answers are
closer to the answer you're

00:07:08.240 --> 00:07:09.920
looking for.

00:07:09.920 --> 00:07:17.360
However, in this case, we
have, for a very sneaky reason,

00:07:17.360 --> 00:07:20.300
chosen the performance
function to be

00:07:20.300 --> 00:07:27.530
1/2 d, which is the
desired output, minus o,

00:07:27.530 --> 00:07:30.680
the actual output squared.

00:07:33.770 --> 00:07:38.900
So we want a small,
well it's negative,

00:07:38.900 --> 00:07:43.450
so we want a small
negative or 0.

00:07:43.450 --> 00:07:47.130
That would mean
we performed well.

00:07:47.130 --> 00:07:47.915
So why this?

00:07:50.560 --> 00:08:05.090
Well the main reason
is d/dx of performance

00:08:05.090 --> 00:08:09.450
is, the 2 comes down, the o
is the variable that we're

00:08:09.450 --> 00:08:16.930
actually, so maybe I should say
d/do, that negative comes out,

00:08:16.930 --> 00:08:22.340
we get a simple d minus o.
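
NOTE
A minimal sketch (not from the lecture) of the performance function and its derivative. The leading minus sign is an assumption, inferred from the remarks "well it's negative" and "that negative comes out"; with it, the derivative with respect to o is exactly d - o.
```python
def performance(d: float, o: float) -> float:
    """Assumed form: P = -1/2 * (d - o)^2, so the best possible score is 0."""
    return -0.5 * (d - o) ** 2
def performance_derivative(d: float, o: float) -> float:
    """dP/do: the 2 cancels the 1/2 and the two minus signs cancel, leaving d - o."""
    return d - o
# Compare against a central finite difference with made-up values.
d, o, h = 1.0, 0.4, 1e-6
numeric = (performance(d, o + h) - performance(d, o - h)) / (2 * h)
print(abs(numeric - performance_derivative(d, o)) < 1e-8)
```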

00:08:25.980 --> 00:08:29.370
And yeah, we're using
derivatives here.

00:08:29.370 --> 00:08:32.049
So those are fine.

00:08:32.049 --> 00:08:33.620
These are two assumptions.

00:08:33.620 --> 00:08:35.493
They could be
changed on your test.

00:08:35.493 --> 00:08:37.909
We're going to figure out what
happens, if we change them,

00:08:37.909 --> 00:08:41.440
if we change the performance,
if we change the sigmoid, that

00:08:41.440 --> 00:08:44.927
is if we change the sigmoid
to some other function, what's

00:08:44.927 --> 00:08:47.010
going to happen to the
next three functions, which

00:08:47.010 --> 00:08:48.920
are basically the only
things that you need

00:08:48.920 --> 00:08:51.410
to know to do backpropagation.

00:08:51.410 --> 00:08:52.880
So let's look at that.

00:08:52.880 --> 00:08:54.210
First, w prime.

00:08:54.210 --> 00:08:56.820
This is the formula
for a new weight.

00:08:56.820 --> 00:08:59.520
After one step of
backpropagation.

00:08:59.520 --> 00:09:01.490
A new weight in any
of these positions

00:09:01.490 --> 00:09:07.370
that you can see up here on
this beautiful neural net.

00:09:07.370 --> 00:09:11.670
That w-- each of the w's will
have to change step by step.

00:09:11.670 --> 00:09:14.330
That's, in fact, how you do
the hill climbing in neural nets.

00:09:14.330 --> 00:09:16.635
You change the
weights incrementally.

00:09:16.635 --> 00:09:19.000
You step a little
bit in the direction

00:09:19.000 --> 00:09:23.170
towards giving you your desired
results until eventually, you

00:09:23.170 --> 00:09:27.100
hope, you have an
intelligent neural net.

00:09:27.100 --> 00:09:29.265
And maybe you have many
different training examples

00:09:29.265 --> 00:09:32.140
that you run it on,
in a cycle, hoping

00:09:32.140 --> 00:09:35.980
that you don't overfit to
your one sample, on a computer.

00:09:35.980 --> 00:09:38.510
But on the test, we
probably will not do that.

00:09:38.510 --> 00:09:40.750
So let's take a look
at how you calculate

00:09:40.750 --> 00:09:43.060
the weights for the next level.

00:09:43.060 --> 00:09:45.570
And then you have the weights
for the current level.

00:09:45.570 --> 00:09:48.080
So first things first.

00:09:48.080 --> 00:09:49.810
New weight, weight
prime equals--

00:09:49.810 --> 00:09:51.500
starts with the old weight.

00:09:51.500 --> 00:09:54.110
That has to go there
because otherwise we're

00:09:54.110 --> 00:09:56.910
just going to jump off
somewhere at random.

00:09:56.910 --> 00:10:00.010
We want to make a little
step in some direction,

00:10:00.010 --> 00:10:03.250
so we want to start where
we are, with the weight.

00:10:03.250 --> 00:10:08.720
And then we're going
to add three things.

00:10:08.720 --> 00:10:11.050
So if we're talking
about the weight

00:10:11.050 --> 00:10:17.390
between some i and
some j-- there's

00:10:17.390 --> 00:10:20.190
some examples of the
names of weights.

00:10:20.190 --> 00:10:24.240
So this is w 1 i, that's
the weight between 1

00:10:24.240 --> 00:10:29.332
and-- so this is w 1 a, it's
the weight between 1 and a.

00:10:29.332 --> 00:10:34.920
This is w 2 b, which is the
weight between 2 and b.

00:10:34.920 --> 00:10:36.170
Makes sense?

00:10:36.170 --> 00:10:38.270
Well makes sense
so far, but what

00:10:38.270 --> 00:10:42.430
if it's just called
w b, then it's

00:10:42.430 --> 00:10:46.279
the weight between-- these
w's that only have one letter,

00:10:46.279 --> 00:10:47.070
we'll get to later.

00:10:47.070 --> 00:10:47.930
They're the bias.

00:10:47.930 --> 00:10:49.560
They're the offset.

00:10:49.560 --> 00:10:52.020
They are always attached
to a negative 1.

00:10:52.020 --> 00:11:00.520
So you can pretty much treat
them as being a negative 1

00:11:00.520 --> 00:11:04.840
here, that is then fed into
a multiplier with this w b,

00:11:04.840 --> 00:11:05.673
if you like.

00:11:10.880 --> 00:11:12.370
This is implied to be that.

00:11:12.370 --> 00:11:15.920
All of the offsets are
implied to be that.

00:11:15.920 --> 00:11:24.130
So w plus some alpha--
what is this Greek letter?

00:11:24.130 --> 00:11:25.240
Where does it come from?

00:11:25.240 --> 00:11:26.320
How do we calculate it?

00:11:26.320 --> 00:11:32.149
Well alpha is just some value
told to you on the quiz.

00:11:32.149 --> 00:11:33.190
You'll find it somewhere.

00:11:33.190 --> 00:11:35.810
There's no way you're going
to have to calculate alpha.

00:11:35.810 --> 00:11:38.090
You might be asked to
try to give us an alpha,

00:11:38.090 --> 00:11:39.530
but probably not.

00:11:39.530 --> 00:11:42.660
Alpha is supposed to give
the size of our little steps

00:11:42.660 --> 00:11:44.650
that we take when we're
doing hill climbing.

00:11:44.650 --> 00:11:46.600
Very large alpha,
take a huge step.

00:11:46.600 --> 00:11:49.410
Very small alpha,
take tentative steps.

00:11:49.410 --> 00:11:53.250
So alpha is there, basically,
to change this answer

00:11:53.250 --> 00:11:58.900
and to make the new value either
very close to w, or far from w,

00:11:58.900 --> 00:12:00.770
depending on our taste.

00:12:00.770 --> 00:12:14.250
So plus alpha times
i, so i is the value

00:12:14.250 --> 00:12:23.999
coming into the node.

00:12:23.999 --> 00:12:25.290
We're changing the weight here.

00:12:25.290 --> 00:12:30.870
So i is the value, for
instance, i sub 1 here,

00:12:30.870 --> 00:12:36.080
for WAC, i would be the value coming
out of node a, that is, the

00:12:36.080 --> 00:12:40.170
output of node a.

00:12:40.170 --> 00:12:43.600
For WBC, i would be the
output of node b.

00:12:43.600 --> 00:12:47.630
Put as simply as possible,
i is the input coming in

00:12:47.630 --> 00:12:51.140
to meet that weight
at the multiplier.

00:12:51.140 --> 00:12:56.700
And then it's
multiplied by delta j.
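
NOTE
A minimal sketch (not from the lecture) of the update rule as stated: new weight = old weight + alpha * i * delta_j. The numbers are made up purely to show how alpha scales the step.
```python
def updated_weight(w: float, alpha: float, i: float, delta_j: float) -> float:
    """One step: w' = w + alpha * i * delta_j."""
    return w + alpha * i * delta_j
# Same input and delta, two different step sizes.
small_step = updated_weight(w=0.5, alpha=0.1, i=2.0, delta_j=0.25)
large_step = updated_weight(w=0.5, alpha=10.0, i=2.0, delta_j=0.25)
print(small_step, large_step)
```
With the small alpha the new weight stays close to 0.5; with the large alpha it jumps to 5.5.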

00:12:56.700 --> 00:12:59.560
Your delta is the
delta that belongs

00:12:59.560 --> 00:13:02.200
to these neural net nodes.

00:13:02.200 --> 00:13:04.160
What is a delta, you ask?

00:13:04.160 --> 00:13:05.420
Funny you should ask.

00:13:05.420 --> 00:13:07.222
It is a strange Greek letter.

00:13:07.222 --> 00:13:08.930
It sort of comes from
the fact that we're

00:13:08.930 --> 00:13:10.638
doing some partial
derivatives and stuff,

00:13:10.638 --> 00:13:13.335
but the main way you're going to
figure out what the deltas are

00:13:13.335 --> 00:13:17.040
is with these two formulae that
I've not written in yet.

00:13:17.040 --> 00:13:20.970
So hold off on
trying to figure out

00:13:20.970 --> 00:13:25.033
what the delta is until--
well right now, I'm

00:13:25.033 --> 00:13:26.710
about to tell you what the delta is.

00:13:26.710 --> 00:13:30.670
So the delta is basically,
think of the delta

00:13:30.670 --> 00:13:34.870
as using partial
derivatives to figure out

00:13:34.870 --> 00:13:36.590
which way you're
going to step, when

00:13:36.590 --> 00:13:37.940
you're doing hill climbing.

00:13:37.940 --> 00:13:39.981
Because you know when
you're doing hill climbing,

00:13:39.981 --> 00:13:42.090
you look around, you
figure out, OK, this

00:13:42.090 --> 00:13:44.320
is the direction of
the highest increase,

00:13:44.320 --> 00:13:46.980
and then you step off
in that direction.

00:13:46.980 --> 00:13:48.720
So the deltas are
telling you which way

00:13:48.720 --> 00:13:51.270
to step, with the weights.

00:13:51.270 --> 00:13:55.480
And the way they do that is by
taking the partial derivative

00:13:55.480 --> 00:13:58.460
of-- basically you
try to figure out

00:13:58.460 --> 00:14:02.320
how the weight that you're
currently looking at

00:14:02.320 --> 00:14:07.470
is contributing to the
performance of the net.

00:14:07.470 --> 00:14:10.064
Contributing to, either
the good performance

00:14:10.064 --> 00:14:11.980
of the net, or the bad
performance of the net.

00:14:14.630 --> 00:14:21.330
So when you're dealing with
the weights, like WBC, WAC,

00:14:21.330 --> 00:14:27.090
that pretty much directly
feed into the end of the net.

00:14:27.090 --> 00:14:29.330
They feed into the last
node, and it then comes out.

00:14:29.330 --> 00:14:31.990
It's the output.

00:14:31.990 --> 00:14:33.590
That's pretty easy.

00:14:33.590 --> 00:14:36.270
You can tell exactly
how much those weights,

00:14:36.270 --> 00:14:39.440
and the values coming from them,
are contributing to the end.

00:14:39.440 --> 00:14:46.950
And we do that by
essentially, remember

00:14:46.950 --> 00:14:53.120
what the partial derivative,
so partial derivative here

00:14:53.120 --> 00:14:59.240
is, in fact, the way that the
final weights are contributing

00:14:59.240 --> 00:15:02.850
to the performance, is just
the performance function.

00:15:02.850 --> 00:15:05.160
Partial derivative--
I've already

00:15:05.160 --> 00:15:07.830
figured out the derivative
here, it's just d minus o.

00:15:11.800 --> 00:15:15.490
This is for sort
of final weights,

00:15:15.490 --> 00:15:17.670
the weights in the last level.

00:15:17.670 --> 00:15:19.790
D minus o, except
we're not done yet,

00:15:19.790 --> 00:15:24.560
because when we do derivatives,
remember the chain rule.

00:15:24.560 --> 00:15:29.330
To get from the end to these
weights, we pass through,

00:15:29.330 --> 00:15:31.540
well it should be a
sigmoid, here it's not,

00:15:31.540 --> 00:15:33.730
we're going to pretend
it is for the moment,

00:15:33.730 --> 00:15:36.580
we pass through a
sigmoid, and since we

00:15:36.580 --> 00:15:38.420
passed through the
sigmoid, we had better

00:15:38.420 --> 00:15:41.050
take the derivative of
the sigmoid function.

00:15:41.050 --> 00:15:46.700
That is, y times 1 minus y.

00:15:46.700 --> 00:15:47.510
Well what is y?

00:15:47.510 --> 00:15:50.150
What is the output
of the sigmoid?

00:15:50.150 --> 00:15:51.970
It's o.

00:15:51.970 --> 00:15:56.520
So that's also multiplied
by o times 1 minus o.

00:16:02.667 --> 00:16:05.480
However, there is
a-- let me see,

00:16:05.480 --> 00:16:10.780
let me see, yes-- sorry, I'm
carefully studying this sheet

00:16:10.780 --> 00:16:13.000
to make sure my
nomenclature is exactly

00:16:13.000 --> 00:16:17.120
right for our new nomenclature,
which is so new and brave,

00:16:17.120 --> 00:16:19.510
that we're doing it, that
we only knew for sure we're

00:16:19.510 --> 00:16:21.450
going to do it on Wednesday.

00:16:21.450 --> 00:16:26.160
So we have d minus o
times o times 1 minus o.
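
NOTE
A minimal sketch (not from the lecture) of the final-node delta just derived, assuming a sigmoid node and the performance derivative d - o. The function name is illustrative.
```python
def delta_final(d: float, o: float) -> float:
    """(d - o) from the performance function times o * (1 - o) from the sigmoid."""
    return (d - o) * o * (1.0 - o)
print(delta_final(d=1.0, o=0.5))  # prints 0.125
```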

00:16:29.100 --> 00:16:35.440
So you say, that's fine,
that can get us these weights

00:16:35.440 --> 00:16:39.300
here, even this w
c, how are we going

00:16:39.300 --> 00:16:48.070
to get the deltas for
the new weights here?

00:16:51.330 --> 00:16:55.870
Oh, I realize-- yeah, I got it.

00:16:55.870 --> 00:16:58.860
So the delta-- by the
way, this is a delta c,

00:16:58.860 --> 00:17:00.980
how is neuron c
contributing to the output?

00:17:00.980 --> 00:17:03.020
Well it's directly
contributing to the output,

00:17:03.020 --> 00:17:04.819
and it's got a sigmoid in it.

00:17:04.819 --> 00:17:07.290
It doesn't really, but we're
pretending it does for now.

00:17:07.290 --> 00:17:10.440
d minus o times o times 1 minus o.

00:17:10.440 --> 00:17:12.329
What about inner node?

00:17:12.329 --> 00:17:15.780
Node b, node a, what are
we going to have to do?

00:17:15.780 --> 00:17:18.329
Well the way they
contribute to the output is

00:17:18.329 --> 00:17:20.819
that they contribute to node c.

00:17:20.819 --> 00:17:25.170
So we can do this
problem recursively.

00:17:25.170 --> 00:17:27.680
So let's do this recursively.

00:17:27.680 --> 00:17:30.080
First of all, as you have
probably figured out,

00:17:30.080 --> 00:17:32.880
all of them are going to have
an o times 1 minus o factor

00:17:32.880 --> 00:17:35.570
from the chain rule, because
they're all sigmoid, pretending

00:17:35.570 --> 00:17:36.653
that they're all sigmoids.

00:17:39.430 --> 00:17:41.440
We also have a dearth
of good problems

00:17:41.440 --> 00:17:44.800
that are actually sigmoid
on the web right now.

00:17:44.800 --> 00:17:46.990
There's only 2007.

00:17:46.990 --> 00:17:50.800
But here's o times
1 minus o, what

00:17:50.800 --> 00:17:53.070
are we going to do
for the rest of it?

00:17:53.070 --> 00:17:55.770
How does it contribute
to our final result?

00:17:55.770 --> 00:18:00.460
Well it contributes to our
final result recursively.

00:18:00.460 --> 00:18:01.920
So we're talking about delta i.

00:18:01.920 --> 00:18:04.930
I is an inner node.

00:18:04.930 --> 00:18:06.740
It's not a final node.

00:18:06.740 --> 00:18:08.390
It's somewhere along the way.

00:18:08.390 --> 00:18:24.500
So sum over j of w, going
from i to j, times delta j.

00:18:24.500 --> 00:18:32.000
Now sum over all j, j
such that i leads to j.

00:18:32.000 --> 00:18:34.860
i needs to have a
direct path into j.

00:18:34.860 --> 00:18:38.830
So if i, in this
instance, was a,

00:18:38.830 --> 00:18:45.240
everyone, the only possible
j in this would be c.

00:18:45.240 --> 00:18:46.170
That's right.

00:18:46.170 --> 00:18:49.940
We would not sum over
b as one of the j.

00:18:49.940 --> 00:18:54.620
i does not lead to b, or a does
not lead to b, a only leads

00:18:54.620 --> 00:18:55.470
to c.

00:18:55.470 --> 00:18:57.820
Also note that c does
not need to be here.

00:18:57.820 --> 00:18:59.860
That's going backwards.

00:18:59.860 --> 00:19:02.760
So you just-- to figure out
which j you're looking at,

00:19:02.760 --> 00:19:07.430
look directly forwards
at the next one.

00:19:07.430 --> 00:19:11.030
So if there was another d here,
or that a does not go to d,

00:19:11.030 --> 00:19:14.262
a goes to c.

00:19:14.262 --> 00:19:15.970
You only look at the
next level children,

00:19:15.970 --> 00:19:20.890
and you sum over
all those children,

00:19:20.890 --> 00:19:23.050
the weight between
them, multiplied

00:19:23.050 --> 00:19:25.040
by the child's delta.

00:19:25.040 --> 00:19:26.440
That makes sense, right?

00:19:26.440 --> 00:19:30.180
Because the way we affect, if
the child's delta is the way

00:19:30.180 --> 00:19:33.850
the child affects the output,
calling these children

00:19:33.850 --> 00:19:37.042
for a moment, and then
if this one directly

00:19:37.042 --> 00:19:38.750
affects the output,
then the way this one

00:19:38.750 --> 00:19:44.290
affects it is-- it affects
it because it affects this,

00:19:44.290 --> 00:19:46.960
but it's also multiplied
by its weight.

00:19:46.960 --> 00:19:52.440
So in fact, for instance, if the
weight between a and c were 0,

00:19:52.440 --> 00:19:55.690
then a doesn't affect
the output at all, right?

00:19:55.690 --> 00:20:00.600
Because its weight is 0,
and when we do this problem,

00:20:00.600 --> 00:20:05.010
we go this times 0, and then
we try to add it in there,

00:20:05.010 --> 00:20:06.510
doesn't affect anything.

00:20:06.510 --> 00:20:08.620
If its weight is very high,
it's going to really

00:20:08.620 --> 00:20:14.280
dominate c, and that is
taken into account here,

00:20:14.280 --> 00:20:19.090
and then multiply by the
delta for the right node.
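
NOTE
A minimal sketch (not from the lecture) of the inner-node rule: delta_i = o_i * (1 - o_i) * sum over direct children j of w_ij * delta_j, assuming sigmoid nodes. The names and numbers are illustrative only.
```python
def delta_inner(o_i: float, children: list[tuple[float, float]]) -> float:
    """children holds one (w_ij, delta_j) pair per node that i feeds directly."""
    return o_i * (1.0 - o_i) * sum(w_ij * delta_j for w_ij, delta_j in children)
# Node a feeds only node c, so its list has a single (w_ac, delta_c) pair.
delta_c = 0.125
print(delta_inner(o_i=0.5, children=[(2.0, delta_c)]))  # prints 0.0625
# With a zero weight between a and c, a has no effect on the output.
print(delta_inner(o_i=0.5, children=[(0.0, delta_c)]))  # prints 0.0
```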

00:20:19.090 --> 00:20:22.800
So the following
question, and since I

00:20:22.800 --> 00:20:26.610
spent a lot of time with
formulae and not that much time

00:20:26.610 --> 00:20:30.650
starting on the problem, I will
not call on someone at random,

00:20:30.650 --> 00:20:32.152
but rather take a volunteer.

00:20:32.152 --> 00:20:33.910
If no one volunteers,
I'll eventually

00:20:33.910 --> 00:20:37.530
tell you, which is, we've
got some nice formulae

00:20:37.530 --> 00:20:39.970
on the bottom three.

00:20:39.970 --> 00:20:43.140
If we change the sigmoid
function, what has to change?

00:20:50.770 --> 00:20:53.560
That's the only
thing that changes

00:20:53.560 --> 00:20:56.000
in this crazy assed problem
right here, which by the way,

00:20:56.000 --> 00:20:58.820
changes the sigmoid
functions into adders,

00:20:58.820 --> 00:21:00.690
is that we take all
of the o times 1

00:21:00.690 --> 00:21:04.440
minus o in delta
f and the delta i,

00:21:04.440 --> 00:21:06.750
and we change it to
a new derivative.

00:21:06.750 --> 00:21:09.850
We then do the exact same
thing that we would've done.

00:21:09.850 --> 00:21:10.650
Correct.

00:21:10.650 --> 00:21:13.040
And on a similar note, if
you change the performance

00:21:13.040 --> 00:21:15.090
function, how many
of these equations

00:21:15.090 --> 00:21:18.770
at all have to change
out of the bottom three.

00:21:21.360 --> 00:21:22.070
Yeah.

00:21:22.070 --> 00:21:24.430
That's right, just
one, just delta f.

00:21:24.430 --> 00:21:26.480
Take the d minus o, make
it the new derivative

00:21:26.480 --> 00:21:28.390
of the new performance function.

00:21:28.390 --> 00:21:31.920
And in fact, delta i
doesn't change at all.
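
NOTE
A minimal sketch (not from the lecture) of why only some pieces change. The final-node delta is the performance slope times the node function's slope, so swapping either one swaps just that factor. Writing the node slope as a function of the output o is a simplifying assumption that happens to work for the sigmoid; all names here are hypothetical.
```python
from typing import Callable
def delta_final(d: float, o: float,
                node_slope: Callable[[float], float],
                performance_slope: Callable[[float, float], float]) -> float:
    """Final-node delta as (slope of performance) * (slope of node function)."""
    return performance_slope(d, o) * node_slope(o)
def sigmoid_slope(o: float) -> float:
    """Default node slope from the lecture: o * (1 - o)."""
    return o * (1.0 - o)
def default_performance_slope(d: float, o: float) -> float:
    """Default performance slope from the lecture: d - o."""
    return d - o
def pass_through_slope(o: float) -> float:
    """Slope of a hypothetical node that outputs its sum unchanged."""
    return 1.0
print(delta_final(1.0, 0.5, sigmoid_slope, default_performance_slope))  # prints 0.125
print(delta_final(1.0, 0.5, pass_through_slope, default_performance_slope))  # prints 0.5
```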

00:21:31.920 --> 00:21:33.220
Does everyone see that?

00:21:33.220 --> 00:21:37.530
Because it is very common
for something to be replaced,

00:21:37.530 --> 00:21:39.670
I think three of the four
quizzes that we have,

00:21:39.670 --> 00:21:43.190
replaced in some-- changed
something in some way.

00:21:43.190 --> 00:21:43.690
All right.

00:21:43.690 --> 00:21:44.620
Let's go.

00:21:44.620 --> 00:21:47.690
We're going to do
2008 quiz, because it

00:21:47.690 --> 00:21:49.710
has a part at the end
that screwed up everyone,

00:21:49.710 --> 00:21:51.510
and so let's make sure
we get to that part.

00:21:51.510 --> 00:21:53.593
That's going to be the
part that you probably care

00:21:53.593 --> 00:21:54.920
about the most at this point.

00:21:54.920 --> 00:21:58.050
So these are all adders
instead of sigmoids.

00:21:58.050 --> 00:22:03.180
That means that they simply
add up everything as normal,

00:22:03.180 --> 00:22:04.960
for a normal neural
net, and then there's

00:22:04.960 --> 00:22:06.070
no sigmoid threshold.

00:22:06.070 --> 00:22:07.710
They just give
some kind of value.

00:22:07.710 --> 00:22:08.450
Question?

00:22:08.450 --> 00:22:11.354
STUDENT: So we talked about
those multiplier things,

00:22:11.354 --> 00:22:14.750
we don't have those in nodes?

00:22:14.750 --> 00:22:16.940
PROFESSOR: They're
not neural net nodes.

00:22:16.940 --> 00:22:22.020
That is one of the reasons
why that other form that you

00:22:22.020 --> 00:22:23.940
can see over there is elegant.

00:22:23.940 --> 00:22:25.720
It only has the
actual nodes on it.

00:22:25.720 --> 00:22:26.970
It is very compact.

00:22:26.970 --> 00:22:30.210
It's one of the forms we've
used in the previous tests.

00:22:30.210 --> 00:22:35.020
The question is, do those
multipliers count as nodes?

00:22:35.020 --> 00:22:38.110
However by not putting
in the multipliers,

00:22:38.110 --> 00:22:41.410
we feel it sometimes confuses
people for lack of explicitness.

00:22:41.410 --> 00:22:43.410
The ones that are
nodes will always

00:22:43.410 --> 00:22:46.230
have a label, like
a or here, you see

00:22:46.230 --> 00:22:49.240
there's a sigmoid and an L1.

00:22:49.240 --> 00:22:52.490
The multipliers are there
for your convenience,

00:22:52.490 --> 00:22:54.310
to remind you to
multiply, and also

00:22:54.310 --> 00:22:56.650
those, if you look, those
sigmas that are over there,

00:22:56.650 --> 00:23:00.330
are there for your convenience
to remind you to add.

00:23:00.330 --> 00:23:02.850
In fact, the only
thing that counts

00:23:02.850 --> 00:23:04.636
as a node in the
neural net-- and that's

00:23:04.636 --> 00:23:07.810
a very good question--
is usually the sigmoids,

00:23:07.810 --> 00:23:10.382
here it's the adders.

00:23:10.382 --> 00:23:12.090
We've essentially
taken out the sigmoids.

00:23:12.090 --> 00:23:17.370
These adders are the-- oh,
here's the way to tell.

00:23:17.370 --> 00:23:22.010
If it's got a threshold
weight associated with it,

00:23:22.010 --> 00:23:25.200
then it's one of
the actual nodes.

00:23:25.200 --> 00:23:26.220
A threshold weight.

00:23:26.220 --> 00:23:28.428
I guess the multipliers look
like they have a weight,

00:23:28.428 --> 00:23:31.774
but this is just the weight
that is being multiplied in.

00:23:31.774 --> 00:23:33.940
This is a weight to be
multiplied in with the input,

00:23:33.940 --> 00:23:35.973
but if it has a threshold
weight, like wa,

00:23:35.973 --> 00:23:38.030
wb-- oh, I promised
I would tell you guys

00:23:38.030 --> 00:23:39.760
the difference between
the two weights.

00:23:39.760 --> 00:23:41.960
So let's do that very quickly.

00:23:41.960 --> 00:23:46.860
The kinds of weights like,
say, w2b or w1a, are weights that

00:23:46.860 --> 00:23:51.775
come between input
1 and a or between a

00:23:51.775 --> 00:23:57.590
and c. You're merely multiplying
the input by this weight,

00:23:57.590 --> 00:23:59.840
and then eventually
that's added together.

00:23:59.840 --> 00:24:06.040
The threshold weights, they
just have like wb, wa, wc.

00:24:06.040 --> 00:24:09.900
They are essentially to decide
the threshold for a success

00:24:09.900 --> 00:24:14.715
or failure, for a 1 or a
0, or anything in between,

00:24:14.715 --> 00:24:17.150
at any of the given nodes.

00:24:17.150 --> 00:24:20.500
So the idea is maybe
you at some node

00:24:20.500 --> 00:24:22.260
want to have a
really high cut off,

00:24:22.260 --> 00:24:25.180
you have to have a very high value
coming in, or else it's a 0.

00:24:25.180 --> 00:24:27.060
So you put a high threshold.

00:24:27.060 --> 00:24:29.200
The weight is multiplied
by negative 1.

00:24:29.200 --> 00:24:37.880
And in fact, the threshold
weight won't-- one could

00:24:37.880 --> 00:24:40.850
consider if you wanted to that
the threshold weight times

00:24:40.850 --> 00:24:44.030
negative 1 was also
added in at that sum,

00:24:44.030 --> 00:24:47.560
instead of putting it at the
same location as the node.

00:24:47.560 --> 00:24:50.140
If that works better for you,
when you're converting it,

00:24:50.140 --> 00:24:52.010
you can also think
of it that way.

00:24:52.010 --> 00:24:54.090
Because the threshold
weight is essentially

00:24:54.090 --> 00:24:55.650
multiplied by negative
1 and added in

00:24:55.650 --> 00:24:58.130
at that same sum over there.
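
NOTE
A minimal sketch (not from the lecture) of the view just described: the threshold weight is treated as one more weight whose input is a constant -1, added into the same sum. A sigmoid node is assumed, and the names and numbers are illustrative only.
```python
import math
def node_output(inputs: list[float], weights: list[float], threshold_weight: float) -> float:
    """Sigmoid of the weighted sum, with the threshold weight fed by a constant -1."""
    total = sum(w * x for w, x in zip(weights, inputs)) + threshold_weight * -1.0
    return 1.0 / (1.0 + math.exp(-total))
# A higher threshold weight pushes the same inputs toward 0.
print(node_output([1.0, 1.0], [2.0, 2.0], threshold_weight=1.0))
print(node_output([1.0, 1.0], [2.0, 2.0], threshold_weight=8.0))
```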

00:24:58.130 --> 00:25:02.062
So that's another way to do it.

00:25:02.062 --> 00:25:04.270
There's a lot of ways to
visualize these neural nets.

00:25:04.270 --> 00:25:07.940
Just make sure you have a
way that makes sense to you,

00:25:07.940 --> 00:25:09.690
and that you can tell
pretty much whatever

00:25:09.690 --> 00:25:12.050
we write, as long as it
looks vaguely like that,

00:25:12.050 --> 00:25:15.159
how to get it in your mind,
into the representation

00:25:15.159 --> 00:25:15.950
that works for you.

00:25:15.950 --> 00:25:18.241
Because once you have the
representation right for you,

00:25:18.241 --> 00:25:20.360
you're more than halfway
to solving these guys.

00:25:20.360 --> 00:25:21.660
They aren't that bad.

00:25:21.660 --> 00:25:23.080
They just look nasty.

00:25:23.080 --> 00:25:24.310
They don't bite.

00:25:24.310 --> 00:25:24.900
OK.

00:25:24.900 --> 00:25:26.560
These are just adders.

00:25:26.560 --> 00:25:28.570
So if it's just an
adder, then that

00:25:28.570 --> 00:25:34.410
means that, if we take all
the x inputs coming in--

00:25:34.410 --> 00:25:36.630
let's do x and y for the
moment, so we can figure out

00:25:36.630 --> 00:25:39.980
the derivative--
then what comes out

00:25:39.980 --> 00:25:46.830
after we just add up the x, what
comes out, y equals x, right?

00:25:46.830 --> 00:25:48.770
We're just adding it up.

00:25:48.770 --> 00:25:52.290
Adding up all the input, we're
not doing anything to it.

00:25:52.290 --> 00:25:55.910
Y equals x is what
this node does.

00:25:55.910 --> 00:25:56.790
You people see that?

00:25:56.790 --> 00:26:00.690
So the derivative is just one.

00:26:00.690 --> 00:26:04.590
So that's pretty easy, because
the first problem says,

00:26:04.590 --> 00:26:10.850
what is the new
formula, delta f.

00:26:10.850 --> 00:26:12.890
So I'll just tell you.

00:26:12.890 --> 00:26:14.810
You guys probably
figured it out.

00:26:14.810 --> 00:26:17.538
o times 1 minus o.

00:26:17.538 --> 00:26:21.530
Because we replaced
d minus o with 1.

00:26:21.530 --> 00:26:22.230
OK?

00:26:22.230 --> 00:26:23.172
Makes sense so far?

00:26:23.172 --> 00:26:24.630
Please ask questions
along the way,

00:26:24.630 --> 00:26:26.660
because I'm not going
to be asking you guys.

00:26:26.660 --> 00:26:27.805
I'll do it myself.

00:26:27.805 --> 00:26:28.305
Question?

00:26:28.305 --> 00:26:29.888
STUDENT: Why do we
replace d minus o with 1?

00:26:32.890 --> 00:26:34.600
PROFESSOR: That's
a good question.

00:26:34.600 --> 00:26:37.510
The reason is because
I did the wrong thing.

00:26:37.510 --> 00:26:42.230
So see, it's good that you
guys are asking questions.

00:26:42.230 --> 00:26:44.850
It actually should be replacing
o times 1 minus o with 1.

00:26:44.850 --> 00:26:49.350
The answer is delta
f equals d minus o.

00:26:49.350 --> 00:26:52.170
So yes, perhaps I
did it to trick you.

00:26:52.170 --> 00:26:54.570
No, I actually messed up.

00:26:54.570 --> 00:26:56.820
But yes, please ask
questions along the way.

00:26:56.820 --> 00:26:59.080
Again, I don't have
time to call on you guys

00:26:59.080 --> 00:27:01.520
at random to figure out if
you guys are following along.

00:27:01.520 --> 00:27:03.020
So I'll do it myself.

00:27:03.020 --> 00:27:06.180
We're replacing the o times
1 minus o with 1 because

00:27:06.180 --> 00:27:07.860
of the fact that
the sigmoid is gone,

00:27:07.860 --> 00:27:10.960
and we get just delta
f equals d minus o.

00:27:10.960 --> 00:27:13.660
So great.

00:27:13.660 --> 00:27:18.320
We now want to know what
the equation is for delta i,

00:27:18.320 --> 00:27:19.720
at the node a.

00:27:19.720 --> 00:27:21.740
So delta a.

00:27:21.740 --> 00:27:24.270
Well let's take a look.

00:27:24.270 --> 00:27:26.530
The o times 1 minus o is gone.

00:27:26.530 --> 00:27:30.210
Now we just have the sum over
j, which you guys already

00:27:30.210 --> 00:27:36.120
told me is, only c
of WAC times delta c.

00:27:36.120 --> 00:27:38.350
We know that delta
c is d minus o.

00:27:38.350 --> 00:27:43.590
The answer is delta a is
just WAC times d minus o.
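
NOTE
A compact restatement of part a, offered as a sketch rather than the quiz's
own notation. It assumes the usual sigmoid backpropagation formulas, where o
is a node's output, d is the desired output, and w_{ij} is the weight from
node i to node j. For an adder the derivative factor o(1 - o) becomes 1.
```latex
\begin{align*}
\text{sigmoid:}\quad \delta_f &= o\,(1-o)\,(d-o), & \delta_i &= o_i\,(1-o_i)\sum_j w_{ij}\,\delta_j \\
\text{adder:}\quad \delta_f &= 1\cdot(d-o) = d-o, & \delta_a &= 1\cdot w_{ac}\,\delta_c = w_{ac}\,(d-o)
\end{align*}
```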

00:27:43.590 --> 00:27:46.050
That time, I got it right.

00:27:46.050 --> 00:27:47.684
I see the answer here.

00:27:47.684 --> 00:27:49.600
Though it's written in
a very different format

00:27:49.600 --> 00:27:52.290
from the old quiz.

00:27:52.290 --> 00:27:54.590
Any questions on that?

00:27:54.590 --> 00:27:58.070
Well that's part a that
we finished out of c.

00:27:58.070 --> 00:27:59.300
Let's go to part b.

00:27:59.300 --> 00:28:01.625
Part b is doing one
step of backpropagation.

00:28:01.625 --> 00:28:04.430
There's almost always going
to be one of these in here.

00:28:04.430 --> 00:28:07.740
So the first thing it
asks is to figure out

00:28:07.740 --> 00:28:11.510
what the output o is
for this neural net

00:28:11.510 --> 00:28:15.670
if all weights are initially 1
except that this guy right here

00:28:15.670 --> 00:28:18.120
is negative 0.5.

00:28:18.120 --> 00:28:22.120
All the other ones
start off as 1.

00:28:22.120 --> 00:28:25.560
Let's do a step-- oh, let's
see what are the inputs.

00:28:25.560 --> 00:28:28.040
The inputs are also all 1.

00:28:28.040 --> 00:28:31.320
Desired output is also 1.

00:28:31.320 --> 00:28:37.980
And in fact, the rate
constant alpha is also 1.

00:28:37.980 --> 00:28:40.490
This is the only thing
that isn't 1, folks.

00:28:40.490 --> 00:28:42.070
So let's see what happens.

00:28:42.070 --> 00:28:48.210
1 times 1 is 1, then
this is a negative 1

00:28:48.210 --> 00:28:50.870
times 1 is negative 1.

00:28:50.870 --> 00:28:53.062
That's 0.

00:28:53.062 --> 00:28:55.520
The exact same thing happens
here because it's symmetrical.

00:28:55.520 --> 00:28:57.546
So these are both 0.

00:28:57.546 --> 00:29:03.150
0 times 1 is 0, 0 times 1 is 0.

00:29:03.150 --> 00:29:07.770
Then this is negative 1 times
negative 0.5 is positive 0.5,

00:29:07.770 --> 00:29:14.740
so 0 plus 0 plus a positive
0.5, the output is positive 0.5.

00:29:14.740 --> 00:29:16.700
Does everyone see that?

00:29:16.700 --> 00:29:20.830
If not, you can
convince yourself

00:29:20.830 --> 00:29:21.900
that it is positive 0.5.

00:29:21.900 --> 00:29:23.570
That would be a good
exercise for you,

00:29:23.570 --> 00:29:25.470
run through one forward run.

00:29:25.470 --> 00:29:28.650
The output is
definitely positive 0.5.
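
NOTE
A minimal sketch of the forward run just described. The topology is an
assumption inferred from the spoken walkthrough (input 1 feeds node a, input 2
feeds node b, and a and b feed node c). The dictionary keys and the function
name are illustrative, not the quiz's notation.
```python
def forward(w, i1, i2):
    # Every node is a plain adder; each threshold weight is multiplied by -1.
    a = i1 * w["w1a"] + (-1) * w["wa"]
    b = i2 * w["w2b"] + (-1) * w["wb"]
    c = a * w["wac"] + b * w["wbc"] + (-1) * w["wc"]
    return a, b, c
# All weights start at 1 except the threshold weight on node c.
weights = {"w1a": 1.0, "w2b": 1.0, "wa": 1.0, "wb": 1.0, "wac": 1.0, "wbc": 1.0, "wc": -0.5}
a, b, o = forward(weights, 1.0, 1.0)
print(a, b, o)  # 0.0 0.0 0.5
```
Both hidden adders output 0, so only the threshold term contributes to o.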

00:29:28.650 --> 00:29:30.000
First time around.

00:29:30.000 --> 00:29:31.130
OK?

00:29:31.130 --> 00:29:34.000
Now we have to do one
step of backpropagation.

00:29:34.000 --> 00:29:35.900
To do that, let's
calculate all the deltas

00:29:35.900 --> 00:29:37.810
so that we can calculate
all the new weights,

00:29:37.810 --> 00:29:39.560
the new weight primes.

00:29:39.560 --> 00:29:42.660
So delta c.

00:29:42.660 --> 00:29:43.490
That's easy.

00:29:43.490 --> 00:29:45.060
You guys can tell
me what delta c is.

00:29:45.060 --> 00:29:47.440
We figured out what the
new delta c is going to be.

00:29:47.440 --> 00:29:50.500
So simple addition or
subtraction problem?

00:29:50.500 --> 00:29:52.306
Everyone, delta c is?

00:29:52.306 --> 00:29:53.000
STUDENT: 0.5.

00:29:53.000 --> 00:29:58.086
PROFESSOR: 0.5, one half, yes.

00:29:58.086 --> 00:29:58.585
All right.

00:30:02.250 --> 00:30:07.690
We know that delta a and delta
b are just WAC times delta c,

00:30:07.690 --> 00:30:09.150
and WBC times delta c.

00:30:09.150 --> 00:30:10.660
So they are?

00:30:10.660 --> 00:30:11.460
STUDENT: One half.

00:30:11.460 --> 00:30:14.440
PROFESSOR: Also one half,
because all the weights were 1.

00:30:16.970 --> 00:30:17.750
Easy street.

00:30:17.750 --> 00:30:18.600
OK.

00:30:18.600 --> 00:30:20.980
We've got all of the
deltas are one half.

00:30:20.980 --> 00:30:23.290
And all but a few of
the weights are 1.

00:30:23.290 --> 00:30:25.190
So let's figure out what
the new weights are.

00:30:29.910 --> 00:30:32.140
New WAC, OK.

00:30:37.520 --> 00:30:38.320
Yeah, so let's see.

00:30:38.320 --> 00:30:40.480
What's going to be the new WAC?

00:30:40.480 --> 00:30:45.080
So the new WAC is
going to be old

00:30:45.080 --> 00:30:48.880
WAC, which is 1, because all
of them are 1 except for wc,

00:30:48.880 --> 00:30:53.960
plus the rate constant which
is 1, times the input coming

00:30:53.960 --> 00:30:57.270
in here, but
remember that was 0,

00:30:57.270 --> 00:31:01.500
so actually it's just going
to be the same as the old WAC.

00:31:01.500 --> 00:31:06.000
This is a symmetrical problem
between b and a at the moment, so

00:31:06.000 --> 00:31:08.720
this is going to be the same.

00:31:08.720 --> 00:31:09.407
All right.

00:31:09.407 --> 00:31:10.990
Some things are going
to change though.

00:31:10.990 --> 00:31:14.960
What about wc, that was the
one that was actually not 1?

00:31:14.960 --> 00:31:15.610
OK.

00:31:15.610 --> 00:31:25.120
So new wc, remember,
the i for wc,

00:31:25.120 --> 00:31:26.720
the i that we use
in this equation

00:31:26.720 --> 00:31:29.680
is always negative 1
because it's a threshold.

00:31:29.680 --> 00:31:37.700
So we have the old wc, which
is negative 0.5, plus 1 times

00:31:37.700 --> 00:31:41.990
negative 1 times delta
c, which is one half.

00:31:41.990 --> 00:31:49.330
So we have negative 0.5 plus
negative 0.5 equals negative 1.

00:31:49.330 --> 00:31:57.260
w 1 a, well we've got
w 1 a starts out as 1.

00:31:57.260 --> 00:32:04.180
Then we also know
that w 1 a is going

00:32:04.180 --> 00:32:08.090
to be equal to 1 plus 1
times the input, which

00:32:08.090 --> 00:32:18.420
is 1, times delta of a,
which is one half, so 1.5.

00:32:18.420 --> 00:32:25.000
And since it's symmetrical
between a and b, then w 2

00:32:25.000 --> 00:32:29.400
b is also 1.5.

00:32:29.400 --> 00:32:35.730
And then finally, wa and wb,
the offsets here, well they

00:32:35.730 --> 00:32:41.250
start at 1 plus 1 times
negative 1 times 0.5.

00:32:41.250 --> 00:32:44.150
So they're both, everyone?

00:32:44.150 --> 00:32:44.947
STUDENT: One half.

00:32:44.947 --> 00:32:45.780
PROFESSOR: One half.

00:32:45.780 --> 00:32:46.321
That's right.

00:32:55.330 --> 00:32:56.080
That's right.

00:32:56.080 --> 00:33:00.780
Because negative 1 is their i.

00:33:00.780 --> 00:33:05.440
Negative 1 times one half plus
positive 1 is just one half.

00:33:05.440 --> 00:33:07.690
That's one full step.

00:33:07.690 --> 00:33:10.180
Maybe a mite easier than
you might be used to seeing,

00:33:10.180 --> 00:33:11.386
but there's a full step.

00:33:11.386 --> 00:33:13.510
And it asks what's going
to be the output after one

00:33:13.510 --> 00:33:15.509
step of backpropagation?

00:33:15.509 --> 00:33:16.300
We can take a look.

00:33:19.860 --> 00:33:26.280
So we have 1 times the new w 1 a,
which is 1.5, you've got 1.5,

00:33:26.280 --> 00:33:29.130
then the new wa is
just 0.5, now is

00:33:29.130 --> 00:33:31.920
0.5, that's a 1
coming into an adder.

00:33:31.920 --> 00:33:34.910
We've got another 1 coming in
here because it's symmetrical.

00:33:34.910 --> 00:33:39.370
So 1 and a 1, 1 times WAC is 1.

00:33:39.370 --> 00:33:41.020
1 times WBC is 1.

00:33:41.020 --> 00:33:44.990
So we have two 1s coming in
here, they're added, that's 2.

00:33:44.990 --> 00:33:50.860
Then this has become negative
1, in fact, at this point.

00:33:50.860 --> 00:33:55.970
So negative 1 times negative 1 is
1, plus 2 is 3, and the output is 3.
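
NOTE
The whole one-step update can be sketched as follows. The topology (input 1
to a, input 2 to b, a and b to c), the dictionary keys, and the helper names
are assumptions for illustration. The update rule is the one used above: new
weight equals old weight plus alpha times the input on that weight times the
delta of the node it feeds, with deltas taken from the adder formulas.
```python
def forward(w, i1, i2):
    a = i1 * w["w1a"] + (-1) * w["wa"]
    b = i2 * w["w2b"] + (-1) * w["wb"]
    c = a * w["wac"] + b * w["wbc"] + (-1) * w["wc"]
    return a, b, c
def backprop_step(w, i1, i2, d, alpha):
    a, b, o = forward(w, i1, i2)
    delta_c = d - o                 # adder output node: no o(1 - o) factor
    delta_a = w["wac"] * delta_c
    delta_b = w["wbc"] * delta_c
    # For each weight: (input travelling along it, delta of the node it feeds).
    feeds = {
        "w1a": (i1, delta_a), "wa": (-1, delta_a),
        "w2b": (i2, delta_b), "wb": (-1, delta_b),
        "wac": (a, delta_c), "wbc": (b, delta_c), "wc": (-1, delta_c),
    }
    return {name: w[name] + alpha * inp * delta for name, (inp, delta) in feeds.items()}
old = {"w1a": 1.0, "w2b": 1.0, "wa": 1.0, "wb": 1.0, "wac": 1.0, "wbc": 1.0, "wc": -0.5}
new = backprop_step(old, 1.0, 1.0, d=1.0, alpha=1.0)
print(new)
print(forward(new, 1.0, 1.0)[2])  # 3.0
```
With a rate constant of 1 the output jumps from 0.5 past the desired 1 to 3.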

00:33:58.540 --> 00:33:59.070
All right.

00:33:59.070 --> 00:33:59.590
Cool.

00:33:59.590 --> 00:34:03.880
We've now finished part b, which
is over half of everything.

00:34:03.880 --> 00:34:05.186
Oh no, we've not.

00:34:05.186 --> 00:34:05.810
One more thing.

00:34:10.929 --> 00:34:11.690
These are adders.

00:34:11.690 --> 00:34:12.565
They're not sigmoids.

00:34:16.210 --> 00:34:19.095
What if we train this
entire neural net

00:34:19.095 --> 00:34:21.389
to try to learn this
data, so that it

00:34:21.389 --> 00:34:25.520
can draw a line on a
graph, or draw some lines,

00:34:25.520 --> 00:34:29.199
or do some kind of learning,
to separate off the minuses

00:34:29.199 --> 00:34:31.070
from all the pluses.

00:34:31.070 --> 00:34:33.280
You've seen, maybe,
and if not, you

00:34:33.280 --> 00:34:34.780
are about to in a
second, because it

00:34:34.780 --> 00:34:37.600
asks you to do this in detail,
that neural nets can usually

00:34:37.600 --> 00:34:40.830
draw one line on the
graph for each of these,

00:34:40.830 --> 00:34:43.120
sort of, nodes in the net,
because each of the nodes

00:34:43.120 --> 00:34:44.449
has some kind of threshold.

00:34:44.449 --> 00:34:49.480
And you can do some logic
between them like ANDs or ORs.

00:34:49.480 --> 00:34:52.830
What do you guys think
this net is going to draw?

00:34:52.830 --> 00:34:55.389
Anyone could volunteer,
I'm not going to ask anyone

00:34:55.389 --> 00:34:58.260
to give this answer.

00:34:58.260 --> 00:35:01.510
That's a little bit
tricky, because usually

00:35:01.510 --> 00:35:03.890
if you had this many
nodes, you could easily

00:35:03.890 --> 00:35:07.750
draw a box and box off the
minuses from the pluses.

00:35:07.750 --> 00:35:11.680
However, it draws this.

00:35:11.680 --> 00:35:13.350
And it asks what is the error?

00:35:13.350 --> 00:35:16.570
The error is-- oh yeah, it even
tells you the error is 1/8,

00:35:16.570 --> 00:35:19.260
because why?

00:35:19.260 --> 00:35:20.660
These are all adders.

00:35:20.660 --> 00:35:23.460
You can't actually
do anything logical.

00:35:23.460 --> 00:35:25.760
This entire net boils
down to just one node,

00:35:25.760 --> 00:35:27.530
because it just
adds up every time.

00:35:27.530 --> 00:35:30.330
It never takes a
threshold at any point.

00:35:30.330 --> 00:35:33.160
So you can't turn it into
logical ones and zeroes,

00:35:33.160 --> 00:35:37.440
because it's basically not
digital at all, it's analog.

00:35:37.440 --> 00:35:39.500
It's giving us some
very high number.

00:35:39.500 --> 00:35:41.840
So it all boils
down to one cut off.

00:35:41.840 --> 00:35:43.850
And that's the best one.

00:35:43.850 --> 00:35:46.700
The one that I drew right here.
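
NOTE
A small sketch of why a net of adders boils down to one node. Under the
assumed topology (input 1 to a, input 2 to b, a and b to c), expanding the
sums by hand gives a single expression k1*i1 + k2*i2 - t. The names k1, k2, t
and the functions below are illustrative only.
```python
import math
def adder_net(w, i1, i2):
    a = i1 * w["w1a"] - w["wa"]
    b = i2 * w["w2b"] - w["wb"]
    return a * w["wac"] + b * w["wbc"] - w["wc"]
def collapse(w):
    # Multiply out the nested sums: one weight per input and one threshold.
    k1 = w["wac"] * w["w1a"]
    k2 = w["wbc"] * w["w2b"]
    t = w["wac"] * w["wa"] + w["wbc"] * w["wb"] + w["wc"]
    return k1, k2, t
w = {"w1a": 1.5, "w2b": 1.5, "wa": 0.5, "wb": 0.5, "wac": 1.0, "wbc": 1.0, "wc": -1.0}
k1, k2, t = collapse(w)
points = [(0.0, 0.0), (1.0, 1.0), (2.0, -3.0), (-0.5, 4.0)]
same = all(math.isclose(adder_net(w, x, y), k1 * x + k2 * y - t) for x, y in points)
print(k1, k2, t, same)
```
Because the collapsed form is one linear expression, any cut-off applied to
the output is a single straight line in the input plane.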

00:35:46.700 --> 00:35:47.929
OK.

00:35:47.929 --> 00:35:49.220
Did that not make sense to you?

00:35:49.220 --> 00:35:50.280
That's OK.

00:35:50.280 --> 00:35:52.080
This problem is much harder.

00:35:52.080 --> 00:35:56.040
And putting them both on the
same quiz, was a bit brutal,

00:35:56.040 --> 00:35:57.880
but by the time
you're done with this,

00:35:57.880 --> 00:36:00.380
you'll understand what a
neural net can do or not.

00:36:00.380 --> 00:36:02.840
I put these in simplified
form because of the fact

00:36:02.840 --> 00:36:06.350
that we don't care about their
values or anything like that.

00:36:06.350 --> 00:36:09.570
But inside of these little
circles is a sigmoid,

00:36:09.570 --> 00:36:13.860
the multipliers and the
summers are implied.

00:36:13.860 --> 00:36:16.380
I think in the simplified
form when we're not actually

00:36:16.380 --> 00:36:18.590
doing backpropagation it's
easier to view it, and see

00:36:18.590 --> 00:36:19.950
how many nodes there are.

00:36:19.950 --> 00:36:21.741
For the same reason
you asked your question

00:36:21.741 --> 00:36:22.850
about how many there are.

00:36:22.850 --> 00:36:26.130
So all of those big
circles are nodes.

00:36:26.130 --> 00:36:30.080
And in those nodes is a sigmoid
now, not those crazy adders.

00:36:30.080 --> 00:36:31.680
We have the following problem.

00:36:31.680 --> 00:36:33.590
We have to try to
match each of a,

00:36:33.590 --> 00:36:37.690
b, c, d, e, f to 1, 2, 3,
4, 5, 6, using each of them

00:36:37.690 --> 00:36:39.100
only once.

00:36:39.100 --> 00:36:43.140
That's important, because some
of the more powerful networks

00:36:43.140 --> 00:36:45.730
in here can do a lot of these.

00:36:45.730 --> 00:36:49.050
So it's like yes, the
powerful networks could

00:36:49.050 --> 00:36:50.620
do some of the
easier problems here,

00:36:50.620 --> 00:36:53.940
but we want to match each
net to a problem it can do,

00:36:53.940 --> 00:36:57.720
and there is exactly one
mapping that will map-- that

00:36:57.720 --> 00:37:02.290
is one to one, and maps exactly,
uses all six of the nets

00:37:02.290 --> 00:37:04.700
to solve all six of
these problems here.

00:37:04.700 --> 00:37:07.030
So some of you may
be going like, what?

00:37:07.030 --> 00:37:08.870
How am I going to
solve these problems?

00:37:08.870 --> 00:37:11.490
I gave away a hint
before, which is

00:37:11.490 --> 00:37:16.940
that each node in the neural
net, each sigmoid node

00:37:16.940 --> 00:37:21.070
can usually draw
one line on the-- it

00:37:21.070 --> 00:37:23.290
can draw one line
into the picture.

00:37:23.290 --> 00:37:25.225
The line can be
diagonal if that node

00:37:25.225 --> 00:37:28.320
receives both of the inputs,
which is here, i 1 and i 2.

00:37:28.320 --> 00:37:30.470
See there is an i
1 and an i 2 axis.

00:37:30.470 --> 00:37:32.330
Like an x- and a y-axis.

00:37:32.330 --> 00:37:36.420
The node has to be horizontal,
or vertical, if-- sorry,

00:37:36.420 --> 00:37:38.920
the line has to be horizontal
or vertical if the node only

00:37:38.920 --> 00:37:41.810
receives one of the inputs.

00:37:41.810 --> 00:37:46.560
And then, if you
have a deeper level,

00:37:46.560 --> 00:37:51.090
these secondary level nodes
can sort of do a logical,

00:37:51.090 --> 00:37:53.810
can do some kind of brilliant
thing like AND or OR of

00:37:53.810 --> 00:37:58.020
the first two, which
can help you out.

00:37:58.020 --> 00:37:58.770
All right.

00:37:58.770 --> 00:38:01.530
And so let's try
to figure it out.

00:38:01.530 --> 00:38:05.030
So right off the bat, and I hope
that people will help and call

00:38:05.030 --> 00:38:06.930
this out, because
I know we don't

00:38:06.930 --> 00:38:09.305
have enough time that I can
force you guys to all get it.

00:38:09.305 --> 00:38:11.013
But right off the bat,
which one of these

00:38:11.013 --> 00:38:12.699
looks like it's the easiest one?

00:38:12.699 --> 00:38:13.240
STUDENT: Six.

00:38:13.240 --> 00:38:13.865
PROFESSOR: Six.

00:38:13.865 --> 00:38:14.530
That's great.

00:38:14.530 --> 00:38:15.946
Six is definitely
the easiest one.

00:38:15.946 --> 00:38:17.220
It's a single line.

00:38:17.220 --> 00:38:19.430
So this is just how I would
have solved this problem,

00:38:19.430 --> 00:38:20.530
is find the easiest one.

00:38:20.530 --> 00:38:23.010
Which of these is
the crappiest net?

00:38:23.010 --> 00:38:23.510
STUDENT: A.

00:38:23.510 --> 00:38:25.500
PROFESSOR: A is
the crappiest net.

00:38:25.500 --> 00:38:27.440
But there's no
way in hell that A

00:38:27.440 --> 00:38:30.510
is going to be able to get
any of these except for six.

00:38:30.510 --> 00:38:39.240
So let's, right off the bat,
say that six is A. All right.

00:38:39.240 --> 00:38:44.550
Six is A. That's A. We don't
have to worry about A. OK.

00:38:44.550 --> 00:38:45.560
Cool.

00:38:45.560 --> 00:38:49.400
Now let's look at some other
ones that are very interesting.

00:38:49.400 --> 00:38:52.430
All the rest of
these draw two lines,

00:38:52.430 --> 00:38:53.835
well these three draw two lines.

00:38:53.835 --> 00:38:55.380
These three draw three lines.

00:38:55.380 --> 00:38:58.800
They draw a triangle.

00:38:58.800 --> 00:39:02.740
So despite the fact that this
C is a very powerful net,

00:39:02.740 --> 00:39:09.470
that indeed, with three whole
levels here of sigmoids,

00:39:09.470 --> 00:39:12.400
it looks like there are
only two

00:39:12.400 --> 00:39:14.490
in our little stable of
nets that are equipped

00:39:14.490 --> 00:39:16.460
to handle number one and two.

00:39:16.460 --> 00:39:18.530
And those are?

00:39:18.530 --> 00:39:23.335
E and F, because E and F have
three nodes at the first level.

00:39:23.335 --> 00:39:25.276
They can draw three lines.

00:39:25.276 --> 00:39:27.650
And then they can do something
logical about those lines,

00:39:27.650 --> 00:39:31.690
like for instance, maybe, if
it's inside all of those lines.

00:39:31.690 --> 00:39:32.960
There's a way to do that.

00:39:32.960 --> 00:39:36.260
You just-- basically you can
give negative and positive

00:39:36.260 --> 00:39:38.120
weights as you so
choose to make sure

00:39:38.120 --> 00:39:40.264
that it's under certain
ones, above other ones,

00:39:40.264 --> 00:39:42.930
and then make the threshold such
that it has to follow all three

00:39:42.930 --> 00:39:45.160
of your rules.

00:39:45.160 --> 00:39:49.725
So between E and F,
which one should be two

00:39:49.725 --> 00:39:52.050
and which one should be one.

00:39:52.050 --> 00:39:53.160
Anyone see?

00:39:53.160 --> 00:39:55.140
Well let's look at two and one.

00:39:55.140 --> 00:39:56.790
Which one is easier to do?

00:39:56.790 --> 00:39:57.880
Between two and one.

00:39:57.880 --> 00:39:59.120
Two.

00:39:59.120 --> 00:40:00.820
It's got a horizontal
and a vertical.

00:40:00.820 --> 00:40:03.350
One has all three diagonal.

00:40:03.350 --> 00:40:08.262
And which one of these is a
weaker net, between E and F.

00:40:08.262 --> 00:40:10.620
F. F has one node that
can only do a horizontal,

00:40:10.620 --> 00:40:13.630
and one node that can
only do a vertical line.

00:40:13.630 --> 00:40:16.370
So which one is F
going to have to do?

00:40:16.370 --> 00:40:17.850
Two.

00:40:17.850 --> 00:40:19.210
And E does what?

00:40:19.210 --> 00:40:20.530
Good job, guys.

00:40:20.530 --> 00:40:22.470
Good job, you got this.

00:40:22.470 --> 00:40:24.940
So now let's look
at the last three.

00:40:31.070 --> 00:40:32.790
Number three is
definitely the hardest.

00:40:32.790 --> 00:40:35.470
It's an XOR.

00:40:35.470 --> 00:40:37.260
Those of you who've
played around

00:40:37.260 --> 00:40:41.750
with double o 2 kind of
stuff, or even just logic,

00:40:41.750 --> 00:40:45.560
probably know that
there is no way

00:40:45.560 --> 00:40:52.490
to make a sort of simple
linear combination in one

00:40:52.490 --> 00:40:55.610
level of logic to
create an XOR.

00:40:55.610 --> 00:40:58.660
XOR is very
difficult to create.

00:40:58.660 --> 00:41:00.830
There are some
interesting problems

00:41:00.830 --> 00:41:04.560
involving trying to teach
an XOR to a neural net.

00:41:04.560 --> 00:41:06.280
Because a neural
net is not going to be

00:41:06.280 --> 00:41:09.720
able to get the XOR, because of
the fact that you can tell it,

00:41:09.720 --> 00:41:14.120
OK, I want this one to be
high, and this one to be low.

00:41:14.120 --> 00:41:14.930
That's fine.

00:41:14.930 --> 00:41:16.610
You say these both
have to be high.

00:41:16.610 --> 00:41:17.620
That's fine.

00:41:17.620 --> 00:41:21.110
It's hard to say, it's pretty
much impossible to say,

00:41:21.110 --> 00:41:24.860
this one or this one, but
not both, needs

00:41:24.860 --> 00:41:27.460
to be high in a single node,
because of the fact that if you

00:41:27.460 --> 00:41:29.460
just play with it, you'll see.

00:41:29.460 --> 00:41:31.470
You need to set a
threshold somewhere,

00:41:31.470 --> 00:41:33.680
and it's not going to
be able to distinguish

00:41:33.680 --> 00:41:36.325
between, if the threshold
is set such that the

00:41:36.325 --> 00:41:38.450
OR is going to work, the
whole OR is going to work.

00:41:38.450 --> 00:41:42.060
It's going to accept when both
of them are positive as well.
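
NOTE
If you want to just play with it, here is a hedged sketch. It swaps the
sigmoid for a hard threshold (an assumption, standing in for a very steep
sigmoid) and tries every weight and threshold on a coarse grid. A grid search
is an illustration, not a proof, but it shows the pattern: some setting gets
OR right, and none gets XOR right. All names here are illustrative.
```python
from itertools import product
def unit(w1, w2, t, x1, x2):
    # Single node: fire when the weighted sum beats the threshold.
    return 1 if w1 * x1 + w2 * x2 > t else 0
def learnable(truth_table, grid):
    for w1, w2, t in product(grid, repeat=3):
        if all(unit(w1, w2, t, x1, x2) == out for (x1, x2), out in truth_table.items()):
            return True
    return False
grid = [k / 2 for k in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(learnable(OR, grid), learnable(XOR, grid))  # True False
```
Any threshold low enough to accept each input alone also accepts both at once.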

00:41:42.060 --> 00:41:43.360
So how can we do XOR?

00:41:43.360 --> 00:41:44.730
We need more logic.

00:41:44.730 --> 00:41:46.990
We need to use some
combinations of ANDs and ORs

00:41:46.990 --> 00:41:48.480
in a two level way.

00:41:48.480 --> 00:41:51.520
To do that we need the deepest
neural net that we have.

00:41:51.520 --> 00:41:53.390
There's only one
that's capable of that.

00:41:53.390 --> 00:41:54.680
And that is?

00:41:54.680 --> 00:41:55.785
It's C.

00:41:55.785 --> 00:41:57.410
There are many
different ways to do it.

00:41:57.410 --> 00:41:59.510
Let's think of a possibility.

00:41:59.510 --> 00:42:04.230
i 1 and i 2 draw
these two lines.

00:42:04.230 --> 00:42:08.450
Let's call these one,
two, three, four, five,

00:42:08.450 --> 00:42:12.010
node 1 and node 2
draw these two lines.

00:42:12.010 --> 00:42:14.680
And I'll just sort of
draw it here for you guys.

00:42:14.680 --> 00:42:20.050
Then maybe node 3
gives value to-- yeah,

00:42:20.050 --> 00:42:29.810
let me see-- node three can
give value to perhaps-- let's

00:42:29.810 --> 00:42:36.924
see-- node 3 can give value to
everything that is-- there are

00:42:36.924 --> 00:42:38.090
a lot of possibilities here.

00:42:38.090 --> 00:42:48.410
Node 3 can give value to
everything that is up here.

00:42:48.410 --> 00:42:50.650
Actually node 3 can
give value to everything

00:42:50.650 --> 00:42:56.630
except for this
bottom part, and then

00:42:56.630 --> 00:43:05.510
node 4 could give value to
say-- doesn't do it yet,

00:43:05.510 --> 00:43:07.700
but there's a few-- there's
a few different ways

00:43:07.700 --> 00:43:09.130
to do it if you played around.

00:43:09.130 --> 00:43:12.470
The key idea is that
node 3 and node 4

00:43:12.470 --> 00:43:17.350
can give value to some
combination of AND, OR, or NOT,

00:43:17.350 --> 00:43:23.490
and then node 5 can give value
based on being above or below

00:43:23.490 --> 00:43:26.280
a certain threshold,
combination of 3 and 4.

00:43:26.280 --> 00:43:29.200
You can build an XOR
out of the logic gates.

00:43:29.200 --> 00:43:32.820
I will ponder that on the
back burner for a moment,

00:43:32.820 --> 00:43:35.200
as we continue
onward, but clearly C

00:43:35.200 --> 00:43:38.010
has to do number three.

00:43:38.010 --> 00:43:38.620
OK.

00:43:38.620 --> 00:43:40.490
Now we're left
with four and five.

00:43:40.490 --> 00:43:42.447
I think, interestingly,
five looks

00:43:42.447 --> 00:43:45.030
like it may be more complicated
than four, because of the fact

00:43:45.030 --> 00:43:48.297
that it needs to do both
different directions instead

00:43:48.297 --> 00:43:49.570
of two of the same direction.

00:43:52.930 --> 00:43:55.720
So however, just the idea of
the one with the fewer lines,

00:43:55.720 --> 00:43:58.720
being a simpler one, may
not get us through here.

00:43:58.720 --> 00:43:59.860
And there's a reason why.

00:43:59.860 --> 00:44:01.180
Look what we have left to use.

00:44:01.180 --> 00:44:06.610
We have to use D or B. What is
the property of the two lines

00:44:06.610 --> 00:44:08.740
that D can draw?

00:44:08.740 --> 00:44:11.930
D being the simpler one.

00:44:11.930 --> 00:44:14.760
One horizontal, one
vertical, that's right.

00:44:14.760 --> 00:44:16.260
So even though it
may look simpler

00:44:16.260 --> 00:44:17.740
to just have two
horizontal lines,

00:44:17.740 --> 00:44:20.860
it actually requires B.
B is the only one that

00:44:20.860 --> 00:44:24.030
can draw two horizontal lines
because D has to draw one

00:44:24.030 --> 00:44:25.660
horizontal and one vertical.

00:44:25.660 --> 00:44:33.160
So that leaves us with,
B on this, D on this.

00:44:33.160 --> 00:44:34.410
Excellent, we have a question.

00:44:34.410 --> 00:44:36.327
I would've thought it
would have been possible

00:44:36.327 --> 00:44:37.743
that we had no
questions, or maybe

00:44:37.743 --> 00:44:39.471
I just explained it
the best I ever have.

00:44:39.471 --> 00:44:39.970
Question.

00:44:39.970 --> 00:44:43.825
STUDENT: I didn't get why B
has to be two horizontal lines.

00:44:43.825 --> 00:44:44.700
PROFESSOR: All right.

00:44:44.700 --> 00:44:46.450
So the question is, I
don't understand why

00:44:46.450 --> 00:44:48.200
B has to be two horizontal lines.

00:44:48.200 --> 00:44:52.340
The answer is, it doesn't.

00:44:52.340 --> 00:44:56.510
B can be anything, but D
can't be two horizontal lines.

00:44:56.510 --> 00:44:58.180
And so by process
of elimination,

00:44:58.180 --> 00:45:03.680
it's B. Well take
a look at D, right.

00:45:03.680 --> 00:45:09.530
So D has three nodes,
one, two, three.

00:45:09.530 --> 00:45:12.660
Node 1 and node 2 can
just draw a line anywhere

00:45:12.660 --> 00:45:15.140
they want, involving
the inputs they receive.

00:45:15.140 --> 00:45:18.470
What input does node 1 receive?

00:45:18.470 --> 00:45:19.280
Let's go to node 1.

00:45:21.910 --> 00:45:26.550
So it can only make a
cut off based on i 1.

00:45:26.550 --> 00:45:30.800
So therefore, it can only
draw by making the cut off

00:45:30.800 --> 00:45:32.200
above and below a certain point.

00:45:32.200 --> 00:45:34.990
Node 1 can only
draw vertical lines.

00:45:34.990 --> 00:45:37.174
Node 2 can only draw
a horizontal line,

00:45:37.174 --> 00:45:38.590
because it can
only make a cut off

00:45:38.590 --> 00:45:41.780
based on where it is on i 2.

00:45:41.780 --> 00:45:44.390
Therefore they can't
both draw a horizontal.

00:45:44.390 --> 00:45:46.640
That's why this is
the trickiest part.

00:45:46.640 --> 00:45:49.160
This last part, because
B is more powerful.

00:45:49.160 --> 00:45:51.340
B does not only have to
do two horizontal lines.

00:45:51.340 --> 00:45:54.280
It can do two diagonal lines.

00:45:54.280 --> 00:45:55.490
It can do anything it wants.

00:45:55.490 --> 00:45:58.222
It just happens that it's stuck
doing this somewhat easier

00:45:58.222 --> 00:46:00.680
problem, because of the fact that
it is the only one left that

00:46:00.680 --> 00:46:02.490
has the power to do it.

00:46:02.490 --> 00:46:05.760
So let's see, we're
done, and we'd

00:46:05.760 --> 00:46:09.490
have aced this part of the
quiz that like no one got,

00:46:09.490 --> 00:46:11.240
well not no one, but
very few people got,

00:46:11.240 --> 00:46:13.660
when we put it on in 2008.

00:46:13.660 --> 00:46:17.140
The only thing we
have left to ask

00:46:17.140 --> 00:46:22.430
is-- let me see-- yeah, the
only thing we have left to ask

00:46:22.430 --> 00:46:29.931
is what are we going
to do here for this?

00:46:29.931 --> 00:46:30.430
All right.

00:46:30.430 --> 00:46:31.070
Let's see.

00:46:33.860 --> 00:46:38.360
For the XOR, let's see
if I can do this XOR.

00:46:41.360 --> 00:46:43.240
OK.

00:46:43.240 --> 00:46:44.970
How about this one.

00:46:44.970 --> 00:46:45.510
Right.

00:46:45.510 --> 00:46:46.180
I'm an idiot.

00:46:46.180 --> 00:46:47.910
This is the easiest way.

00:46:47.910 --> 00:46:49.926
Number one draws this line.

00:46:49.926 --> 00:46:51.050
Number two draws this line.

00:46:51.050 --> 00:46:55.307
Number three ANDs the
line, the two lines.

00:46:55.307 --> 00:46:56.890
Number three says
only if both of them

00:46:56.890 --> 00:46:58.360
are true, will I accept.

00:46:58.360 --> 00:47:02.510
Number four NORs the two lines.

00:47:02.510 --> 00:47:05.550
And number five ORs
between three and four.
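
NOTE
One way to sketch that construction with hard-threshold nodes. Several things
here are assumptions, not taken from the quiz: the two diagonal lines chosen
for nodes 1 and 2, the reading of node 4 as a NOR of the two lines, the
specific weights, and the names step and net_c. With these choices node 5
fires when the inputs match, which puts the two XOR classes on opposite sides.
```python
def step(total, threshold):
    # Hard threshold standing in for a steep sigmoid.
    return 1 if total > threshold else 0
def net_c(x1, x2):
    n1 = step(x1 + x2, 0.5)     # node 1: above the lower diagonal line
    n2 = step(x1 + x2, 1.5)     # node 2: above the upper diagonal line
    n3 = step(n1 + n2, 1.5)     # node 3: AND, accepts only if both are true
    n4 = step(-n1 - n2, -0.5)   # node 4: NOR (assumed), accepts only if neither is true
    return step(n3 + n4, 0.5)   # node 5: OR of nodes 3 and 4
table = {(x1, x2): net_c(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(table)
```
Flipping the signs on node 5 would make it fire on the other class instead.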

00:47:08.430 --> 00:47:09.740
Thank you.

00:47:09.740 --> 00:47:11.840
No, it's not that hard.

00:47:11.840 --> 00:47:13.902
I just completely
blanked, because there's

00:47:13.902 --> 00:47:15.860
another way that a lot
of people like to do it.

00:47:15.860 --> 00:47:17.300
It involves drawing
in a lot of lines,

00:47:17.300 --> 00:47:18.570
and then making the clef b 2.

00:47:18.570 --> 00:47:20.280
But I can't remember
it at the moment.

00:47:20.280 --> 00:47:21.610
Are there any other questions?

00:47:21.610 --> 00:47:26.958
Because I think if you
have a question now,

00:47:26.958 --> 00:47:28.582
like four other people
have it and just

00:47:28.582 --> 00:47:29.630
aren't raising their hand.

00:47:29.630 --> 00:47:31.470
So ask any questions
about this drawing thing.

00:47:31.470 --> 00:47:31.970
Question?

00:47:31.970 --> 00:47:33.569
STUDENT: Why do we do this?

00:47:33.569 --> 00:47:35.360
PROFESSOR: Why do we
do this drawing thing?

00:47:35.360 --> 00:47:37.680
That's a very good question.

00:47:37.680 --> 00:47:40.990
The answer is so that you
can see what kinds of nets

00:47:40.990 --> 00:47:43.680
you might need to use in
these simple problems,

00:47:43.680 --> 00:47:45.590
to answer these simple problems.

00:47:45.590 --> 00:47:51.320
So that if Athena
forbid that you

00:47:51.320 --> 00:47:54.340
have to use a
neural net in a job

00:47:54.340 --> 00:47:56.640
somewhere to do some
actual learning,

00:47:56.640 --> 00:48:00.260
and you see some sort of
quality about the problem,

00:48:00.260 --> 00:48:02.910
you know not to make a
net that's too simple,

00:48:02.910 --> 00:48:03.720
for instance.

00:48:03.720 --> 00:48:05.178
And you wouldn't
want a net that is

00:48:05.178 --> 00:48:06.660
more complex than it has to be.

00:48:06.660 --> 00:48:10.840
So you can sort of see what
the nets do at each level,

00:48:10.840 --> 00:48:13.140
and more visibly understand.

00:48:13.140 --> 00:48:15.970
I think a lot of people who
drew problems like this just

00:48:15.970 --> 00:48:17.595
want to make sure
people know, oh yeah,

00:48:17.595 --> 00:48:19.636
it's not just these numbers
that we're mindlessly

00:48:19.636 --> 00:48:21.760
backpropagating from the
other part of the problem

00:48:21.760 --> 00:48:23.720
to make them higher or lower.

00:48:23.720 --> 00:48:25.760
This is what we're
doing at each level.

00:48:25.760 --> 00:48:29.280
This is the space
that we're looking at.

00:48:29.280 --> 00:48:32.780
Each node is performing
logic on the steps before.
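
NOTE
A minimal sketch of "each node is performing logic on the steps before."
It assumes hard threshold units in place of the sigmoid nodes used for
backpropagation, and every name, weight, and threshold here is hypothetical,
chosen only for illustration rather than taken from the quiz problem.
```python
def unit(weights, threshold):
    """Build a node that outputs 1 when its weighted input sum exceeds the threshold."""
    def fire(*inputs):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total > threshold else 0
    return fire
# First layer: each node draws one line in the (x, y) plane.
right_of_one = unit([1, 0], 1)  # fires when x > 1
above_one = unit([0, 1], 1)     # fires when y > 1
# Second layer: logic on the outputs of the layer before.
and_node = unit([1, 1], 1.5)    # fires only when both inputs are 1
or_node = unit([1, 1], 0.5)     # fires when at least one input is 1
def classify_and(x, y):
    """Region where both first-layer lines are satisfied."""
    return and_node(right_of_one(x, y), above_one(x, y))
def classify_or(x, y):
    """Region where either first-layer line is satisfied."""
    return or_node(right_of_one(x, y), above_one(x, y))
for point in [(0, 0), (2, 0), (0, 2), (2, 2)]:
    print(point, classify_and(*point), classify_or(*point))
```
Shading the plane by each function's output would give the kind of
picture drawn on the board: two lines first, then a logical combination.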

00:48:32.780 --> 00:48:36.680
So that if you actually have to
use a neural net later on, down

00:48:36.680 --> 00:48:41.409
the road, then you'll
be able to figure out

00:48:41.409 --> 00:48:43.200
what your net's going
to need to look like.

00:48:43.200 --> 00:48:45.530
You'll be able to figure
out what it's doing.

00:48:45.530 --> 00:48:47.174
At least as well as
you can figure out

00:48:47.174 --> 00:48:48.590
what it's doing,
for a neural net,

00:48:48.590 --> 00:48:50.950
since it often will
start getting up

00:48:50.950 --> 00:48:54.070
these really crazy numbers, will
have all sorts of nodes in it,

00:48:54.070 --> 00:48:56.981
and like a real neural net
that's being used nowadays,

00:48:56.981 --> 00:48:58.730
there'll be tons of
nodes, and you'll just

00:48:58.730 --> 00:49:00.104
see the numbers
fluctuate wildly,

00:49:00.104 --> 00:49:04.402
and then suddenly it's going
to start working or not.

00:49:04.402 --> 00:49:05.360
That's a good question.

00:49:05.360 --> 00:49:06.193
Any other questions?

00:49:06.193 --> 00:49:08.220
We still have a few minutes.

00:49:08.220 --> 00:49:09.340
Not many, but a few.

00:49:09.340 --> 00:49:11.564
Any other questions
about any of this stuff?

00:49:11.564 --> 00:49:12.064
Sorry.

00:49:12.064 --> 00:49:12.556
STUDENT: Talk about
what you just asked.

00:49:12.556 --> 00:49:15.016
Just because we draw it, does
the machine need to learn--

00:49:20.929 --> 00:49:23.220
PROFESSOR: You're confused
why the machine has to learn what,

00:49:23.220 --> 00:49:26.140
by the pictures on the right?

00:49:26.140 --> 00:49:27.137
Oh OK.

00:49:27.137 --> 00:49:29.720
Machine does not have to learn
by drawing pictures and coloring

00:49:29.720 --> 00:49:30.676
them in.

00:49:30.676 --> 00:49:32.300
Let me give you some
real applications.

00:49:32.300 --> 00:49:35.110
My friend at the
University of Maryland

00:49:35.110 --> 00:49:38.300
recently actually
used neural nets

00:49:38.300 --> 00:49:41.640
because, yeah, he actually did,
because of the fact that he

00:49:41.640 --> 00:49:45.410
was doing a game playing
competition, where

00:49:45.410 --> 00:49:48.795
the game was not known when
you were designing your AI.

00:49:48.795 --> 00:49:52.490
It had to be able to-- there was
some very elegant, general game

00:49:52.490 --> 00:49:54.950
solver thing that you had to
be able to hook up into,

00:49:54.950 --> 00:49:58.140
and then they made up the
rules, and you had a little bit

00:49:58.140 --> 00:49:59.570
of time, and then it started.

00:49:59.570 --> 00:50:03.320
Some of the AIs, what
they did was, they trained,

00:50:03.320 --> 00:50:05.780
once they found out what
the rules were on their own,

00:50:05.780 --> 00:50:08.450
with the rules, in his case he
had a neural net, because it

00:50:08.450 --> 00:50:11.660
was so generic, you just
have a web of random gook.

00:50:11.660 --> 00:50:13.970
He thought it could
learn anything,

00:50:13.970 --> 00:50:16.910
and then-- he never did tell
me how it went, probably

00:50:16.910 --> 00:50:18.090
didn't go well.

00:50:18.090 --> 00:50:20.700
But maybe it did.

00:50:20.700 --> 00:50:25.119
It basically tried to learn
some things about the rules.

00:50:25.119 --> 00:50:26.660
Some of the other
people who are more

00:50:26.660 --> 00:50:29.930
principled game players
actually tried to find out

00:50:29.930 --> 00:50:32.950
fundamental properties
of the space of the rules

00:50:32.950 --> 00:50:34.820
by testing a few
different things,

00:50:34.820 --> 00:50:36.640
so they could-- more
knowledge

00:50:36.640 --> 00:50:39.110
is less search-- so they
could do less search

00:50:39.110 --> 00:50:41.180
when the actual game
playing came on.

00:50:41.180 --> 00:50:43.390
And then when the actual
game playing came on,

00:50:43.390 --> 00:50:48.680
pretty much everyone did some
kind of game tree based stuff.

00:50:48.680 --> 00:50:51.640
He's telling me that
a lot of Monte Carlo

00:50:51.640 --> 00:50:55.340
based game tree stuff that is
very nondeterministic

00:50:55.340 --> 00:50:57.349
is what they're doing
nowadays, rather than

00:50:57.349 --> 00:50:59.140
the deterministic
alpha-beta, although he

00:50:59.140 --> 00:51:02.380
said it converges to alpha-beta,
if you're given enough time.

00:51:02.380 --> 00:51:05.050
That's what he told
me. But that's someone I

00:51:05.050 --> 00:51:06.570
know who is using neural nets.

00:51:06.570 --> 00:51:08.800
I also, in a cognitive
science class I took,

00:51:08.800 --> 00:51:11.720
saw neural nets that tried
to attach like qualities

00:51:11.720 --> 00:51:15.620
to objects, by having just
this huge, huge number of nodes

00:51:15.620 --> 00:51:18.160
in levels in between, and
then eventually it was like,

00:51:18.160 --> 00:51:20.780
a duck flies, and you're like,
how's it doing this again?

00:51:20.780 --> 00:51:22.640
I'm not sure, but it is.

00:51:22.640 --> 00:51:25.890
So the basic idea
is that when-- one

00:51:25.890 --> 00:51:28.140
of the main reasons that
neural nets were used so much

00:51:28.140 --> 00:51:31.340
back in the day is that
people on many different sides

00:51:31.340 --> 00:51:33.450
of this problem,
cognitive science, AI,

00:51:33.450 --> 00:51:35.620
whatever, were all
saying, wait a minute,

00:51:35.620 --> 00:51:38.610
there's networks of neurons,
and they can do stuff,

00:51:38.610 --> 00:51:40.360
and we're seeing it
in different places.

00:51:40.360 --> 00:51:42.860
And when you've seen it in so
many different places at once,

00:51:42.860 --> 00:51:44.490
must be a genius
idea that's going

00:51:44.490 --> 00:51:46.140
to revolutionize everything.

00:51:46.140 --> 00:51:47.780
And so then everyone
started using

00:51:47.780 --> 00:51:50.860
them to try to connect all these
things together, which I think

00:51:50.860 --> 00:51:53.340
is a noble endeavor,
but unfortunately people

00:51:53.340 --> 00:51:54.300
just stopped using it.

00:51:54.300 --> 00:51:56.180
It didn't work as they wanted.

00:51:56.180 --> 00:51:58.950
It turned out that figuring out
how neurons worked in our head

00:51:58.950 --> 00:52:03.860
was not the way to solve all
AI hard problems at once.

00:52:03.860 --> 00:52:06.360
And they fell into
disfavor, although they are still

00:52:06.360 --> 00:52:09.510
used for some purposes,
like the ones I just described.

00:52:09.510 --> 00:52:11.610
So we wouldn't use it just
to draw these pictures.

00:52:11.610 --> 00:52:13.492
The reason why we
have these pictures

00:52:13.492 --> 00:52:15.950
is because we give you simple
nets that you can work out

00:52:15.950 --> 00:52:17.720
by hand on the quiz.

00:52:17.720 --> 00:52:20.340
Any net that is
really used nowadays

00:52:20.340 --> 00:52:23.700
would make your
head explode, if we

00:52:23.700 --> 00:52:26.140
tried to make you do
something with it on the quiz.

00:52:26.140 --> 00:52:27.660
It would just be horrible.

00:52:27.660 --> 00:52:29.315
So I think that's
a good question.

00:52:29.315 --> 00:52:31.440
If there's no other questions,
or even if there are,

00:52:31.440 --> 00:52:34.390
because we have to head out,
if there's any other questions,

00:52:34.390 --> 00:52:37.360
you can see me as
I'm walking out.