WEBVTT

00:00:00.050 --> 00:00:01.770
The following
content is provided

00:00:01.770 --> 00:00:04.010
under a Creative
Commons license.

00:00:04.010 --> 00:00:06.860
Your support will help MIT
OpenCourseWare continue

00:00:06.860 --> 00:00:10.720
to offer high quality
educational resources for free.

00:00:10.720 --> 00:00:13.320
To make a donation or
view additional materials

00:00:13.320 --> 00:00:17.207
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.207 --> 00:00:17.832
at ocw.mit.edu.

00:00:20.552 --> 00:00:22.010
PROFESSOR SRINI
DEVADAS: Erik and I

00:00:22.010 --> 00:00:24.900
have been tag teaming
this lecture in this class

00:00:24.900 --> 00:00:28.250
so we're going to
split this lecture.

00:00:28.250 --> 00:00:33.110
So I get to do the
first 2 minutes.

00:00:33.110 --> 00:00:33.950
No.

00:00:33.950 --> 00:00:38.520
I get to do the first
20 minutes, or so,

00:00:38.520 --> 00:00:40.670
talking about some
of my research

00:00:40.670 --> 00:00:42.690
in parallel architecture.

00:00:42.690 --> 00:00:45.040
And Erik's going to talk
about a bunch of things

00:00:45.040 --> 00:00:48.770
that he's been up to over
the years in Algorithm Design

00:00:48.770 --> 00:00:50.470
and Analysis.

00:00:50.470 --> 00:00:51.990
So let's get started.

00:00:56.060 --> 00:00:58.640
When was the first
PC built, anybody?

00:01:01.400 --> 00:01:01.900
Yeah.

00:01:01.900 --> 00:01:03.180
AUDIENCE: In the 1950s.

00:01:03.180 --> 00:01:04.346
PROFESSOR SRINI DEVADAS: No.

00:01:04.346 --> 00:01:08.600
The first personal computer was
1981-- not the first computer.

00:01:08.600 --> 00:01:14.920
So all of you know about
Intel, and Microsoft, and IBM,

00:01:14.920 --> 00:01:15.500
and so on.

00:01:18.060 --> 00:01:23.430
Intel's gift to humankind
is the x86 architecture.

00:01:23.430 --> 00:01:26.830
Though, some people
would argue that point.

00:01:26.830 --> 00:01:32.110
And the x86 architecture
was invented in 1981,

00:01:32.110 --> 00:01:38.290
and was part of the first PC--
that provided the horsepower

00:01:38.290 --> 00:01:41.270
for the first PC-- the IBM PC.

00:01:41.270 --> 00:01:43.015
And it ran at 5 megahertz.

00:01:48.930 --> 00:01:53.210
And x86 has been around-- you
still can buy x86 computers.

00:01:53.210 --> 00:02:01.570
The 80486, in 1989,
ran at 25 megahertz.

00:02:01.570 --> 00:02:03.770
So you can see a trend here.

00:02:03.770 --> 00:02:07.060
And the 80486, as it
turns out, ended up

00:02:07.060 --> 00:02:10.900
being called the i486 because there
there was a court ruling that

00:02:10.900 --> 00:02:15.300
said that you couldn't
trademark numbers.

00:02:15.300 --> 00:02:17.430
And so Intel, at
that point, decided

00:02:17.430 --> 00:02:19.190
to start naming
their processors.

00:02:21.870 --> 00:02:27.170
So the Pentium, which is
one of the more famous Intel

00:02:27.170 --> 00:02:31.720
processors, was built
and came out in 1993.

00:02:31.720 --> 00:02:35.240
And the clock speed
went up to 66 megahertz,

00:02:35.240 --> 00:02:37.960
back in the early '90s.

00:02:37.960 --> 00:02:42.010
And since this is
just such a cool name,

00:02:42.010 --> 00:02:45.170
Intel continued to call
its processors Pentium.

00:02:45.170 --> 00:02:52.855
And the Pentium 4, in 2000, had
this incredibly deep pipeline

00:02:52.855 --> 00:02:54.650
where you broke
up the computation

00:02:54.650 --> 00:02:55.650
into a bunch of stages.

00:02:55.650 --> 00:02:57.760
In fact, it had a
30 stage pipeline.

00:02:57.760 --> 00:03:02.580
And so the clock speed went up
all the way to 1.5 gigahertz.

00:03:02.580 --> 00:03:05.690
The Pentium was famous
for many things,

00:03:05.690 --> 00:03:10.010
including a couple of
bugs in the floating point

00:03:10.010 --> 00:03:15.540
pipeline where division,
in particular corner cases,

00:03:15.540 --> 00:03:17.190
wasn't done correctly.

00:03:17.190 --> 00:03:24.170
And there was also this bug
called the F00F bug, which

00:03:24.170 --> 00:03:28.960
allowed a malicious program
to crash the entire system,

00:03:28.960 --> 00:03:32.600
regardless of whether it had
administrative privileges

00:03:32.600 --> 00:03:34.280
or not.

00:03:34.280 --> 00:03:37.116
But the Pentium was
obviously very successful.

00:03:37.116 --> 00:03:40.020
A lot of machines sold.

00:03:40.020 --> 00:03:44.680
And it felt like it was only
going to be a matter of time

00:03:44.680 --> 00:03:47.180
before we got to
10s of gigahertz,

00:03:47.180 --> 00:03:48.380
the way things were going.

00:03:48.380 --> 00:03:51.270
As you can see, this is
a pretty steep growth

00:03:51.270 --> 00:03:54.070
from 5 megahertz to
25 to 1.5 gigahertz

00:03:54.070 --> 00:03:57.400
in the space of about 20 years.

00:03:57.400 --> 00:04:03.760
As it turns out, after the
Pentium D, which came out

00:04:03.760 --> 00:04:09.770
in 2005, where the clock speed
peaked at about 3.2 gigahertz,

00:04:09.770 --> 00:04:12.530
clock frequency
stopped increasing.

00:04:12.530 --> 00:04:17.880
And what you see
now are things that

00:04:17.880 --> 00:04:21.070
correspond to multiple
processors on a chip.

00:04:21.070 --> 00:04:26.110
So for example, the Quad
Core Xeon came out in 2008.

00:04:26.110 --> 00:04:27.670
You can still buy it.

00:04:27.670 --> 00:04:31.170
Only runs at 3 gigahertz,
which is basically

00:04:31.170 --> 00:04:34.240
about the same as
the Pentium D ran.

00:04:34.240 --> 00:04:37.100
Each of these has a
range of frequencies.

00:04:37.100 --> 00:04:41.440
And beyond about
2005, the clock speed

00:04:41.440 --> 00:04:43.540
of processors that
you can buy is kind of

00:04:43.540 --> 00:04:46.870
saturated at about 3 gigahertz.

00:04:46.870 --> 00:04:50.060
And the way you're
getting performance

00:04:50.060 --> 00:04:54.290
is by putting multiple
processors on the chip.

00:04:54.290 --> 00:04:59.570
And people use the term cores
synonymously with processors.

00:04:59.570 --> 00:05:02.170
So a quad core means
that there are, in effect,

00:05:02.170 --> 00:05:07.400
four x86 processors on the same
silicon integrated circuit.

00:05:07.400 --> 00:05:10.100
And they're
interconnected together.

00:05:10.100 --> 00:05:11.920
And they talk to memory.

00:05:11.920 --> 00:05:14.720
And you have, essentially,
a parallel processor

00:05:14.720 --> 00:05:16.980
on a single chip.

00:05:16.980 --> 00:05:20.280
And the single user, potentially
running many programs,

00:05:20.280 --> 00:05:22.350
is using this system.

00:05:22.350 --> 00:05:25.530
And you have dual core
processors on your laptops.

00:05:25.530 --> 00:05:29.140
And so the scale, now,
is-- the metric now,

00:05:29.140 --> 00:05:32.830
I should say-- is how many
cores do you have on a chip.

00:05:32.830 --> 00:05:34.330
And people are
predicting that we're

00:05:34.330 --> 00:05:38.560
going to have 1,000
cores by 2020, on a chip.

00:05:38.560 --> 00:05:43.450
So this brings us to the problem
of how do we use parallelism.

00:05:43.450 --> 00:05:46.000
So there's a lot of work
in parallel algorithms.

00:05:46.000 --> 00:05:48.770
And there's also work
in building hardware,

00:05:48.770 --> 00:05:51.720
such that algorithms can
sort of automatically

00:05:51.720 --> 00:05:54.210
be parallelized while
they're running in hardware,

00:05:54.210 --> 00:05:57.380
so they can run faster,
and so on and so forth.

00:05:57.380 --> 00:06:00.240
So some of my research is
in parallel architecture.

00:06:00.240 --> 00:06:02.090
Some of it is in
parallel algorithms.

00:06:02.090 --> 00:06:05.980
I want to give you a sense
of what the problems are

00:06:05.980 --> 00:06:08.670
in building parallel
architectures.

00:06:08.670 --> 00:06:13.370
And in particular, I'll start
with a canonical system that

00:06:13.370 --> 00:06:16.215
corresponds to, let's say,
this quad core system.

00:06:16.215 --> 00:06:23.730
And so you have 4 processors on
this single integrated circuit.

00:06:23.730 --> 00:06:25.950
So that signifies that.

00:06:25.950 --> 00:06:32.040
And typically, you have a lot
of fast, static random-access

00:06:32.040 --> 00:06:35.395
memory, SRAM, on the same chip.

00:06:35.395 --> 00:06:38.960
So typically,
megabytes of the memory

00:06:38.960 --> 00:06:45.756
on the chip and gigabytes
of memory in DRAM,

00:06:45.756 --> 00:06:47.630
which are separate
modules that are connected

00:06:47.630 --> 00:06:49.950
via high speed
bus, off the chip.

00:06:49.950 --> 00:06:53.810
So there are usually
many DRAM modules.

00:06:53.810 --> 00:06:58.390
They're called DIMMs-- if you
might have heard the term.

00:06:58.390 --> 00:07:01.670
So the connection between
the processors and the SRAM

00:07:01.670 --> 00:07:03.910
is typically very fast.

00:07:03.910 --> 00:07:05.910
It's on-chip.

00:07:05.910 --> 00:07:08.540
Things being clocked
at gigahertz.

00:07:08.540 --> 00:07:10.170
And when you go
off-chip, you're down

00:07:10.170 --> 00:07:11.350
to a few hundred megahertz.

00:07:11.350 --> 00:07:15.060
So typically, an order
of magnitude less speed.

00:07:15.060 --> 00:07:17.040
But you're accessing
much more memory.

00:07:17.040 --> 00:07:18.632
So this is really
gigabytes and this

00:07:18.632 --> 00:07:19.840
is at the level of megabytes.

00:07:22.515 --> 00:07:26.750
If you see this
picture, here-- if you

00:07:26.750 --> 00:07:28.720
think about the number
of processors increasing

00:07:28.720 --> 00:07:32.750
from four to eight to
16, all the way to,

00:07:32.750 --> 00:07:36.180
say, to hundreds
of processors, you

00:07:36.180 --> 00:07:38.850
can see that there's going
to be a bottleneck associated

00:07:38.850 --> 00:07:41.550
with accessing the memory.

00:07:41.550 --> 00:07:44.340
The big problem is
you can't possibly

00:07:44.340 --> 00:07:48.210
build memory that
serves hundreds

00:07:48.210 --> 00:07:49.940
of requests in parallel.

00:07:49.940 --> 00:07:53.780
If you try and make a large
SRAM, which is megabytes long,

00:07:53.780 --> 00:07:55.920
the number of
ports in the SRAM--

00:07:55.920 --> 00:08:01.760
read ports-- is roughly
of the order of four.

00:08:01.760 --> 00:08:04.070
And after that it's
kind of hard to build.

00:08:04.070 --> 00:08:12.220
So this architecture isn't going
to be sustainable beyond 4, 8,

00:08:12.220 --> 00:08:13.990
maybe 16 cores.

00:08:13.990 --> 00:08:17.120
So typically, what
people build is--

00:08:17.120 --> 00:08:19.790
or people are trying
to build in academia--

00:08:19.790 --> 00:08:23.980
is something that corresponds
to a distributed architecture

00:08:23.980 --> 00:08:32.320
on the chip, where you have
processors and memory in tiles.

00:08:32.320 --> 00:08:39.059
So you have,
essentially, something

00:08:39.059 --> 00:08:43.490
like this, where you can
imagine having literally 100

00:08:43.490 --> 00:08:50.657
processors on a chip that
correspond to an implementation

00:08:50.657 --> 00:08:52.990
where you build tiles, where
you have a processor that's

00:08:52.990 --> 00:08:56.950
doing the computation, and
you have memory-- sometimes

00:08:56.950 --> 00:08:58.400
called cache memory.

00:08:58.400 --> 00:09:02.690
But there's multiple levels
of caches, typically, that

00:09:02.690 --> 00:09:04.830
are attached to each
of these processors.

00:09:04.830 --> 00:09:11.400
And the space between
the processor tiles

00:09:11.400 --> 00:09:17.590
is reserved for
interconnect or for wires

00:09:17.590 --> 00:09:19.690
that connect these
processors up.

00:09:19.690 --> 00:09:23.270
And so there's research that
goes on in routing algorithms.

00:09:23.270 --> 00:09:25.790
How you figure out if
these processors want

00:09:25.790 --> 00:09:29.030
to talk to each other; what
the best way of routing

00:09:29.030 --> 00:09:31.810
the messages are; you want
to find the shortest path.

00:09:31.810 --> 00:09:33.520
In this case, the
weight corresponds

00:09:33.520 --> 00:09:36.800
to the congestion
that's associated

00:09:36.800 --> 00:09:39.570
with each of these
channels that you have.

00:09:39.570 --> 00:09:41.990
And people actually
use algorithms

00:09:41.990 --> 00:09:44.880
like weighted shortest
paths, in hardware,

00:09:44.880 --> 00:09:47.690
to determine what the best way
of getting from here to there

00:09:47.690 --> 00:09:48.190
is.

00:09:48.190 --> 00:09:49.420
It may not be this way.

00:09:49.420 --> 00:09:51.800
It may be going
around the chip simply

00:09:51.800 --> 00:09:55.195
because that path-- the
latter one is less congested.
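
NOTE
Editorial sketch, not part of the spoken lecture. The routing idea described above, treating channel congestion as an edge weight and picking the cheapest route, can be illustrated in software with Dijkstra's algorithm. The tile names, the weights, and the function name least_congested_route are hypothetical; real on-chip routers implement this very differently in hardware.
```python
import heapq
def least_congested_route(weights, src, dst):
    # weights: dict mapping tile -> {neighbor: congestion weight of that channel}; weights must be non-negative.
    best = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route to this tile was already found
        for nbr, w in weights.get(node, {}).items():
            new_cost = cost + w
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if dst not in best:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]
# Hypothetical four-tile example: the direct channel A-B is congested, the detour through C and D is not.
mesh = {"A": {"B": 10, "C": 1}, "B": {"A": 10, "D": 1}, "C": {"A": 1, "D": 1}, "D": {"C": 1, "B": 1}}
print(least_congested_route(mesh, "A", "B"))
```
In this made-up mesh the chosen route goes around through C and D, which mirrors the point that the direct path is not always the best one.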

00:09:57.960 --> 00:10:00.620
The other issue
that comes up has

00:10:00.620 --> 00:10:05.410
to do with how long it
takes to go across the chip

00:10:05.410 --> 00:10:06.560
and come back.

00:10:06.560 --> 00:10:09.840
So if this processor wants
to access its local memory--

00:10:09.840 --> 00:10:14.710
that's typically
pretty simple or fast.

00:10:14.710 --> 00:10:18.226
But if it wants to
access remote memory--

00:10:18.226 --> 00:10:19.600
and it's quite
possible that it's

00:10:19.600 --> 00:10:22.290
sharing some data with a
different thread running

00:10:22.290 --> 00:10:23.845
on a different processor.

00:10:23.845 --> 00:10:25.470
So typically, there's
a program running

00:10:25.470 --> 00:10:29.210
on this processor,
sometimes called a thread,

00:10:29.210 --> 00:10:34.480
and this program may share data
with a different program, which

00:10:34.480 --> 00:10:36.310
is running on this processor.

00:10:36.310 --> 00:10:38.940
Or it may just require
a lot more space.

00:10:38.940 --> 00:10:43.080
And what this program has
to do is make a request

00:10:43.080 --> 00:10:47.820
all the way to this processor
and this particular cache

00:10:47.820 --> 00:10:48.860
in this processor.

00:10:48.860 --> 00:10:52.080
And then it gets the data back.

00:10:52.080 --> 00:10:57.990
So what you see here
is a round trip access

00:10:57.990 --> 00:11:01.620
that goes across the chip.

00:11:01.620 --> 00:11:05.510
And this distance,
if it's large,

00:11:05.510 --> 00:11:07.550
could take 10s of cycles.

00:11:07.550 --> 00:11:09.030
So typically, it's
a single cycle

00:11:09.030 --> 00:11:11.990
to access local memory--
the fastest local memory,

00:11:11.990 --> 00:11:13.380
called the L1 cache.

00:11:13.380 --> 00:11:15.680
But it could take 10s of
cycles to go send a message

00:11:15.680 --> 00:11:18.970
across the chip and 10s of
cycles to get the data back.

00:11:18.970 --> 00:11:23.480
So the bottleneck, really,
in parallel processing

00:11:23.480 --> 00:11:25.570
from a standpoint
of communication

00:11:25.570 --> 00:11:31.029
is this routing of messages
and getting the messages back.

00:11:31.029 --> 00:11:33.070
One of the things that my
research group is doing

00:11:33.070 --> 00:11:37.460
is looking at the
notion of migrating

00:11:37.460 --> 00:11:40.200
computation as opposed to data.

00:11:40.200 --> 00:11:45.620
We call it execution
migration, where

00:11:45.620 --> 00:11:52.960
you could say-- suppose I
have a processor running

00:11:52.960 --> 00:11:55.460
a particular program, out here.

00:11:55.460 --> 00:12:00.620
And if this program wanted
to access a remote memory,

00:12:00.620 --> 00:12:02.220
then, rather than
doing what I just

00:12:02.220 --> 00:12:03.650
showed you there--
send a message,

00:12:03.650 --> 00:12:05.680
get the data back--
you could imagine

00:12:05.680 --> 00:12:10.420
that you could migrate
the program itself.

00:12:10.420 --> 00:12:12.070
And in particular,
you think of it

00:12:12.070 --> 00:12:16.830
as migrating the
context of the program

00:12:16.830 --> 00:12:20.580
from this processor to this one.

00:12:20.580 --> 00:12:21.800
And so what is the context?

00:12:21.800 --> 00:12:27.540
For those of you who
have taken 6.004 probably

00:12:27.540 --> 00:12:28.360
know what this is.

00:12:28.360 --> 00:12:33.630
But it's simply where
you are in terms

00:12:33.630 --> 00:12:34.825
of executing your program.

00:12:34.825 --> 00:12:36.200
And that's typically
given to you

00:12:36.200 --> 00:12:41.340
by your program counter, and the
current state of your register

00:12:41.340 --> 00:12:46.740
file, and a few other
things, including

00:12:46.740 --> 00:12:49.180
cache memory and
so on and so forth.

00:12:49.180 --> 00:12:52.230
So the advantage with
execution migration

00:12:52.230 --> 00:12:56.160
is that it's a one way trip,
as opposed to a round trip.

00:13:00.510 --> 00:13:05.620
You don't have to send a
message and get the data back,

00:13:05.620 --> 00:13:08.707
which would be two
messages, if you will--

00:13:08.707 --> 00:13:10.540
one in the case of the
address and the other

00:13:10.540 --> 00:13:13.780
for the data-- but you
migrate your execution.

00:13:13.780 --> 00:13:15.660
Since you have
computation out here,

00:13:15.660 --> 00:13:20.570
you can run on this
remote processor.

00:13:20.570 --> 00:13:23.260
So that's one of the advantages
of execution migration.

00:13:23.260 --> 00:13:27.860
One of the downsides
of it is that this

00:13:27.860 --> 00:13:31.200
can be multiple
kilobytes-- or kilobits.

00:13:31.200 --> 00:13:35.750
And it could be significantly
more in terms of size,

00:13:35.750 --> 00:13:39.460
or in terms of bits, than the
data that you want to access.

00:13:39.460 --> 00:13:41.180
So there's a trade-off here.

00:13:41.180 --> 00:13:43.580
And then, any time
you have a trade-off,

00:13:43.580 --> 00:13:45.540
you can think of an
algorithm to try and find

00:13:45.540 --> 00:13:47.100
the optimal trade-off.

00:13:47.100 --> 00:13:54.650
So this is the context for the
particular optimization problem

00:13:54.650 --> 00:13:57.280
that we need to solve,
here, that corresponds

00:13:57.280 --> 00:14:03.230
to really deciding when you
want to do data migration

00:14:03.230 --> 00:14:06.960
and when you want to
do execution migration.

00:14:06.960 --> 00:14:08.870
There's a choice.

00:14:08.870 --> 00:14:13.510
At the top level, it's a
round trip to get the data.

00:14:13.510 --> 00:14:18.670
So you're really traveling
longer-- twice as long.

00:14:18.670 --> 00:14:20.790
The distance is twice as much.

00:14:20.790 --> 00:14:23.590
But it's possible
that the amount

00:14:23.590 --> 00:14:26.580
of state that
you'd have to move,

00:14:26.580 --> 00:14:29.790
in terms of taking your
context of your thread

00:14:29.790 --> 00:14:32.710
and moving across the
chip, could be large enough

00:14:32.710 --> 00:14:38.020
that it offsets the advantage
of the shorter distance.

00:14:38.020 --> 00:14:42.820
So we set this up as an
optimization problem.

00:14:42.820 --> 00:14:46.110
So now we're in the realm of--
we moved from 6.004 to 6.006,

00:14:46.110 --> 00:14:50.210
here, in the last
couple of seconds.

00:14:50.210 --> 00:14:57.990
So assume we know or
can predict the access

00:14:57.990 --> 00:15:04.720
pattern of a program.

00:15:04.720 --> 00:15:06.320
And you can do
this-- people build

00:15:06.320 --> 00:15:09.150
these things in hardware--
prefetch engines,

00:15:09.150 --> 00:15:11.100
branch predictors, and so on.

00:15:11.100 --> 00:15:13.241
They're in the x86 machines.

00:15:13.241 --> 00:15:15.740
And you can tell-- especially
if you're going through a loop

00:15:15.740 --> 00:15:20.057
over and over-- you can
make this prediction.

00:15:20.057 --> 00:15:21.640
So you have some
amount of lookahead.

00:15:21.640 --> 00:15:25.000
And you know that
m1 through mn are

00:15:25.000 --> 00:15:29.250
the memory accesses that this
program is going to make.

00:15:29.250 --> 00:15:30.725
And these are the
memory addresses.

00:15:34.930 --> 00:15:42.670
And I'm going to think about
p of m1, p of m2, p of mn,

00:15:42.670 --> 00:15:52.400
as the processor
caches for each mi.

00:15:52.400 --> 00:15:57.330
So what might be the
case, in a simple example,

00:15:57.330 --> 00:16:02.731
is you want to access
memory in processor one.

00:16:02.731 --> 00:16:04.106
You're sitting
there and you want

00:16:04.106 --> 00:16:06.270
to access memory
in processor one.

00:16:06.270 --> 00:16:08.370
And then, the next one,
you want to access memory

00:16:08.370 --> 00:16:10.090
in processor two.

00:16:10.090 --> 00:16:13.379
And so on and so forth.

00:16:13.379 --> 00:16:14.920
So you might see
something like that.

00:16:14.920 --> 00:16:17.410
So the sequence of
memory addresses--

00:16:17.410 --> 00:16:20.207
if you're sitting on processor
one-- this first one is local.

00:16:20.207 --> 00:16:22.540
And then, after that, you
want to access processor two's

00:16:22.540 --> 00:16:24.790
memory because you're
sharing data with it.

00:16:24.790 --> 00:16:27.312
Then you're back home,
again, to processor one.

00:16:27.312 --> 00:16:28.270
And so on and so forth.

00:16:31.370 --> 00:16:34.610
So that's one
example of a set up.

00:16:34.610 --> 00:16:39.350
And we can think about
the cost of migration

00:16:39.350 --> 00:16:41.874
as-- if you want
to go from s to d--

00:16:41.874 --> 00:16:45.400
as being a function
of the distance,

00:16:45.400 --> 00:16:49.590
s comma d, plus
some constant, which

00:16:49.590 --> 00:16:53.987
is proportional to
the context size.

00:16:53.987 --> 00:16:55.820
And that context size,
we're going to assume

00:16:55.820 --> 00:16:59.070
is fixed for a
particular architecture.

00:16:59.070 --> 00:17:00.820
It may change for
different architectures,

00:17:00.820 --> 00:17:04.511
but if it's a few
kilobits, then there's

00:17:04.511 --> 00:17:06.010
going to be some
overhead associated

00:17:06.010 --> 00:17:08.160
with putting the context
onto the network.

00:17:08.160 --> 00:17:12.000
And it's a sizable overhead that
needs to be taken into account.

00:17:12.000 --> 00:17:13.940
That's the cost of migration.

00:17:13.940 --> 00:17:18.210
The cost of an
access, s comma d,

00:17:18.210 --> 00:17:23.050
is twice the distance
between s and d.

00:17:23.050 --> 00:17:25.800
And it's typically
just a word that you

00:17:25.800 --> 00:17:29.470
want to access--
32 bits, 64 bits--

00:17:29.470 --> 00:17:33.470
and so there's no additional
overhead associated with a data

00:17:33.470 --> 00:17:34.750
access.

00:17:34.750 --> 00:17:35.890
So there you go.

00:17:35.890 --> 00:17:39.800
You have the formulation
of the problem.

00:17:39.800 --> 00:17:43.750
You have the trade-off written,
where the cost of migration

00:17:43.750 --> 00:17:46.500
has just the distance.

00:17:46.500 --> 00:17:48.120
But it has a constant factor.

00:17:48.120 --> 00:17:52.780
And you've got twice the
distance, here, for the access.

00:17:52.780 --> 00:17:56.320
Now if s equals d, and I
want to write this down,

00:17:56.320 --> 00:17:58.140
you have a local access.

00:17:58.140 --> 00:18:01.000
And the cost is
assumed to be zero.

00:18:01.000 --> 00:18:02.560
You could change that.

00:18:02.560 --> 00:18:05.760
We are in the realm of
the theory and symbols.

00:18:05.760 --> 00:18:08.280
So you can do whatever you want.

00:18:08.280 --> 00:18:12.770
But given those
equations, our problem

00:18:12.770 --> 00:18:26.780
is decide when to migrate to
minimize total memory access

00:18:26.780 --> 00:18:27.280
cost.
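
NOTE
Editorial sketch, not part of the spoken lecture. The cost model just stated can be written out as below. The symbol B for the fixed context-size overhead and the subscripted cost names are assumptions introduced here for illustration; the lecture only says "some constant."
```latex
\[
\mathrm{cost}_{\mathrm{mig}}(s,d) = \mathrm{dist}(s,d) + B, \qquad
\mathrm{cost}_{\mathrm{acc}}(s,d) = 2\,\mathrm{dist}(s,d), \qquad
\mathrm{cost}_{\mathrm{acc}}(s,s) = 0 .
\]
% Worked example with made-up numbers: dist(s,d) = 3 and B = 4.
% One access to d:  remote access costs 2*3 = 6, migrating costs 3 + 4 = 7, so staying put wins.
% Two consecutive accesses to d:  remote accesses cost 6 + 6 = 12, migrating once costs 7 + 0 = 7, so migrating wins.
```
The two cases in the worked example show why the decision depends on the upcoming access pattern rather than on any single access.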

00:18:33.940 --> 00:18:35.630
So in our example
there, I suppose

00:18:35.630 --> 00:18:41.850
we had p1, p2, p2, et cetera.

00:18:41.850 --> 00:18:43.100
And let's say you start at p1.

00:18:46.850 --> 00:18:49.535
This first one would
be a local access.

00:18:49.535 --> 00:18:50.910
And then, you may
decide that you

00:18:50.910 --> 00:18:52.780
want to migrate
to p2, over here.

00:18:56.120 --> 00:18:58.900
In this case, you get this
as a local access, as well.

00:18:58.900 --> 00:19:00.520
So is this one.

00:19:00.520 --> 00:19:03.933
Right here, you might want to
migrate back to p1.

00:19:06.910 --> 00:19:08.450
So this becomes a local access.

00:19:08.450 --> 00:19:10.020
That's a local access.

00:19:10.020 --> 00:19:11.830
They're all, essentially, free.

00:19:11.830 --> 00:19:13.850
And then, if you
just stay at p1,

00:19:13.850 --> 00:19:19.140
over here, you may end up doing
remote accesses to p3 and p2,

00:19:19.140 --> 00:19:21.500
respectively.

00:19:21.500 --> 00:19:24.670
And so you have a cost
of migration-- the cost

00:19:24.670 --> 00:19:27.130
of migration and the cost
of two remote accesses.
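
NOTE
Editorial sketch, not part of the spoken lecture. One way to check the bookkeeping in an example like this is to total the cost of a fixed schedule. The access trace below is a hypothetical reconstruction (the board contents are not in the transcript), processors are assumed to sit on a line so that distance is the absolute difference of their indices, and B = 4 is an arbitrary context overhead.
```python
def schedule_cost(homes, locations, dist, B, start):
    # homes[k]: processor whose cache holds the k-th access.
    # locations[k]: processor the thread runs on when it makes the k-th access.
    total, here = 0, start
    for home, loc in zip(homes, locations):
        if loc != here:
            total += dist(here, loc) + B  # migration: one-way trip plus fixed context overhead
            here = loc
        total += 2 * dist(here, home)  # access: round trip, which is zero when the access is local
    return total
line = lambda s, d: abs(s - d)  # assumed layout: processors on a line
homes = [1, 2, 2, 1, 1, 3, 2]  # hypothetical trace: p1, p2, p2, p1, p1, p3, p2
migrating = [1, 2, 2, 1, 1, 1, 1]  # migrate to p2, migrate back to p1, then stay at p1
never_migrate = [1] * len(homes)
print(schedule_cost(homes, migrating, line, B=4, start=1))
print(schedule_cost(homes, never_migrate, line, B=4, start=1))
```
Comparing the two printed totals for different values of B shows how the context size tips the balance between migrating and accessing remotely.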

00:19:29.810 --> 00:19:31.271
So that's the set up.

00:19:31.271 --> 00:19:32.895
How are we going to
solve this problem?

00:19:37.694 --> 00:19:38.610
Are we going to use Dijkstra?

00:19:38.610 --> 00:19:39.825
Are we going to
use Bellman-Ford?

00:19:39.825 --> 00:19:41.700
Are we going to use
balanced search trees?

00:19:41.700 --> 00:19:44.159
Are we going to
use hash functions?

00:19:44.159 --> 00:19:45.200
What are we going to use?

00:19:45.200 --> 00:19:46.064
AUDIENCE: Dynamic Programming.

00:19:46.064 --> 00:19:47.939
PROFESSOR SRINI DEVADAS:
Dynamic Programming.

00:19:47.939 --> 00:19:48.714
All together.

00:19:48.714 --> 00:19:50.410
EVERYONE: Dynamic Programming.

00:19:50.410 --> 00:19:52.160
PROFESSOR SRINI DEVADAS:
Dynamic programming, all right.

00:19:52.160 --> 00:19:53.520
We're going to use
dynamic programming

00:19:53.520 --> 00:19:54.436
to solve this problem.

00:19:57.181 --> 00:19:57.680
Good.

00:19:57.680 --> 00:20:00.887
So Erik taught you something.

00:20:00.887 --> 00:20:02.220
AUDIENCE: Where are the erasers?

00:20:02.220 --> 00:20:02.460
PROFESSOR SRINI DEVADAS: Yeah.

00:20:02.460 --> 00:20:03.000
Where are the erasers?

00:20:03.000 --> 00:20:04.460
I think they
fluttered down here.

00:20:04.460 --> 00:20:05.880
All right.

00:20:05.880 --> 00:20:08.545
Let me bail out and use this
while you find the erasers.

00:20:11.060 --> 00:20:16.150
So a program at p1, which
is the processor, initially.

00:20:16.150 --> 00:20:18.480
I'm just going to
set up this DP.

00:20:18.480 --> 00:20:29.809
Let's assume that the number
of processors equals Q. Now,

00:20:29.809 --> 00:20:30.850
what are the subproblems?

00:20:35.100 --> 00:20:37.920
You could do this
many different ways.

00:20:37.920 --> 00:20:41.100
Let's go ahead and use prefixes.

00:20:41.100 --> 00:20:53.710
And so DP(k, pi) is the cost
of the optimal solution

00:20:53.710 --> 00:21:07.840
for the prefix m1 through
mk of memory accesses,

00:21:07.840 --> 00:21:18.111
when the program starts
at p1 and ends at pi.

00:21:18.111 --> 00:21:19.870
So that's my subproblem.

00:21:19.870 --> 00:21:22.970
I want to know, as
I build this up,

00:21:22.970 --> 00:21:24.630
what is the optimal
way that I'm going

00:21:24.630 --> 00:21:26.670
to choose between
migrations and accesses

00:21:26.670 --> 00:21:34.690
for the first k memory accesses,
assuming a starting point at p1

00:21:34.690 --> 00:21:37.422
and ending at some pi.

00:21:37.422 --> 00:21:39.130
And I need to build
up these subproblems.

00:21:39.130 --> 00:21:40.200
And I want to grow them.

00:21:43.470 --> 00:21:48.460
Let's go ahead and set this up.

00:21:48.460 --> 00:21:50.822
What I want to do now is
figure out DP(k plus 1, pj).

00:21:56.050 --> 00:22:01.400
And assuming I have all
of the k, pi's computed--

00:22:01.400 --> 00:22:04.670
and how many
subproblems do I have?

00:22:04.670 --> 00:22:07.820
How many subproblems do I have?

00:22:07.820 --> 00:22:10.430
Total?

00:22:10.430 --> 00:22:14.880
Look at this and tell me what
the ranges of the possibilities

00:22:14.880 --> 00:22:15.380
are.

00:22:15.380 --> 00:22:17.800
So how many subproblems
would I have?

00:22:17.800 --> 00:22:18.300
Someone?

00:22:22.880 --> 00:22:27.950
N times Q. So you have
N times Q subproblems.

00:22:31.910 --> 00:22:37.260
So you've set this up for
up until k and for all

00:22:37.260 --> 00:22:38.740
of the pi's.

00:22:38.740 --> 00:22:43.230
Now, what you have to do
is essentially say, well,

00:22:43.230 --> 00:22:56.030
DP(k plus 1, pj) is going to
be DP(k, pj) plus cost of access

00:22:56.030 --> 00:23:07.656
(pj, p(mk plus 1)) if pj is
not equal to p(mk plus 1).

00:23:07.656 --> 00:23:09.030
So there's going
to be two cases.

00:23:09.030 --> 00:23:13.250
I'll just write this
out and I'll explain it.

00:23:13.250 --> 00:23:16.620
But the first case corresponds
to if the new memory

00:23:16.620 --> 00:23:21.590
access is not in the processor
cache corresponding to pj,

00:23:21.590 --> 00:23:26.640
then what you could do
is use the optimum value,

00:23:26.640 --> 00:23:30.910
where you ended pj, and
simply do a remote access that

00:23:30.910 --> 00:23:35.020
corresponds to
accessing mk plus 1.

00:23:35.020 --> 00:23:36.560
So that's one case.

00:23:36.560 --> 00:23:48.410
The other case is to use the minimum
solution-- optimum solution

00:23:48.410 --> 00:23:56.110
corresponding to ending
at pi and do a migration.

00:23:56.110 --> 00:24:01.460
You have cost of
migration from pi to pj.

00:24:01.460 --> 00:24:11.330
And you do this if you
want to go to p of mk

00:24:11.330 --> 00:24:14.810
plus 1-- the processor
corresponding to p

00:24:14.810 --> 00:24:16.900
of mk plus 1.

00:24:16.900 --> 00:24:21.120
So that's the set up for
this dynamic program.

00:24:21.120 --> 00:24:24.590
What you've done is created
a subproblem, its optimum,

00:24:24.590 --> 00:24:27.280
and then you look
at the two cases.

00:24:27.280 --> 00:24:30.990
You want to go migrate and
do a local access-- that's

00:24:30.990 --> 00:24:32.870
this case over here.

00:24:32.870 --> 00:24:35.780
Migrate to the processor
and do a local access there.

00:24:35.780 --> 00:24:36.910
That will be this case.

00:24:36.910 --> 00:24:40.185
And in this case, you stay where
you are and do a remote access.

00:24:43.480 --> 00:24:52.360
In the case of migration, you
could end up choosing different

00:24:52.360 --> 00:24:55.660
initial starting points
corresponding to the pi's.

00:24:55.660 --> 00:24:58.880
And you have to run
through all of those.

00:24:58.880 --> 00:25:05.170
So what's the cost of a
subproblem, or the running time

00:25:05.170 --> 00:25:10.630
of computing one of these
things-- it's order?

00:25:10.630 --> 00:25:19.050
Q. And so the total
cost is NQ squared.
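
NOTE
Editorial sketch, not part of the spoken lecture. The dynamic program above can be written as a short routine with N times Q subproblems and order Q work for each migration case, for NQ squared overall. Assumptions made here: processors are numbered from 0, they sit on a line so distance is the absolute difference of indices, B is the fixed context overhead, staying on the same processor costs nothing, and the answer is the best value over all ending processors. The function names are hypothetical.
```python
import math
def migration_cost(s, d, dist, B):
    # Assumption: no migration (s == d) is free; otherwise one-way distance plus context overhead B.
    return 0 if s == d else dist(s, d) + B
def access_cost(s, d, dist):
    # Round trip for the request and the data; zero for a local access since dist(s, s) == 0.
    return 2 * dist(s, d)
def min_total_cost(homes, Q, dist, B, start=0):
    # homes[k] is the processor whose cache holds the (k+1)-th memory access.
    # dp[j] is the optimal cost for the prefix processed so far, ending on processor j.
    dp = [math.inf] * Q
    dp[start] = 0
    for home in homes:
        new = [math.inf] * Q
        for j in range(Q):
            if j != home:
                new[j] = dp[j] + access_cost(j, home, dist)  # stay at j and do a remote access
            else:
                new[j] = min(dp[i] + migration_cost(i, j, dist, B) for i in range(Q))  # end at home, access is local
        dp = new
    return min(dp)
line = lambda s, d: abs(s - d)  # assumed layout: processors on a line
print(min_total_cost([1] * 10, Q=2, dist=line, B=5))
```
With ten consecutive accesses to the neighboring processor, one migration up front beats ten round trips, which is the trade-off the recurrence is built to capture.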

00:25:22.810 --> 00:25:27.270
It's a little review of DP.

00:25:27.270 --> 00:25:31.080
I'm going to stop here
and let Erik take over.

00:25:31.080 --> 00:25:36.530
Just, in closing, while
this makes some assumptions,

00:25:36.530 --> 00:25:39.280
it's actually
fairly close to what

00:25:39.280 --> 00:25:40.790
we're building in hardware.

00:25:40.790 --> 00:25:42.570
This type of
analysis is something

00:25:42.570 --> 00:25:44.190
that we have to do in hardware.

00:25:44.190 --> 00:25:48.200
My research group is building
a 128 processor machine,

00:25:48.200 --> 00:25:50.470
that we call the Execution
Migration Machine.

00:25:50.470 --> 00:25:52.960
And it does exactly what
I've described to you,

00:25:52.960 --> 00:25:54.690
decide whether to
do a remote access

00:25:54.690 --> 00:25:59.650
or to do a migration based
on this kind of analysis.

00:25:59.650 --> 00:26:02.557
So hand it over to Erik.

00:26:02.557 --> 00:26:04.390
PROFESSOR ERIK DEMAINE:
I have a microphone.

00:26:04.390 --> 00:26:04.570
PROFESSOR SRINI
DEVADAS: All right.

00:26:04.570 --> 00:26:05.070
Good.

00:26:08.804 --> 00:26:10.720
PROFESSOR ERIK DEMAINE:
So I have a few things

00:26:10.720 --> 00:26:12.590
to tell you a little bit about.

00:26:12.590 --> 00:26:14.510
Srini talked about
one topic in detail.

00:26:14.510 --> 00:26:17.040
I'm going to talk about
many topics in less detail,

00:26:17.040 --> 00:26:19.660
as I said "shallowly."

00:26:19.660 --> 00:26:22.370
And these are my main
areas of research.

00:26:22.370 --> 00:26:26.770
I do geometry, in particular,
folding, and data structures,

00:26:26.770 --> 00:26:29.470
graphs, and
recreational algorithms.

00:26:29.470 --> 00:26:32.910
That's the really fun stuff.

00:26:32.910 --> 00:26:35.040
A lot of these have
corresponding courses

00:26:35.040 --> 00:26:36.980
if you're interested in
more about this stuff.

00:26:36.980 --> 00:26:39.220
Computational
geometry, in general,

00:26:39.220 --> 00:26:43.180
is-- I'm not going to
remember all numbers.

00:26:43.180 --> 00:26:44.390
840?

00:26:44.390 --> 00:26:46.890
50?

00:26:46.890 --> 00:26:49.660
50.

00:26:49.660 --> 00:26:51.160
6.850.

00:26:51.160 --> 00:26:53.450
That's a class I don't teach.

00:26:53.450 --> 00:26:57.560
Folding is 6.849.

00:26:57.560 --> 00:27:01.610
Data Structures is 6.851.

00:27:01.610 --> 00:27:06.070
And Graphs was being
taught this semester,

00:27:06.070 --> 00:27:07.350
in parallel with this class.

00:27:07.350 --> 00:27:08.770
6.889.

00:27:08.770 --> 00:27:12.260
And recreational algorithms
isn't fully covered

00:27:12.260 --> 00:27:15.321
but you could check
out SP.268, which

00:27:15.321 --> 00:27:17.010
was offered last semester.

00:27:17.010 --> 00:27:19.030
And especially for
those watching at home

00:27:19.030 --> 00:27:22.155
on MIT OpenCourseWare--
this class,

00:27:22.155 --> 00:27:25.230
all the video lectures
are online for free.

00:27:25.230 --> 00:27:26.940
6.851, we'll do
that next semester.

00:27:26.940 --> 00:27:30.050
And 6.889 are all
online, right now.

00:27:30.050 --> 00:27:34.190
And there's some lecture notes
for SP.268 on OpenCourseWare.

00:27:34.190 --> 00:27:35.800
There's a lot of material, here.

00:27:35.800 --> 00:27:38.930
And in particular, the obvious
next class for you to be taking

00:27:38.930 --> 00:27:40.820
is 6.046.

00:27:40.820 --> 00:27:42.654
But why should you
be taking 6.046?

00:27:42.654 --> 00:27:44.820
Because then you can take
all these exciting classes

00:27:44.820 --> 00:27:46.770
and many others
about algorithms.

00:27:46.770 --> 00:27:48.670
There's a complete list
of follow-on classes

00:27:48.670 --> 00:27:51.930
in the lecture notes,
which are online.

00:27:51.930 --> 00:27:55.020
And there's a ton of-- there's
so much research in algorithms.

00:27:55.020 --> 00:27:56.510
It's a really exciting area.

00:27:56.510 --> 00:27:58.990
This is just the
beginning-- just a taste.

00:27:58.990 --> 00:28:02.750
And I want to show you various
exciting places it can go.

00:28:08.670 --> 00:28:10.750
Let's do some algorithms.

00:28:30.680 --> 00:28:33.790
So the first topic I'll
tell you a little bit

00:28:33.790 --> 00:28:38.140
about-- maybe the most fun-- is
geometric folding algorithms.

00:28:41.920 --> 00:28:45.480
That's the title of the
textbook and the class 6.849.

00:28:45.480 --> 00:28:49.107
And in general--
well, there's a lot

00:28:49.107 --> 00:28:50.940
of different kinds of
folding, in the world,

00:28:50.940 --> 00:28:54.290
but maybe the most accessible
and fun is origami.

00:28:54.290 --> 00:28:57.870
So you have, on the one
hand, a piece of paper.

00:28:57.870 --> 00:29:01.010
And you'd like to turn it into
some crazy, three dimensional

00:29:01.010 --> 00:29:03.950
shape, which I'm not
going to try to draw here.

00:29:03.950 --> 00:29:05.900
You want to fold
a giraffe or you

00:29:05.900 --> 00:29:08.230
want to make some
geometric sculpture.

00:29:08.230 --> 00:29:09.380
How do you do this?

00:29:09.380 --> 00:29:13.170
So, usually, you put some
creases into the piece of paper

00:29:13.170 --> 00:29:14.680
in some reasonable way.

00:29:17.330 --> 00:29:19.000
And one of the
questions is what are

00:29:19.000 --> 00:29:21.790
the rules for putting creases
into a piece of paper?

00:29:21.790 --> 00:29:23.090
When is that possible?

00:29:23.090 --> 00:29:26.690
And then you'd like to
fold it into that shape.

00:29:26.690 --> 00:29:28.580
So there are really
two big problems here.

00:29:28.580 --> 00:29:32.530
One is I guess you could
call it foldability.

00:29:35.550 --> 00:29:38.100
And this is what you do
if you practice origami

00:29:38.100 --> 00:29:39.700
in the typical way.

00:29:39.700 --> 00:29:42.180
You get origami diagrams,
and they say, "fold this."

00:29:42.180 --> 00:29:43.530
And you're like, oh, gosh.

00:29:43.530 --> 00:29:45.696
Takes you hours to figure
out how to fold something.

00:29:45.696 --> 00:29:48.302
Especially, if they just
gave you a crease pattern.

00:29:48.302 --> 00:29:50.760
Can you even tell does it fold
into anything, first of all.

00:29:50.760 --> 00:29:53.180
And then, if so, how do I do it?

00:29:53.180 --> 00:30:05.440
That problem-- folding a crease
pattern and understanding

00:30:05.440 --> 00:30:10.130
what crease patterns are valid--
unfortunately, is NP-complete.

00:30:10.130 --> 00:30:13.260
So there's no good way to
really understand that.

00:30:13.260 --> 00:30:16.214
So origami is hard.

00:30:16.214 --> 00:30:18.130
In some sense, the more
interesting direction,

00:30:18.130 --> 00:30:19.780
though, is the
reverse direction,

00:30:19.780 --> 00:30:22.360
which I would call
origami design.

00:30:22.360 --> 00:30:26.300
I have an intended 3D
shape I want to design.

00:30:26.300 --> 00:30:30.410
How can I come up with-- how
can I, as an algorithm, convert

00:30:30.410 --> 00:30:33.320
that 3D shape into a crease
pattern that does fold,

00:30:33.320 --> 00:30:36.260
that's guaranteed to
fold into that 3D shape.

00:30:36.260 --> 00:30:38.790
And that's actually solvable.

00:30:38.790 --> 00:30:39.790
So design is easier.

00:30:42.379 --> 00:30:44.170
And there's all sorts
of different versions

00:30:44.170 --> 00:30:46.300
of the design problem.

00:30:46.300 --> 00:30:48.540
Some of them, you could
solve in polynomial time.

00:30:48.540 --> 00:30:49.570
Some of them, you can't.

00:30:49.570 --> 00:30:51.280
If you really want
optimal design,

00:30:51.280 --> 00:30:53.040
that can be NP-complete again.

00:30:53.040 --> 00:30:59.390
But in particular, there's a way
to fold any 3D shape you want.

00:30:59.390 --> 00:31:01.660
So there's an algorithm--
the coolest one, right now,

00:31:01.660 --> 00:31:03.010
is called Origamizer.

00:31:03.010 --> 00:31:06.420
It's free software
online, by Tomohiro Tachi.

00:31:06.420 --> 00:31:09.940
And you give it a 3D
model of a polyhedron.

00:31:09.940 --> 00:31:13.070
And it outputs a
giant crease pattern

00:31:13.070 --> 00:31:16.090
on a square piece of paper that
folds into that 3D polyhedron.

00:31:16.090 --> 00:31:18.710
And it's reasonably practical.

00:31:18.710 --> 00:31:21.210
And he's folded tons
of models in that way.

00:31:23.810 --> 00:31:27.022
Let's see.

00:31:27.022 --> 00:31:28.980
I'll show you some other things.

00:31:28.980 --> 00:31:32.940
Here's a simple example of
a geometric origami model.

00:31:32.940 --> 00:31:37.420
So this is folded from a square
paper with concentric squares

00:31:37.420 --> 00:31:38.610
as creases.

00:31:38.610 --> 00:31:40.357
Alternating mountain and valley.

00:31:40.357 --> 00:31:42.190
So you see mountain
valley, mountain valley.

00:31:42.190 --> 00:31:43.470
Also fold the diagonals.

00:31:43.470 --> 00:31:45.090
It's very easy to make.
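
NOTE
Illustrative sketch (not from the lecture): one way to write down the
crease pattern just described, concentric squares alternating mountain
and valley, plus the two diagonals. The unit-square coordinates, the
"M"/"V" labels, and the function name are assumptions for illustration.
```python
def concentric_square_creases(rings):
    """Return (label, corners) for `rings` concentric squares in the unit square."""
    creases = []
    for i in range(1, rings + 1):
        h = 0.5 * i / (rings + 1)  # half side length; stays strictly inside the paper
        label = "M" if i % 2 == 1 else "V"  # alternate mountain and valley
        corners = [(0.5 - h, 0.5 - h), (0.5 + h, 0.5 - h),
                   (0.5 + h, 0.5 + h), (0.5 - h, 0.5 + h)]
        creases.append((label, corners))
    return creases
# The two diagonals of the paper, left unassigned in this sketch.
DIAGONALS = [((0.0, 0.0), (1.0, 1.0)), ((0.0, 1.0), (1.0, 0.0))]
pattern = concentric_square_creases(4)
print([label for label, _ in pattern])
```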

00:31:45.090 --> 00:31:46.860
And what's funny--
what's cool about it

00:31:46.860 --> 00:31:48.850
is that when you put
all those creases in,

00:31:48.850 --> 00:31:52.410
it pops into this 3D shape,
which for many years people

00:31:52.410 --> 00:31:54.220
conjectured was a
hyperbolic paraboloid.

00:31:54.220 --> 00:31:56.420
This design is one of the
earliest geometric origami

00:31:56.420 --> 00:31:56.920
designs.

00:31:56.920 --> 00:32:01.920
It goes back to the late '20s in
the Bauhaus School of Design.

00:32:01.920 --> 00:32:03.240
And it's very cool.

00:32:03.240 --> 00:32:04.890
People fold them a lot.

00:32:04.890 --> 00:32:09.600
I've personally folded
thousands of them

00:32:09.600 --> 00:32:10.740
for sculpture and things.

00:32:10.740 --> 00:32:13.400
We also do a lot of
algorithmic sculpture, which

00:32:13.400 --> 00:32:15.590
I won't talk about
in detail here.

00:32:15.590 --> 00:32:20.770
But we discovered, two years
ago, that this does not exist.

00:32:20.770 --> 00:32:23.320
It is impossible to fold
a square piece of paper

00:32:23.320 --> 00:32:25.520
with this crease pattern.

00:32:25.520 --> 00:32:27.160
That was a bit of a surprise.

00:32:27.160 --> 00:32:29.650
And it's kind of fun to make
things that don't exist.

00:32:29.650 --> 00:32:30.733
AUDIENCE: So what is that?

00:32:30.733 --> 00:32:33.490
PROFESSOR ERIK DEMAINE:
So what is this?

00:32:33.490 --> 00:32:39.090
Well, somehow, the physical world is
differing from the mathematical world.

00:32:39.090 --> 00:32:42.550
Now, some ways it
might be differing

00:32:42.550 --> 00:32:45.750
are that these
creases might not be

00:32:45.750 --> 00:32:47.140
creases in the technical sense.

00:32:47.140 --> 00:32:49.390
A crease is a place that
should be non-differentiable.

00:32:49.390 --> 00:32:51.120
So maybe they're kind
of rounding it out.

00:32:51.120 --> 00:32:52.910
And then, who knows
what's happening.

00:32:52.910 --> 00:32:54.300
Then, kind of all bets are off.

00:32:54.300 --> 00:32:56.300
Another possibility, which is
what I think is happening,

00:32:56.300 --> 00:32:58.859
is that there are extra creases,
in here, that you don't see.

00:32:58.859 --> 00:32:59.650
They're very small.

00:32:59.650 --> 00:33:04.420
If you look, especially the raw
edge, here, and that profile.

00:33:04.420 --> 00:33:05.580
It's a little bit wavy.

00:33:05.580 --> 00:33:07.700
And it's conceivable
there's some points here

00:33:07.700 --> 00:33:09.680
that look
non-differentiable to me.

00:33:09.680 --> 00:33:12.120
And I always thought I wasn't
folding it well enough.

00:33:12.120 --> 00:33:15.180
But in fact, something
like that has to happen.

00:33:15.180 --> 00:33:16.970
And my conjecture
is, if you look

00:33:16.970 --> 00:33:19.340
at this under a microscope,
which we haven't done yet,

00:33:19.340 --> 00:33:21.580
there are little creases
that are so shallow they're

00:33:21.580 --> 00:33:23.900
hard to see, but are there.

00:33:23.900 --> 00:33:26.662
And the theorem says some
creases have to be there.

00:33:26.662 --> 00:33:28.620
It is possible to fold
this with extra creases,

00:33:28.620 --> 00:33:31.740
but not with these.

00:33:31.740 --> 00:33:34.599
So get rid of that.

00:33:34.599 --> 00:33:36.390
On the other hand, if
you do the same thing

00:33:36.390 --> 00:33:38.770
with concentric circular
creases-- this is a little harder

00:33:38.770 --> 00:33:39.400
to unfold.

00:33:39.400 --> 00:33:43.150
It really wants to be in
this kind of Pringles shape.

00:33:43.150 --> 00:33:45.690
This also is from the Bauhaus.

00:33:45.690 --> 00:33:47.690
It's a little harder to
fold concentric circles.

00:33:47.690 --> 00:33:50.700
But this, we think, does exist.

00:33:50.700 --> 00:33:52.820
Can't prove it yet.

00:33:52.820 --> 00:33:56.930
So we've done a lot of
sculpture based on these guys.

00:33:56.930 --> 00:34:00.080
What else do I want to say?

00:34:00.080 --> 00:34:01.310
Another demo.

00:34:01.310 --> 00:34:02.480
So here's a fun problem.

00:34:02.480 --> 00:34:03.460
This is a magic trick.

00:34:03.460 --> 00:34:06.980
Goes back to Houdini and others.

00:34:06.980 --> 00:34:13.010
So imagine I take a rectangle
of paper and then I fold it flat

00:34:13.010 --> 00:34:16.150
and take my scissors--
not strict origami, here--

00:34:16.150 --> 00:34:18.120
and I make one
complete straight cut.

00:34:21.940 --> 00:34:24.310
In this case, I get two pieces.

00:34:24.310 --> 00:34:25.560
And I unfold the pieces.

00:34:25.560 --> 00:34:28.630
And the question is what shapes
can I get out of those pieces?

00:34:28.630 --> 00:34:31.409
In this case, I get a swan.

00:34:31.409 --> 00:34:35.199
You're not impressed
so I'll do another one.

00:34:35.199 --> 00:34:36.881
Make one straight cut.

00:34:36.881 --> 00:34:38.380
These are on my web
page if you want

00:34:38.380 --> 00:34:39.546
to impress all your friends.

00:34:42.281 --> 00:34:44.739
You could take the class if
you want to know how it's done.

00:34:47.642 --> 00:34:49.100
This example has
a lot of symmetry.

00:34:49.100 --> 00:34:52.008
You get a little angelfish.

00:34:52.008 --> 00:34:53.420
I only have one more example.

00:34:53.420 --> 00:34:55.030
I hope you'll be impressed.

00:34:55.030 --> 00:34:58.600
This is very hard to fold.

00:34:58.600 --> 00:35:02.390
It was an MIT spotlight
picture, at some point.

00:35:02.390 --> 00:35:03.570
And it's even harder to cut.

00:35:06.350 --> 00:35:07.170
Straight cut.

00:35:14.290 --> 00:35:19.876
This should be the MIT logo.

00:35:19.876 --> 00:35:25.200
[APPLAUSE]

00:35:25.200 --> 00:35:27.470
So the theorem is there's
an algorithm, given

00:35:27.470 --> 00:35:29.417
any set of polygons
in the plane,

00:35:29.417 --> 00:35:31.000
you could fold, make
one straight cut,

00:35:31.000 --> 00:35:32.530
and get exactly those polygons.

00:35:32.530 --> 00:35:33.600
There's some
limits, in practice,

00:35:33.600 --> 00:35:34.724
because of paper thickness.

00:35:34.724 --> 00:35:38.240
But in theory, you
can do everything.

00:35:38.240 --> 00:35:39.250
All right.

00:35:39.250 --> 00:35:39.820
Fun stuff.

00:35:44.580 --> 00:35:47.020
I don't think I have time
to talk about self-assembly.

00:35:47.020 --> 00:35:49.311
Let me talk a little bit
about data structures because,

00:35:49.311 --> 00:35:52.650
conveniently, Srini drew
this diagram for me.

00:35:52.650 --> 00:35:56.140
And I have the exact same
diagram-- the left one, though.

00:35:56.140 --> 00:35:56.890
I'm old fashioned.

00:35:59.560 --> 00:36:03.790
So the models of computation
we've used, in this class,

00:36:03.790 --> 00:36:04.770
are pretty simple.

00:36:04.770 --> 00:36:06.600
We have, in particular,
the Word RAM.

00:36:06.600 --> 00:36:07.550
You can read a word.

00:36:07.550 --> 00:36:08.887
You can add two words.

00:36:08.887 --> 00:36:10.970
Do whatever you want with
a constant number of words.

00:36:10.970 --> 00:36:12.810
Send them out to main memory.

00:36:12.810 --> 00:36:15.170
Everything's the
same amount of time.

00:36:15.170 --> 00:36:18.080
It's all constant,
anyway, so who cares?

00:36:18.080 --> 00:36:21.140
Except there's this
issue in real computers,

00:36:21.140 --> 00:36:23.140
and it gets even worse
with parallel, but let's

00:36:23.140 --> 00:36:29.790
stick to sequential old
fashioned computers.

00:36:29.790 --> 00:36:33.950
So you have this slow bottleneck
between main memory and cache.

00:36:33.950 --> 00:36:35.030
Cache is really fast.

00:36:35.030 --> 00:36:37.340
Think of this as
a really fat pipe.

00:36:37.340 --> 00:36:40.047
And this is a very thin pipe.

00:36:40.047 --> 00:36:40.630
What do we do?

00:36:40.630 --> 00:36:42.610
We'd like to always work
with things in cache,

00:36:42.610 --> 00:36:44.850
but that's kind of difficult.

00:36:44.850 --> 00:36:46.390
At some point, you
run out of space.

00:36:46.390 --> 00:36:47.723
You've got to go to main memory.

00:36:47.723 --> 00:36:51.200
And maybe to disk, other
levels of the memory hierarchy.

00:36:51.200 --> 00:36:54.750
So what systems do is, when you
fetch something from memory,

00:36:54.750 --> 00:36:59.260
you don't just get one word,
you get an entire cache line.

00:36:59.260 --> 00:37:01.570
And cache lines are
getting bigger and bigger.

00:37:01.570 --> 00:37:09.140
But memory transfers
happen in blocks,

00:37:09.140 --> 00:37:10.840
when you're going
to a big memory.

00:37:19.600 --> 00:37:22.460
So let's say B is
the size of a block.

00:37:22.460 --> 00:37:24.600
There is another
model of computation

00:37:24.600 --> 00:37:26.770
that's more sophisticated
than the Word RAM that

00:37:26.770 --> 00:37:31.490
says how should my running time
depend on B? How many memory

00:37:31.490 --> 00:37:35.466
transfers do I need to do,
as a function of B and n?

00:37:35.466 --> 00:37:40.047
And so for example, if you
want to do search-- normally,

00:37:40.047 --> 00:37:41.380
we think of doing binary search.

00:37:41.380 --> 00:37:44.720
That takes log(n) accesses
if everything is uniform.

00:37:44.720 --> 00:37:46.340
But with asymmetry,
and if you're

00:37:46.340 --> 00:37:49.340
reading in entire blocks,
if you do it right,

00:37:49.340 --> 00:37:56.190
you can do it in log base B
of n, instead of log base 2.

00:37:56.190 --> 00:37:58.490
This is counting memory
transfers, not computation.

00:37:58.490 --> 00:38:01.160
Computation here is free.

00:38:01.160 --> 00:38:03.720
It's a little weird,
but you get used to it.
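
NOTE
Illustrative sketch (not from the lecture): counting block transfers for
search. Assumed setup: keys 0..n-1 stored in sorted order, block size B,
and a search tree whose nodes each hold B keys in one block. Function
names are hypothetical. Only transfers are counted; computation is free.
```python
def binary_search_transfers(n, B, target):
    """Count distinct size-B blocks touched by binary search on keys 0..n-1."""
    lo, hi, blocks = 0, n - 1, set()
    while lo <= hi:
        mid = (lo + hi) // 2
        blocks.add(mid // B)  # reading index mid pulls in its whole block
        if mid == target:
            break
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return len(blocks)
def btree_search_transfers(n, B):
    """Levels of a tree with B keys per node (one block each) holding n keys."""
    levels, capacity = 0, 0
    while capacity < n:
        capacity = capacity * (B + 1) + B  # each level multiplies capacity by about B
        levels += 1
    return levels
print(binary_search_transfers(10**6, 1000, 0), btree_search_transfers(10**6, 1000))
```
The second count grows like log base B of n, the first like log base 2.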

00:38:03.720 --> 00:38:05.770
Sorting.

00:38:05.770 --> 00:38:06.800
Another classic.

00:38:06.800 --> 00:38:10.230
Just to give you an idea of how
this gets a little complicated.

00:38:10.230 --> 00:38:15.710
You get n divided by B times
log base C of n divided by B. C

00:38:15.710 --> 00:38:20.610
is the number of blocks
that fit in here.

00:38:20.610 --> 00:38:24.970
So there's C different blocks
that fit in your cache.

00:38:24.970 --> 00:38:26.780
That's the optimal way to sort.

00:38:26.780 --> 00:38:29.687
Just upper and lower bounds
in the comparison model.

00:38:29.687 --> 00:38:30.770
Just to give you a flavor.
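
NOTE
Illustrative sketch (not from the lecture): plugging numbers into the
sorting bound quoted above, (n / B) * log base C of (n / B), ignoring
constant factors. The function name and sample values are assumptions.
```python
import math
def sort_transfers(n, B, C):
    """Memory-transfer bound for sorting n items, block size B, C blocks in cache."""
    blocks = n / B  # number of blocks the input occupies
    return blocks * math.log(blocks, C)
# About a million items, 1024 per block, 32 blocks of cache.
print(sort_transfers(2**20, 2**10, 2**5))
```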

00:38:30.770 --> 00:38:33.390
And there's a whole study
of algorithms to do this.

00:38:33.390 --> 00:38:35.850
What's really cool is you
can achieve these bounds

00:38:35.850 --> 00:38:37.930
even if you don't
know what B is.

00:38:37.930 --> 00:38:39.360
And if you don't know what C is.

00:38:39.360 --> 00:38:42.140
There's one algorithm, that
whatever the architecture is

00:38:42.140 --> 00:38:44.607
underlying it, will still
achieve the same bounds.

00:38:44.607 --> 00:38:46.440
Those are called
cache-oblivious algorithms,

00:38:46.440 --> 00:38:48.580
and they were
invented, here, at MIT.

00:38:53.065 --> 00:38:58.100
I think I want to-- this
is too much fun to pass up.

00:38:58.100 --> 00:39:02.740
On the Word RAM,
there's this problem,

00:39:02.740 --> 00:39:04.570
which we've dealt
with several times.

00:39:04.570 --> 00:39:08.920
What if you want to maintain
a dynamic set of elements--

00:39:08.920 --> 00:39:09.820
integers.

00:39:09.820 --> 00:39:13.870
I want to do insert, delete,
predecessor, successor.

00:39:13.870 --> 00:39:16.810
This is what binary
search trees do.
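
NOTE
Illustrative sketch (not from the lecture): the four operations named
above, using a sorted Python list and the standard bisect module. This
only shows the interface. Searches take O(log n) here, but list inserts
and deletes take O(n), so it is neither a balanced binary search tree
nor one of the faster structures. The class name is hypothetical.
```python
import bisect
class SortedIntSet:
    def __init__(self):
        self.keys = []  # kept in increasing order
    def insert(self, x):
        i = bisect.bisect_left(self.keys, x)
        if i == len(self.keys) or self.keys[i] != x:
            self.keys.insert(i, x)
    def delete(self, x):
        i = bisect.bisect_left(self.keys, x)
        if i < len(self.keys) and self.keys[i] == x:
            self.keys.pop(i)
    def predecessor(self, x):
        """Largest key strictly less than x, or None."""
        i = bisect.bisect_left(self.keys, x)
        return self.keys[i - 1] if i > 0 else None
    def successor(self, x):
        """Smallest key strictly greater than x, or None."""
        i = bisect.bisect_right(self.keys, x)
        return self.keys[i] if i < len(self.keys) else None
```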

00:39:16.810 --> 00:39:19.090
But you can do better.

00:39:19.090 --> 00:39:26.420
If we have integers-- n
integers-- in the range

00:39:26.420 --> 00:39:28.760
0 to u minus 1.

00:39:28.760 --> 00:39:31.830
So u is the size
of the universe.

00:39:31.830 --> 00:39:37.840
Then, we already know
how to do log(n).

00:39:37.840 --> 00:39:40.530
But you can do two bounds.

00:39:40.530 --> 00:39:41.435
One is log(log(u)).

00:39:44.807 --> 00:39:46.640
This is a data structure
called van Emde Boas trees.

00:39:50.700 --> 00:39:52.780
And it's in CLRS, if
you're interested.

00:39:52.780 --> 00:40:01.025
You can also do log(n)
divided by log(log(u)).

00:40:01.025 --> 00:40:02.900
This is a data structure
called fusion trees.

00:40:02.900 --> 00:40:04.233
It's an advanced data structure.

00:40:04.233 --> 00:40:08.030
6.851, if you're interested.

00:40:08.030 --> 00:40:09.850
And you can take the
min of those two.

00:40:09.850 --> 00:40:12.050
That's, essentially,
the best possible,

00:40:12.050 --> 00:40:13.800
with a matching lower
bound showing that's

00:40:13.800 --> 00:40:14.633
all you can achieve.

00:40:17.190 --> 00:40:20.528
And so just to state it
in terms that you know,

00:40:20.528 --> 00:40:23.390
which is normal n bounds.

00:40:23.390 --> 00:40:25.560
You take the min of
those two things,

00:40:25.560 --> 00:40:31.955
it's always at most the
square root of log(n) divided

00:40:31.955 --> 00:40:32.580
by log(log(n)).

00:40:37.680 --> 00:40:39.520
Compare that with log(n).

00:40:39.520 --> 00:40:41.546
It's way better.

00:40:41.546 --> 00:40:42.670
A whole square root better.

00:40:42.670 --> 00:40:44.044
And a little tiny
savings better.

00:40:44.044 --> 00:40:45.220
And this is optimal.

00:40:45.220 --> 00:40:46.200
As a function of n,

00:40:46.200 --> 00:40:48.870
that's the best you can do
for the predecessor problem.

00:40:48.870 --> 00:40:50.000
So pretty crazy stuff.

00:40:50.000 --> 00:40:51.530
It's a very
complicated structure.

00:40:51.530 --> 00:40:53.960
It's probably
completely impractical.

00:40:53.960 --> 00:40:55.240
But, hey.

00:40:55.240 --> 00:40:57.770
They're, theoretically,
pretty cool.
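
NOTE
Illustrative sketch (not from the lecture): comparing the predecessor
bounds numerically. Assumptions: base-2 logarithms, constant factors
ignored, and the fusion tree bound taken as log(n) / log(log(u)), which
is the usual log(n) / log(w) with word size w = log(u). Names and sample
values are hypothetical.
```python
import math
def predecessor_bounds(n, u):
    """Rough per-operation costs for n integers drawn from a universe of size u."""
    log_n = math.log2(n)
    loglog_u = math.log2(math.log2(u))
    return {
        "bst": log_n,                     # balanced binary search tree
        "van_emde_boas": loglog_u,        # log(log(u))
        "fusion_tree": log_n / loglog_u,  # log(n) / log(log(u))
    }
b = predecessor_bounds(2**16, 2**64)
print(b, min(b["van_emde_boas"], b["fusion_tree"]))
```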

00:40:57.770 --> 00:41:00.110
I'll tell you a little bit
about graph algorithms.

00:41:21.070 --> 00:41:23.580
We've seen a lot of graph
algorithms in this class.

00:41:23.580 --> 00:41:27.980
One way to make them new and fun
again is to suppose your graph

00:41:27.980 --> 00:41:29.252
is planar or almost planar.

00:41:29.252 --> 00:41:30.960
Meaning you can draw
it in two dimensions

00:41:30.960 --> 00:41:34.280
without any crossings, as you
might get from a graph that's

00:41:34.280 --> 00:41:38.330
drawn on the earth,
like a road network

00:41:38.330 --> 00:41:41.100
or something with no
or few overpasses.

00:41:41.100 --> 00:41:42.600
Then you can do
things a lot better.

00:41:42.600 --> 00:41:44.820
For example, you can
do the equivalent

00:41:44.820 --> 00:41:46.360
of Dijkstra's algorithm.

00:41:46.360 --> 00:41:49.860
So non-negative weight
shortest path, in linear time.

00:41:52.700 --> 00:41:56.627
That's not so impressive cause
Dijkstra is number of edges.

00:41:56.627 --> 00:41:57.960
Here, I mean number of vertices.

00:41:57.960 --> 00:42:00.530
It doesn't really matter
with planar graphs.

00:42:00.530 --> 00:42:02.540
And we had E log(V).

00:42:02.540 --> 00:42:04.780
You can write E,
here, if you prefer.

00:42:04.780 --> 00:42:06.060
It's only a log savings.
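
NOTE
Illustrative sketch (not from the lecture): why counting edges or
vertices makes little difference for planar graphs. Euler's formula
implies a simple planar graph with V >= 3 vertices has at most 3V - 6
edges, so E is O(V). The grid graph below is just a convenient planar
example, and the function names are hypothetical.
```python
def grid_graph_edges(k):
    """Edges of a k-by-k grid graph, a simple planar graph on k*k vertices."""
    edges = []
    for r in range(k):
        for c in range(k):
            if c + 1 < k:
                edges.append(((r, c), (r, c + 1)))  # edge to the right neighbor
            if r + 1 < k:
                edges.append(((r, c), (r + 1, c)))  # edge to the neighbor below
    return edges
def within_planar_edge_bound(num_vertices, num_edges):
    """Check E <= 3V - 6, which every simple planar graph with V >= 3 satisfies."""
    return num_edges <= 3 * num_vertices - 6
edges = grid_graph_edges(4)
print(len(edges), within_planar_edge_bound(16, len(edges)))
```
The complete graph on 5 vertices has 10 edges, which exceeds 3 * 5 - 6 = 9, so it fails the check, consistent with it not being planar.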

00:42:06.060 --> 00:42:08.655
More impressive, is you can
do with negative weights--

00:42:08.655 --> 00:42:14.840
the equivalent of Bellman-Ford--
in almost linear time.

00:42:18.700 --> 00:42:20.740
So some log factors.

00:42:20.740 --> 00:42:22.480
Log squared n divided
by log(log(n)).

00:42:22.480 --> 00:42:23.980
It's the best bound
known to date.

00:42:23.980 --> 00:42:25.360
That was a result
from last year.

00:42:25.360 --> 00:42:27.560
So it's still a
work in progress.

00:42:27.560 --> 00:42:29.610
And if you're interested
in this kind of stuff,

00:42:29.610 --> 00:42:33.330
you should check out the videos
for the class we just taught,

00:42:33.330 --> 00:42:35.610
6.889.

00:42:35.610 --> 00:42:36.720
And recreational algorithms.

00:42:36.720 --> 00:42:38.470
I've actually already
told you about a lot

00:42:38.470 --> 00:42:42.280
of these-- like algorithms for
solving a Rubik's cube in n

00:42:42.280 --> 00:42:43.640
squared divided by log(n) steps.

00:42:43.640 --> 00:42:45.410
That was a paper this year.

00:42:45.410 --> 00:42:46.700
Tetris is NP-complete.

00:42:46.700 --> 00:42:49.440
A whole bunch of
NP-completeness, and

00:42:49.440 --> 00:42:51.130
EXPTIME-completeness, and so on.

00:42:51.130 --> 00:42:52.860
Results for games.

00:42:52.860 --> 00:42:55.800
Other fun stuff, like
balloon twisting-- algorithms

00:42:55.800 --> 00:42:59.060
for designing how to balloon
twist a given polyhedron,

00:42:59.060 --> 00:43:01.480
optimally, using
the fewest balloons.

00:43:01.480 --> 00:43:02.780
Algorithmic magic tricks.

00:43:02.780 --> 00:43:04.190
There's tons of stuff out there.

00:43:04.190 --> 00:43:05.067
It's really fun.

00:43:05.067 --> 00:43:07.150
I should teach a class
about some of those things,

00:43:07.150 --> 00:43:07.900
but I haven't yet.

00:43:10.510 --> 00:43:13.670
The last thing we wanted
to do is together.

00:43:13.670 --> 00:43:16.132
And it has to do with these

00:43:16.132 --> 00:43:18.090
PROFESSOR SRINI DEVADAS:
Getting rid of these--

00:43:18.090 --> 00:43:18.430
PROFESSOR ERIK DEMAINE:
These cushions.

00:43:18.430 --> 00:43:20.400
Getting rid of
these damn cushions.

00:43:20.400 --> 00:43:22.675
We have so many
of these cushions.

00:43:22.675 --> 00:43:24.100
Just gotta get rid of them.

00:43:27.350 --> 00:43:28.550
That's two freebies.

00:43:28.550 --> 00:43:30.050
PROFESSOR SRINI
DEVADAS: Now, you're

00:43:30.050 --> 00:43:31.949
going to have to pay
for these cushions.

00:43:31.949 --> 00:43:33.490
PROFESSOR ERIK
DEMAINE: He's kidding.

00:43:33.490 --> 00:43:33.860
He's kidding.

00:43:33.860 --> 00:43:35.111
Actually we're having trouble.

00:43:35.111 --> 00:43:36.651
We're having trouble
giving them away

00:43:36.651 --> 00:43:38.400
because-- I don't
know-- some people seem

00:43:38.400 --> 00:43:39.980
to not like them very much.

00:43:39.980 --> 00:43:40.800
And neither do we.

00:43:40.800 --> 00:43:46.700
So we wanted to give you some
motivation for why you really

00:43:46.700 --> 00:43:49.250
need some of these cushions.

00:43:49.250 --> 00:43:51.439
So we actually
prepared a top 10 list.

00:43:51.439 --> 00:43:53.230
PROFESSOR SRINI DEVADAS:
This is the top 10

00:43:53.230 --> 00:43:57.710
uses of 6.006 cushions.

00:43:57.710 --> 00:43:59.130
We're going to alternate here.

00:43:59.130 --> 00:44:00.410
Number 10.

00:44:00.410 --> 00:44:02.240
PROFESSOR ERIK DEMAINE:
You can sit on it

00:44:02.240 --> 00:44:06.159
and get guaranteed
inspiration in constant time.

00:44:06.159 --> 00:44:07.700
PROFESSOR SRINI
DEVADAS: Don't forget

00:44:07.700 --> 00:44:09.234
to bring one for the final exam.

00:44:09.234 --> 00:44:11.150
PROFESSOR ERIK DEMAINE:
Highly recommended.

00:44:11.150 --> 00:44:12.842
Number nine.

00:44:12.842 --> 00:44:15.050
PROFESSOR SRINI DEVADAS:
You can use it as a Frisbee.

00:44:15.050 --> 00:44:18.330
You've seen that before, except
you cut it into a circle.

00:44:18.330 --> 00:44:19.639
You cut it into a circle.

00:44:19.639 --> 00:44:20.680
And it works really well.

00:44:23.237 --> 00:44:25.820
PROFESSOR ERIK DEMAINE: We had
fun with a Bandsaw, last night.

00:44:25.820 --> 00:44:27.495
PROFESSOR SRINI
DEVADAS: Number eight.

00:44:27.495 --> 00:44:29.120
PROFESSOR ERIK DEMAINE:
You can sell it

00:44:29.120 --> 00:44:32.272
as a limited edition
collectible on eBay.

00:44:32.272 --> 00:44:33.730
PROFESSOR SRINI
DEVADAS: It's never

00:44:33.730 --> 00:44:36.790
ever going to be made, again.

00:44:36.790 --> 00:44:39.610
You can make money off
this in 5 years-- 10 years.

00:44:39.610 --> 00:44:40.370
PROFESSOR ERIK
DEMAINE: At least $5.

00:44:40.370 --> 00:44:40.911
I don't know.

00:44:43.590 --> 00:44:44.310
Number seven.

00:44:44.310 --> 00:44:46.010
PROFESSOR SRINI
DEVADAS: Number seven.

00:44:46.010 --> 00:44:49.740
If you had two of these, you
could stick them like this,

00:44:49.740 --> 00:44:54.922
and remove the branding, and
use it as a regular cushion.

00:44:54.922 --> 00:44:56.380
PROFESSOR ERIK
DEMAINE: Now, no one

00:44:56.380 --> 00:44:58.281
will ever know you
took this class.

00:44:58.281 --> 00:44:59.030
You just need two.

00:45:01.550 --> 00:45:03.050
PROFESSOR SRINI
DEVADAS: Number six.

00:45:03.050 --> 00:45:04.508
PROFESSOR ERIK
DEMAINE: Number six.

00:45:04.508 --> 00:45:06.537
It's a holiday
conversation starter.

00:45:06.537 --> 00:45:08.620
PROFESSOR SRINI DEVADAS:
And conversation stopper.

00:45:13.210 --> 00:45:14.710
PROFESSOR ERIK
DEMAINE: Number five.

00:45:14.710 --> 00:45:15.550
PROFESSOR SRINI
DEVADAS: Asymptotically

00:45:15.550 --> 00:45:17.190
optimal-- we had
to use that term,

00:45:17.190 --> 00:45:18.799
acoustic paneling.

00:45:18.799 --> 00:45:21.340
PROFESSOR ERIK DEMAINE: That
was a suggestion from a student.

00:45:21.340 --> 00:45:23.270
You just need a lot of them.

00:45:23.270 --> 00:45:26.460
This would be great for piano,
guitar fingering practice.

00:45:26.460 --> 00:45:29.189
You know you're doing your DP.

00:45:29.189 --> 00:45:30.730
PROFESSOR SRINI
DEVADAS: Number four.

00:45:30.730 --> 00:45:31.310
PROFESSOR ERIK
DEMAINE: Number four.

00:45:31.310 --> 00:45:33.960
You can use it as target
practice for your next LARP

00:45:33.960 --> 00:45:34.460
session.

00:45:38.695 --> 00:45:39.195
Woah.

00:45:39.195 --> 00:45:39.695
Misfire.

00:45:42.300 --> 00:45:43.120
I'm missing.

00:45:43.120 --> 00:45:45.120
PROFESSOR SRINI DEVADAS:
You haven't hit me yet.

00:45:46.791 --> 00:45:47.290
All right.

00:45:47.290 --> 00:45:49.461
Finally, you got one.

00:45:49.461 --> 00:45:51.369
[APPLAUSE]

00:45:55.279 --> 00:45:56.820
PROFESSOR ERIK
DEMAINE: Number three.

00:45:56.820 --> 00:45:57.400
PROFESSOR SRINI
DEVADAS: All right.

00:45:57.400 --> 00:45:59.440
10 years from now,
it might be all

00:45:59.440 --> 00:46:01.592
you remember about double 0 6.

00:46:03.769 --> 00:46:05.310
PROFESSOR ERIK
DEMAINE: In truth, you

00:46:05.310 --> 00:46:07.352
might also remember
this top 10 list.

00:46:07.352 --> 00:46:08.810
PROFESSOR SRINI
DEVADAS: All right.

00:46:08.810 --> 00:46:09.530
Number two.

00:46:09.530 --> 00:46:10.988
PROFESSOR ERIK
DEMAINE: Number two.

00:46:10.988 --> 00:46:13.070
You can use it as your
final exam cheat sheet.

00:46:13.070 --> 00:46:14.824
This is a new rule.

00:46:14.824 --> 00:46:18.080
Instead of 8 and
1/2 by 11, you could

00:46:18.080 --> 00:46:21.430
bring in the appropriate
number of cushions.

00:46:21.430 --> 00:46:26.010
And the number one-- number one
use for a double 0 6 cushion.

00:46:26.010 --> 00:46:27.650
PROFESSOR SRINI
DEVADAS: Three words.

00:46:27.650 --> 00:46:29.580
OkCupid profile picture.

00:46:35.220 --> 00:46:37.100
Don't use this cheat sheet.

00:46:37.100 --> 00:46:39.449
But come to the final
exam and good luck.

00:46:39.449 --> 00:46:40.740
PROFESSOR ERIK DEMAINE: Thanks.

00:46:40.740 --> 00:46:43.790
[APPLAUSE]