WEBVTT

00:00:00.500 --> 00:00:02.300
There are many
approaches to building

00:00:02.300 --> 00:00:05.500
a general-purpose computer that
can be easily re-programmed

00:00:05.500 --> 00:00:07.110
for new problems.

00:00:07.110 --> 00:00:10.160
Almost all modern computers are
based on the “stored program”

00:00:10.160 --> 00:00:13.320
computer architecture developed
by John von Neumann in 1945,

00:00:13.320 --> 00:00:18.450
which is now commonly referred
to as the “von Neumann model”.

00:00:18.450 --> 00:00:21.000
The von Neumann model
has three components.

00:00:21.000 --> 00:00:24.900
There’s a central processing
unit (aka the CPU) that

00:00:24.900 --> 00:00:29.800
contains a datapath and control
FSM as described previously.

00:00:29.800 --> 00:00:32.140
The CPU is connected
to a read/write memory

00:00:32.140 --> 00:00:36.370
that holds some number W
of words, each with N bits.

00:00:36.370 --> 00:00:39.530
Nowadays, even small
memories have a billion words

00:00:39.530 --> 00:00:42.590
and the width of each
location is at least 32 bits

00:00:42.590 --> 00:00:44.610
(usually more).

00:00:44.610 --> 00:00:47.850
This memory is often referred to
as “main memory” to distinguish

00:00:47.850 --> 00:00:50.710
it from other memories
in the system.

00:00:50.710 --> 00:00:54.400
You can think of it as an array:
when the CPU wishes to operate

00:00:54.400 --> 00:00:57.970
on values in memory , it sends
the memory an array index,

00:00:57.970 --> 00:01:01.950
which we call the address, and,
after a short delay (currently

00:01:01.950 --> 00:01:05.920
10’s of nanoseconds) the memory
will return the N-bit value

00:01:05.920 --> 00:01:08.050
stored at that address.

00:01:08.050 --> 00:01:10.200
Writes to main memory
follow the same protocol

00:01:10.200 --> 00:01:12.340
except, of course,
the data flows

00:01:12.340 --> 00:01:14.270
in the opposite direction.

00:01:14.270 --> 00:01:16.290
We’ll talk about memory
technologies a couple

00:01:16.290 --> 00:01:18.200
of lectures from now.

00:01:18.200 --> 00:01:20.430
And, finally, there are
input/output devices

00:01:20.430 --> 00:01:22.490
that enable the computer
system to communicate

00:01:22.490 --> 00:01:26.080
with the outside world or
to access data storage that,

00:01:26.080 --> 00:01:28.980
unlike main memory, will
remember values even when

00:01:28.980 --> 00:01:29.570
turned off.

00:01:32.220 --> 00:01:33.920
The key idea is
to use main memory

00:01:33.920 --> 00:01:37.460
to hold the instructions
for the CPU as well as data.

00:01:37.460 --> 00:01:40.480
Both instructions and data are,
of course, just binary data

00:01:40.480 --> 00:01:42.940
stored in main memory.

00:01:42.940 --> 00:01:45.830
Interpreted as an
instruction, a value in memory

00:01:45.830 --> 00:01:48.900
can be thought of as a set of
fields containing one or bits

00:01:48.900 --> 00:01:51.220
encoding information
about the actions

00:01:51.220 --> 00:01:54.140
to be performed by the CPU.

00:01:54.140 --> 00:01:57.430
The opcode field indicates
the operation to be performed

00:01:57.430 --> 00:02:00.790
(e.g., ADD, XOR, COMPARE).

00:02:00.790 --> 00:02:04.660
Subsequent fields specify which
registers supply the source

00:02:04.660 --> 00:02:06.960
operands and the
destination register

00:02:06.960 --> 00:02:09.639
where the result is stored.

00:02:09.639 --> 00:02:11.930
The CPU interprets the
information in the instruction

00:02:11.930 --> 00:02:15.670
fields and performs the
requested operation.

00:02:15.670 --> 00:02:18.260
It would then move on to the
next instruction in memory,

00:02:18.260 --> 00:02:21.650
executing the stored
program step-by-step.

00:02:21.650 --> 00:02:23.540
The goal of this
chapter is to discuss

00:02:23.540 --> 00:02:27.160
the details of what operations
we want the CPU to perform,

00:02:27.160 --> 00:02:31.070
how many registers we
should have, and so on.

00:02:31.070 --> 00:02:34.430
Of course, some values in
memory are not instructions!

00:02:34.430 --> 00:02:36.250
They might be binary
data representing

00:02:36.250 --> 00:02:40.770
numeric values, strings
of characters, and so on.

00:02:40.770 --> 00:02:43.700
The CPU will read these values
into its temporary registers

00:02:43.700 --> 00:02:47.140
when it needs to operate on them
and write newly computed values

00:02:47.140 --> 00:02:49.420
back into memory.

00:02:49.420 --> 00:02:52.150
Mr. Blue is asking a
good question: how do we

00:02:52.150 --> 00:02:54.590
know which words in
memory are instructions

00:02:54.590 --> 00:02:56.070
and which are data?

00:02:56.070 --> 00:02:58.630
After all, they’re
both binary values!

00:02:58.630 --> 00:03:02.240
The answer is that we can’t tell
by looking at the values — it’s

00:03:02.240 --> 00:03:05.910
how they are used by the CPU
that distinguishes instructions

00:03:05.910 --> 00:03:06.960
from data.

00:03:06.960 --> 00:03:09.060
If a value is loaded
into the datapath,

00:03:09.060 --> 00:03:11.120
it’s being used as data.

00:03:11.120 --> 00:03:13.410
If a value is loaded
by the control logic,

00:03:13.410 --> 00:03:16.160
it’s being used an instruction.

00:03:16.160 --> 00:03:18.870
So this is the digital
system we’ll build to perform

00:03:18.870 --> 00:03:20.520
computations.

00:03:20.520 --> 00:03:23.030
We’ll start with a datapath
that contains some number

00:03:23.030 --> 00:03:25.230
of registers to
hold data values.

00:03:25.230 --> 00:03:28.150
We’ll be able to select which
registers will supply operands

00:03:28.150 --> 00:03:30.870
for the arithmetic and
logic unit that will perform

00:03:30.870 --> 00:03:32.710
an operation.

00:03:32.710 --> 00:03:35.870
The ALU produces a result
and other status signals.

00:03:35.870 --> 00:03:37.750
The ALU result can
be written back

00:03:37.750 --> 00:03:40.170
to one of the registers
for later use.

00:03:40.170 --> 00:03:43.500
We’ll provide the datapath with
means to move data to and from

00:03:43.500 --> 00:03:44.870
main memory.

00:03:44.870 --> 00:03:46.360
There will be a
control unit that

00:03:46.360 --> 00:03:49.750
provides the necessary control
signals to the datapath.

00:03:49.750 --> 00:03:52.040
In the example
datapath shown here,

00:03:52.040 --> 00:03:54.710
the control unit would
provide ASEL and BSEL

00:03:54.710 --> 00:03:57.780
to select two register
values as operands

00:03:57.780 --> 00:03:59.880
and DEST to select
the register where

00:03:59.880 --> 00:04:02.900
the ALU result will be written.

00:04:02.900 --> 00:04:06.320
If the datapath had, say,
32 internal registers,

00:04:06.320 --> 00:04:10.040
ASEL, BSEL and DEST
would be 5-bit values,

00:04:10.040 --> 00:04:13.210
each specifying a particular
register number in the range

00:04:13.210 --> 00:04:13.790
0 to 31.

00:04:13.790 --> 00:04:18.350
The control unit also
provides the FN function code

00:04:18.350 --> 00:04:22.000
that controls the operation
performed by the ALU.

00:04:22.000 --> 00:04:24.250
The ALU we designed in
Part 1 of the course

00:04:24.250 --> 00:04:26.480
requires a 6-bit
function code to select

00:04:26.480 --> 00:04:30.970
between a variety of arithmetic,
boolean and shift operations.

00:04:30.970 --> 00:04:33.640
The control unit would load
values from main memory

00:04:33.640 --> 00:04:36.380
to be interpreted
as instructions.

00:04:36.380 --> 00:04:38.130
The control unit
contains a register,

00:04:38.130 --> 00:04:41.770
called the “program counter”,
that keeps track of the address

00:04:41.770 --> 00:04:45.420
in main memory of the next
instruction to be executed.

00:04:45.420 --> 00:04:48.450
The control unit also
contains a (hopefully small)

00:04:48.450 --> 00:04:50.930
amount of logic to translate
the instruction fields

00:04:50.930 --> 00:04:53.880
into the necessary
control signals.

00:04:53.880 --> 00:04:56.330
Note the control unit
receives status signals

00:04:56.330 --> 00:04:58.980
from the datapath that will
enable programs to execute

00:04:58.980 --> 00:05:01.130
different sequences
of instructions

00:05:01.130 --> 00:05:05.360
if, for example, a particular
data value was zero.

00:05:05.360 --> 00:05:08.410
The datapath serves as the
brawn of our digital system

00:05:08.410 --> 00:05:12.340
and is responsible for storing
and manipulating data values.

00:05:12.340 --> 00:05:14.680
The control unit serves as
the brain of our system,

00:05:14.680 --> 00:05:17.240
interpreting the program
stored in main memory

00:05:17.240 --> 00:05:19.760
and generating the necessary
sequence of control

00:05:19.760 --> 00:05:22.500
signals for the datapath.

00:05:22.500 --> 00:05:25.460
Instructions are the
fundamental unit of work.

00:05:25.460 --> 00:05:28.410
They’re fetched by the control
unit and executed one after

00:05:28.410 --> 00:05:31.030
another in the order
they are fetched.

00:05:31.030 --> 00:05:33.140
Each instruction
specifies the operation

00:05:33.140 --> 00:05:34.880
to be performed along
with the registers

00:05:34.880 --> 00:05:38.280
to supply the source operands
and destination register where

00:05:38.280 --> 00:05:40.890
the result will be stored.

00:05:40.890 --> 00:05:43.790
In a von Neumann machine,
instruction execution

00:05:43.790 --> 00:05:46.000
involves the steps shown here:

00:05:46.000 --> 00:05:48.250
the instruction is
loaded from the memory

00:05:48.250 --> 00:05:50.930
location whose address is
specified by the program

00:05:50.930 --> 00:05:52.080
counter.

00:05:52.080 --> 00:05:54.610
When the requested data
is returned by the memory,

00:05:54.610 --> 00:05:56.460
the instruction
fields are converted

00:05:56.460 --> 00:05:59.310
to the appropriate control
signals for the datapath,

00:05:59.310 --> 00:06:02.610
selecting the source operands
from the specified registers,

00:06:02.610 --> 00:06:05.680
directing the ALU to perform
the specified operation,

00:06:05.680 --> 00:06:10.090
and storing the result in the
specified destination register.

00:06:10.090 --> 00:06:12.120
The final step in
executing an instruction

00:06:12.120 --> 00:06:13.940
is updating the
value of the program

00:06:13.940 --> 00:06:17.650
counter to be the address
of the next instruction.

00:06:17.650 --> 00:06:21.140
This execution loop is
performed again and again.

00:06:21.140 --> 00:06:23.380
Modern machines can execute
up more than a billion

00:06:23.380 --> 00:06:25.700
instructions per second!

00:06:25.700 --> 00:06:28.690
The discussion so far
has been a bit abstract.

00:06:28.690 --> 00:06:31.100
Now it’s time to roll up our
sleeves and figure out what

00:06:31.100 --> 00:06:34.090
instructions we want
our system to support.

00:06:34.090 --> 00:06:36.920
The specification of instruction
fields and their meaning

00:06:36.920 --> 00:06:39.290
along with the details
of the datapath design

00:06:39.290 --> 00:06:42.540
are collectively called the
instruction set architecture

00:06:42.540 --> 00:06:45.010
(ISA) of the system.

00:06:45.010 --> 00:06:48.390
The ISA is a detailed functional
specification of the operations

00:06:48.390 --> 00:06:50.470
and storage
mechanisms and serves

00:06:50.470 --> 00:06:52.430
as a contract
between the designers

00:06:52.430 --> 00:06:54.850
of the digital hardware
and the programmers who

00:06:54.850 --> 00:06:57.170
will write the programs.

00:06:57.170 --> 00:06:59.880
Since the programs are stored
in main memory and can hence be

00:06:59.880 --> 00:07:02.130
changed, we’ll
call them software,

00:07:02.130 --> 00:07:04.450
to distinguish them from
the digital logic which,

00:07:04.450 --> 00:07:06.900
once implemented,
doesn’t change.

00:07:06.900 --> 00:07:10.190
It’s the combination of hardware
and software that determine

00:07:10.190 --> 00:07:12.950
the behavior of our system.

00:07:12.950 --> 00:07:15.670
The ISA is a new
layer of abstraction:

00:07:15.670 --> 00:07:17.753
we can write programs
for the system

00:07:17.753 --> 00:07:19.170
without knowing
the implementation

00:07:19.170 --> 00:07:21.460
details of the hardware.

00:07:21.460 --> 00:07:23.940
As hardware
technology improves we

00:07:23.940 --> 00:07:28.180
can build faster systems without
having to change the software.

00:07:28.180 --> 00:07:31.840
You can see here that over
a fifteen year timespan,

00:07:31.840 --> 00:07:35.530
the hardware for executing
the Intel x86 instruction set

00:07:35.530 --> 00:07:38.750
went from executing 300,000
instructions per second

00:07:38.750 --> 00:07:42.400
to executing 5 billion
instructions per second.

00:07:42.400 --> 00:07:46.280
Same software as before, we’ve
just taken advantage of smaller

00:07:46.280 --> 00:07:50.150
and faster MOSFETs to build
more complex circuits and faster

00:07:50.150 --> 00:07:52.330
execution engines.

00:07:52.330 --> 00:07:54.780
But a word of
caution is in order!

00:07:54.780 --> 00:07:58.130
It’s tempting to make choices
in the ISA that reflect

00:07:58.130 --> 00:08:00.970
the constraints of current
technologies, e.g.,

00:08:00.970 --> 00:08:02.890
the number of
internal registers,

00:08:02.890 --> 00:08:05.590
the width of the operands,
or the maximum size of main

00:08:05.590 --> 00:08:07.180
memory.

00:08:07.180 --> 00:08:10.150
But it will be hard to change
the ISA when technology

00:08:10.150 --> 00:08:13.440
improves since there’s a
powerful economic incentive

00:08:13.440 --> 00:08:16.890
to ensure that old software
can run on new machines,

00:08:16.890 --> 00:08:20.520
which means that a particular
ISA can live for decades

00:08:20.520 --> 00:08:23.700
and span many generations
of technology.

00:08:23.700 --> 00:08:26.550
If your ISA is successful,
you’ll have to live with any

00:08:26.550 --> 00:08:30.320
bad choices you made
for a very long time.

00:08:30.320 --> 00:08:32.860
Designing an ISA is hard!

00:08:32.860 --> 00:08:35.350
What are the operations
that should be supported?

00:08:35.350 --> 00:08:37.030
How many internal registers?

00:08:37.030 --> 00:08:38.659
How much main memory?

00:08:38.659 --> 00:08:40.390
Should we design the
instruction encoding

00:08:40.390 --> 00:08:44.110
to minimize program size or to
keep the logic in the control

00:08:44.110 --> 00:08:45.970
unit as simple as possible?

00:08:45.970 --> 00:08:47.450
Looking into our
crystal ball, what

00:08:47.450 --> 00:08:50.320
can we say about the computation
and storage capabilities

00:08:50.320 --> 00:08:53.070
of future technologies?

00:08:53.070 --> 00:08:56.580
We’ll answer these questions by
taking a quantitative approach.

00:08:56.580 --> 00:08:59.060
First we’ll choose a set
of benchmark programs,

00:08:59.060 --> 00:09:02.310
chosen as representative of
the many types of programs we

00:09:02.310 --> 00:09:04.750
expect to run on our system.

00:09:04.750 --> 00:09:06.440
So some of benchmark
programs will

00:09:06.440 --> 00:09:09.170
perform scientific and
engineering computations,

00:09:09.170 --> 00:09:11.470
some will manipulate
large data sets

00:09:11.470 --> 00:09:13.910
or perform database
operations, some

00:09:13.910 --> 00:09:16.860
will require specialized
computations for graphics

00:09:16.860 --> 00:09:19.500
or communications, and so on.

00:09:19.500 --> 00:09:21.970
Happily, after many
decades of computer use,

00:09:21.970 --> 00:09:24.070
several standardized
benchmark suites

00:09:24.070 --> 00:09:26.200
are available for us to use.

00:09:26.200 --> 00:09:28.550
We’ll then implement the
benchmark programs using

00:09:28.550 --> 00:09:32.210
our instruction set and simulate
their execution on our proposed

00:09:32.210 --> 00:09:33.610
datapath.

00:09:33.610 --> 00:09:36.690
We’ll evaluate the results to
measure how well the system

00:09:36.690 --> 00:09:38.120
performs.

00:09:38.120 --> 00:09:40.550
But what do we mean by “well”?

00:09:40.550 --> 00:09:42.100
That’s where it
gets interesting:

00:09:42.100 --> 00:09:45.500
“well” could refer to execution
speed, energy consumption,

00:09:45.500 --> 00:09:49.170
circuit size, system cost, etc.

00:09:49.170 --> 00:09:50.720
If you’re designing
a smart watch,

00:09:50.720 --> 00:09:53.090
you’ll make different choices
than if you’re designing

00:09:53.090 --> 00:09:57.260
a high-performance graphics
card or a data-center server.

00:09:57.260 --> 00:10:00.340
Whatever metric you choose to
evaluate your proposed system,

00:10:00.340 --> 00:10:03.170
there’s an important design
principle we can follow:

00:10:03.170 --> 00:10:06.060
identify the common
operations and focus on them

00:10:06.060 --> 00:10:08.190
as you optimize your design.

00:10:08.190 --> 00:10:10.240
For example, in
general-purpose computing,

00:10:10.240 --> 00:10:12.840
almost all programs
spend a lot of their time

00:10:12.840 --> 00:10:15.960
on simple arithmetic
operations and accessing values

00:10:15.960 --> 00:10:17.320
in main memory.

00:10:17.320 --> 00:10:19.780
So those operations
should be made as fast

00:10:19.780 --> 00:10:22.370
and energy efficient
as possible.

00:10:22.370 --> 00:10:25.120
Now, let’s get to work designing
our own instruction set

00:10:25.120 --> 00:10:29.210
and execution engine, a
system we’ll call the Beta.