WEBVTT

00:00:00.500 --> 00:00:02.890
For this problem,
assume that you

00:00:02.890 --> 00:00:05.310
have discovered a
room full of discarded

00:00:05.310 --> 00:00:07.460
5-stage pipelined Betas.

00:00:07.460 --> 00:00:10.320
These Betas fall
into four categories.

00:00:10.320 --> 00:00:13.460
The first are completely
functional 5-stage Betas

00:00:13.460 --> 00:00:16.490
with full bypass
and annulment logic.

00:00:16.490 --> 00:00:19.960
The second are Betas
with a bad register file.

00:00:19.960 --> 00:00:23.560
In these Betas, all data read
directly from the register file

00:00:23.560 --> 00:00:24.960
is zero.

00:00:24.960 --> 00:00:27.840
Note that if the data is
read from a bypass path,

00:00:27.840 --> 00:00:30.730
then the correct
value will be read.

00:00:30.730 --> 00:00:34.400
The third set are Betas
without bypass paths.

00:00:34.400 --> 00:00:37.260
And finally, the fourth
are Betas without annulment

00:00:37.260 --> 00:00:39.330
of branch delay slots.

00:00:39.330 --> 00:00:41.710
The problem is that the
Betas are not labeled,

00:00:41.710 --> 00:00:46.450
so we do not know which
falls into which category.

00:00:46.450 --> 00:00:49.140
You come up with the
test program shown here.

00:00:49.140 --> 00:00:51.760
Your plan is to single
step through the program

00:00:51.760 --> 00:00:54.570
using each Beta chip,
carefully noting

00:00:54.570 --> 00:00:58.340
the address that the final
JMP loads into the PC.

00:00:58.340 --> 00:01:01.090
Your goal is to determine
which of the four classes

00:01:01.090 --> 00:01:05.080
each chip falls into
via this JMP address.

00:01:05.080 --> 00:01:07.610
Notice that on a
fully functional Beta,

00:01:07.610 --> 00:01:11.060
this code would execute the
instructions sequentially

00:01:11.060 --> 00:01:12.670
skipping the MULC instruction.

00:01:15.560 --> 00:01:17.880
Here we see a pipeline
diagram showing

00:01:17.880 --> 00:01:21.040
execution of this program
on a fully functional Beta

00:01:21.040 --> 00:01:23.610
from category C1.

00:01:23.610 --> 00:01:26.620
It shows that although the
MULC instruction is fetched

00:01:26.620 --> 00:01:31.730
in cycle 2, it gets annulled
when the BEQ is in the RF stage

00:01:31.730 --> 00:01:35.280
and it determines that the
branch to label X, or the SUBC

00:01:35.280 --> 00:01:37.870
instruction, will be taken.

00:01:37.870 --> 00:01:43.280
The MULC is annulled by
inserting a NOP in its place.

00:01:43.280 --> 00:01:45.500
The ADDC and BEQ
instructions read R31

00:01:45.500 --> 00:01:48.440
from the register file.

00:01:48.440 --> 00:01:50.840
The SUBC, however,
gets the value

00:01:50.840 --> 00:01:54.550
of R2 via the bypass path
from the BEQ instruction

00:01:54.550 --> 00:01:57.120
which is in the MEM stage.

00:01:57.120 --> 00:02:00.840
The ADD then reads R0 and R2.

00:02:00.840 --> 00:02:03.390
R0 has already made it
back to the register file

00:02:03.390 --> 00:02:05.690
because the ADDC
instruction completed

00:02:05.690 --> 00:02:07.980
by the end of cycle 4.

00:02:07.980 --> 00:02:10.910
R2, however, is read
via the bypass path

00:02:10.910 --> 00:02:15.290
from the SUBC instruction
which is in the ALU stage.

00:02:15.290 --> 00:02:17.460
Finally, the JMP
reads the value

00:02:17.460 --> 00:02:21.400
of R3 via the bypass path
from the ADD instruction

00:02:21.400 --> 00:02:25.540
which is in the ALU
stage in cycle 6.
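
NOTE
A minimal sketch (not part of the lecture) of the pipeline diagram described above.
The fetch cycles are an assumption inferred from the narration: the MULC is fetched
in cycle 2, so the ADDC is taken to be fetched in cycle 0. The names STAGES,
FETCH_CYCLE and stage are hypothetical.
```python
# Fetch cycles are an assumption inferred from the narration
# (MULC is fetched in cycle 2, so ADDC is fetched in cycle 0).
STAGES = ["IF", "RF", "ALU", "MEM", "WB"]
FETCH_CYCLE = {"ADDC": 0, "BEQ": 1, "MULC": 2, "SUBC": 3, "ADD": 4, "JMP": 5}
def stage(instruction, cycle):
    """Return the stage an instruction occupies in a cycle, or None."""
    index = cycle - FETCH_CYCLE[instruction]
    return STAGES[index] if 0 <= index < len(STAGES) else None
# Print one row per instruction. On a C1 Beta the MULC row really holds a NOP
# from cycle 3 onward, because the taken BEQ annuls it.
for name in FETCH_CYCLE:
    cells = [stage(name, cycle) or "." for cycle in range(10)]
    print(f"{name:5}" + " ".join(f"{cell:4}" for cell in cells))
```
Reading down a column gives the bypass sources: in cycle 4 the SUBC is in RF while
the BEQ is in MEM, and in cycle 6 the JMP is in RF while the ADD is in ALU.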

00:02:25.540 --> 00:02:29.040
When run on a fully functional
beta with bypass paths

00:02:29.040 --> 00:02:31.200
and annulment of
branch delay slots,

00:02:31.200 --> 00:02:34.380
the code behaves as follows:

00:02:34.380 --> 00:02:37.530
The ADDC sets R0 = 4.

00:02:37.530 --> 00:02:40.610
The BEQ stores PC + 4 into R2.

00:02:40.610 --> 00:02:46.220
Since the ADDC is at address
0, the BEQ is at address 4,

00:02:46.220 --> 00:02:51.520
so PC + 4 = 8 is stored into
R2, and the program branches

00:02:51.520 --> 00:02:53.430
to label X.

00:02:53.430 --> 00:02:57.390
Next, the SUBC subtracts 4
from the latest value of R2

00:02:57.390 --> 00:03:00.940
and stores the result
which is 4 back into R2.

00:03:00.940 --> 00:03:08.240
The ADD adds R0 and R2, or 4
and 4, and stores the result

00:03:08.240 --> 00:03:09.560
which is 8 into R3.

00:03:09.560 --> 00:03:17.230
The JMP jumps to the
address in R3 which is 8.
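
NOTE
A minimal sketch of the C1 behavior just described. The test program appears only
on the slide, so the listing below is a reconstruction from the narration; the
operand order, the tuple encoding and the name run_sequential are assumptions.
```python
# Program reconstructed from the narration; the encoding is hypothetical.
PROGRAM = [
    ("ADDC", 31, 4, 0),      # address 0:  R0 = R31 + 4
    ("BEQ", 31, 12, 2),      # address 4:  R2 = PC + 4; branch to X (12) if R31 == 0
    ("MULC", 2, 2, 2),       # address 8:  R2 = R2 * 2 (skipped on a working Beta)
    ("SUBC", 2, 4, 2),       # address 12: X: R2 = R2 - 4
    ("ADD", 0, 2, 3),        # address 16: R3 = R0 + R2
    ("JMP", 3, None, None),  # address 20: PC = R3
]
def run_sequential(program):
    """Execute one instruction at a time, as a fully functional Beta appears to."""
    regs = [0] * 32
    pc = 0
    while True:
        op, a, b, dest = program[pc // 4]
        next_pc = pc + 4
        if op == "ADDC":
            regs[dest] = regs[a] + b
        elif op == "SUBC":
            regs[dest] = regs[a] - b
        elif op == "MULC":
            regs[dest] = regs[a] * b
        elif op == "ADD":
            regs[dest] = regs[a] + regs[b]
        elif op == "BEQ":
            taken = regs[a] == 0
            regs[dest] = pc + 4
            if taken:
                next_pc = b
        elif op == "JMP":
            return regs[a]  # the address the final JMP loads into the PC
        regs[31] = 0  # R31 always reads as zero
        pc = next_pc
print(run_sequential(PROGRAM))  # prints 8
```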

00:03:17.230 --> 00:03:20.390
When run on C2 which has
a bad register file that

00:03:20.390 --> 00:03:23.740
always outputs a zero, the
behavior of the program

00:03:23.740 --> 00:03:26.430
changes a bit.

00:03:26.430 --> 00:03:27.940
The ADDC and BEQ
instructions, which

00:03:27.940 --> 00:03:32.040
use R31 (which is 0 anyway),
behave in the same way

00:03:32.040 --> 00:03:33.810
as before.

00:03:33.810 --> 00:03:37.990
The SUBC, which gets the value
of R2 from the bypass path,

00:03:37.990 --> 00:03:41.790
reads the correct value
for R2 which is 8.

00:03:41.790 --> 00:03:45.130
Recall that only reads directly
from the register file return

00:03:45.130 --> 00:03:49.160
a zero, whereas if the data is
coming from a bypass path, then

00:03:49.160 --> 00:03:50.960
you get the correct value.

00:03:50.960 --> 00:03:52.700
So the SUBC produces a 4.

00:03:52.700 --> 00:03:56.940
The ADD reads R0 from
the register file

00:03:56.940 --> 00:03:59.600
and R2 from the bypass path.

00:03:59.600 --> 00:04:01.810
The result is that
the value of R0

00:04:01.810 --> 00:04:07.690
is read as if it was 0 while
that of R2 is correct and is 4.

00:04:07.690 --> 00:04:10.930
So R3 gets the value
4 assigned to it,

00:04:10.930 --> 00:04:13.830
and that is the address that
the JMP instruction jumps

00:04:13.830 --> 00:04:20.190
to because it too reads its
register from a bypass path.
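
NOTE
A minimal sketch of the C2 reasoning: an operand taken from a bypass path is
correct, while one read directly from the register file is 0. The helper names
read_operand and jump_address are hypothetical, and the operand sources follow the
narration rather than a full pipeline model.
```python
def read_operand(reg, bypass, reg_file, bad_reg_file):
    """Prefer a value still in the pipeline; otherwise read the register file."""
    if reg in bypass:
        return bypass[reg]
    return 0 if bad_reg_file else reg_file[reg]
def jump_address(bad_reg_file):
    # Register file contents when ADD and JMP read: only ADDC has written back.
    reg_file = {0: 4, 2: 0, 3: 0}
    # SUBC: R2 comes from the BEQ result on the bypass path (8).
    r2 = read_operand(2, {2: 8}, reg_file, bad_reg_file) - 4
    # ADD: R0 comes from the register file, R2 from the SUBC bypass.
    in_flight = {2: r2}
    r0_seen = read_operand(0, in_flight, reg_file, bad_reg_file)
    r2_seen = read_operand(2, in_flight, reg_file, bad_reg_file)
    r3 = r0_seen + r2_seen
    # JMP: R3 comes from the ADD result on the bypass path.
    return read_operand(3, {3: r3}, reg_file, bad_reg_file)
print(jump_address(bad_reg_file=True), jump_address(bad_reg_file=False))  # prints 4 8
```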

00:04:20.190 --> 00:04:24.300
When run on C3 which does
not have any bypass paths,

00:04:24.300 --> 00:04:27.100
some of the instructions
will read stale values

00:04:27.100 --> 00:04:29.030
of their source operands.

00:04:29.030 --> 00:04:31.670
Let’s go through the
example in detail.

00:04:31.670 --> 00:04:35.400
The ADDC and BEQ
instructions read

00:04:35.400 --> 00:04:39.810
R31 which is 0 from the register
file so what they ultimately

00:04:39.810 --> 00:04:41.820
produce does not change.

00:04:41.820 --> 00:04:44.760
However, you must keep
in mind that the updated

00:04:44.760 --> 00:04:47.000
value of the
destination register

00:04:47.000 --> 00:04:51.100
will not be written until
after that instruction completes

00:04:51.100 --> 00:04:54.430
the WB stage of the pipeline.

00:04:54.430 --> 00:04:58.470
When the SUBC reads R2, it
gets a stale value of R2

00:04:58.470 --> 00:05:02.030
because the BEQ instruction
has not yet completed,

00:05:02.030 --> 00:05:05.080
so it assumes that R2 = 0.

00:05:05.080 --> 00:05:10.830
It then subtracts 4 from that
and tries to write -4 into R2.

00:05:10.830 --> 00:05:12.980
Next the ADD runs.

00:05:12.980 --> 00:05:14.920
Recall that the
ADD would normally

00:05:14.920 --> 00:05:18.200
read R0 from the register
file because by the time

00:05:18.200 --> 00:05:21.990
the ADD is in the RF stage,
the ADDC which writes to R0

00:05:21.990 --> 00:05:25.160
has completed all
the pipeline stages.

00:05:25.160 --> 00:05:29.510
The ADD would also normally read
R2 from the bypass path.

00:05:29.510 --> 00:05:33.200
However, since C3 does
not have bypass paths,

00:05:33.200 --> 00:05:36.920
it reads a stale value of
R2 from the register file.

00:05:36.920 --> 00:05:39.440
To determine which
stale value it reads,

00:05:39.440 --> 00:05:42.210
we need to examine
the pipeline diagram

00:05:42.210 --> 00:05:46.880
to see if either the BEQ or the
SUBC operations, both of which

00:05:46.880 --> 00:05:49.590
eventually update
R2, have completed

00:05:49.590 --> 00:05:52.090
by the time the ADD reads
its source operands.

00:05:55.130 --> 00:05:57.500
Looking at our
pipeline diagram, we

00:05:57.500 --> 00:06:00.210
see that when the ADD
is in the RF stage,

00:06:00.210 --> 00:06:03.770
neither the BEQ nor the
SUBC has completed,

00:06:03.770 --> 00:06:08.210
thus the value read for R2 is
the initial value of R2 which

00:06:08.210 --> 00:06:08.710
is 0.

00:06:08.710 --> 00:06:13.670
We can now determine the
behavior of the ADD instruction

00:06:13.670 --> 00:06:17.510
which is that it assumes
R0 = 4 and R2 = 0

00:06:17.510 --> 00:06:22.220
and will eventually
write a 4 into R3.

00:06:22.220 --> 00:06:24.310
Finally, the JMP
instruction wants

00:06:24.310 --> 00:06:26.780
to read the result
of the ADD. However,

00:06:26.780 --> 00:06:28.820
since there are no
bypass paths, it

00:06:28.820 --> 00:06:31.940
reads the original
value of R3 which was 0

00:06:31.940 --> 00:06:34.440
and jumps to address 0.

00:06:34.440 --> 00:06:37.530
Of course, had we looked at
our code a little more closely,

00:06:37.530 --> 00:06:39.450
we could have determined
this without all

00:06:39.450 --> 00:06:43.180
the intermediate steps because
the ADD is the only instruction

00:06:43.180 --> 00:06:45.850
that tries to update
R3, and you know

00:06:45.850 --> 00:06:49.400
that the JMP would normally get
R3 from the bypass path which

00:06:49.400 --> 00:06:53.200
is not available, therefore it
must read the original value

00:06:53.200 --> 00:06:57.520
of R3 which is 0.
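
NOTE
A minimal sketch of the C3 timing argument. It assumes the fetch cycles implied by
the narration and that a value written in the WB stage becomes readable only in a
later cycle, which matches the statement that the BEQ has not completed when the
ADD is in RF. The names FETCH_CYCLE and visible are hypothetical.
```python
# Assumed fetch cycles; the annulled MULC is omitted.
FETCH_CYCLE = {"ADDC": 0, "BEQ": 1, "SUBC": 3, "ADD": 4, "JMP": 5}
def visible(writer, reader):
    """True if writer finished WB before reader's RF stage (no bypass paths)."""
    write_back_cycle = FETCH_CYCLE[writer] + 4
    read_cycle = FETCH_CYCLE[reader] + 1
    return write_back_cycle < read_cycle
# SUBC reads R2: the BEQ result (8) only if BEQ has written back, else stale 0.
r2_for_subc = 8 if visible("BEQ", "SUBC") else 0
subc_result = r2_for_subc - 4
# ADD reads R0 and R2 from the register file.
r0_for_add = 4 if visible("ADDC", "ADD") else 0
if visible("SUBC", "ADD"):
    r2_for_add = subc_result
elif visible("BEQ", "ADD"):
    r2_for_add = 8
else:
    r2_for_add = 0
add_result = r0_for_add + r2_for_add
# JMP reads R3: the ADD result only if ADD has written back, else the initial 0.
jump = add_result if visible("ADD", "JMP") else 0
print(subc_result, add_result, jump)  # prints -4 4 0
```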

00:06:57.520 --> 00:07:01.550
In category C4, the Betas do
not annul instructions that were

00:07:01.550 --> 00:07:04.430
fetched after a branch
instruction but aren’t supposed

00:07:04.430 --> 00:07:05.990
to get executed.

00:07:05.990 --> 00:07:07.910
This means that
the MULC which is

00:07:07.910 --> 00:07:10.640
fetched after the
BEQ is actually

00:07:10.640 --> 00:07:14.850
executed in the pipeline
and affects the value of R2.

00:07:14.850 --> 00:07:19.230
Let’s take a close look at what
happens in each instruction.

00:07:19.230 --> 00:07:22.570
Once again we begin with
the ADDC setting R0 to 4

00:07:22.570 --> 00:07:25.870
and the BEQ setting R2 to 8.

00:07:25.870 --> 00:07:28.130
Since our bypass
paths are now working,

00:07:28.130 --> 00:07:29.990
we can assume that
we can immediately

00:07:29.990 --> 00:07:32.260
get the updated value.

00:07:32.260 --> 00:07:36.170
Next, we execute the MULC which
takes the latest value of R2

00:07:36.170 --> 00:07:40.070
from the bypass path
and multiplies it by 2.

00:07:40.070 --> 00:07:42.300
So it sets R2 = 16.

00:07:42.300 --> 00:07:48.440
The SUBC now uses this value
for R2 from which it subtracts 4

00:07:48.440 --> 00:07:52.060
to produce R2 = 12.

00:07:52.060 --> 00:07:56.740
The ADD then reads R0 =
4 and adds to it R2 = 12

00:07:56.740 --> 00:08:02.300
to produce R3 = 16.

00:08:02.300 --> 00:08:04.750
Finally, the JMP
jumps to address 16.
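
NOTE
A minimal sketch of the C4 arithmetic, assuming working bypass paths so that every
instruction sees the latest register values. The flag annul_delay_slot and the
function name jump_address are hypothetical.
```python
def jump_address(annul_delay_slot):
    """Return the JMP target with or without annulment of the branch delay slot."""
    r0 = 0 + 4        # ADDC: R0 = R31 + 4
    r2 = 4 + 4        # BEQ at address 4 stores PC + 4 into R2
    if not annul_delay_slot:
        r2 = r2 * 2   # MULC executes even though the branch is taken
    r2 = r2 - 4       # SUBC
    r3 = r0 + r2      # ADD
    return r3         # JMP loads R3 into the PC
print(jump_address(True), jump_address(False))  # prints 8 16
```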

00:08:04.750 --> 00:08:08.480
Since each of the
four categories

00:08:08.480 --> 00:08:11.790
produces a unique jump
address, this program

00:08:11.790 --> 00:08:16.630
can be used to sort out all the
Betas into the four categories.
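
NOTE
A minimal sketch of the resulting sorting rule, using the four jump addresses
derived in the walkthrough. The table and function names are hypothetical.
```python
CATEGORY_BY_JUMP_ADDRESS = {
    8: "C1: fully functional",
    4: "C2: bad register file",
    0: "C3: no bypass paths",
    16: "C4: no annulment of branch delay slots",
}
def classify(jump_address):
    """Map the address loaded by the final JMP to a Beta category."""
    return CATEGORY_BY_JUMP_ADDRESS.get(jump_address, "unknown")
for address in (8, 4, 0, 16):
    print(address, classify(address))
```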