This is a closed book, closed notes exam.
80 Minutes
15 Pages

Notes:
• Not all questions are of equal difficulty, so look over the entire exam and budget your time carefully.
• Please carefully state any assumptions you make.
• Please write your name on every page of the quiz.
• You must not discuss a quiz's contents with other students who have not yet taken the quiz.

Writing name on each sheet ________ 2 Points
Part A ________ 22 Points
Part B ________ 12 Points
Part C ________ 12 Points
Part D ________ 32 Points

TOTAL ________ 80 Points

**Part A: Addressing Modes on MIPS ISA (22 points)**

Ben Bitdiddle is suspicious of the benefits of complex addressing modes, so he has decided to investigate them by incrementally removing addressing modes from our MIPS ISA. He will then write programs for the resulting “crippled” ISAs to see what programming on them is like.

For this problem, we assume an 18-bit address space, so any word in memory can be reached through the 16-bit immediate field encoded in an instruction. (Remember that all data and instruction words are aligned, so the two low-order bits of every word address are zero and 16 + 2 = 18 address bits suffice. Don’t worry about byte or half-word data accesses.)

Please refer to the MIPS instruction table on the last page (Appendix B) for each instruction’s description and encoding.

**Question 1 (6 points)**

As a first step, Ben has dropped support for the displacement (base+offset) addressing mode; that is, our MIPS ISA now supports only register indirect addressing (without an offset).

Can you still write the same program as before? If so, please translate the following load instruction into an instruction sequence in the new ISA. If not, explain why.

\[
\text{LW R1, } 16(R2)
\]

**Question 2 (8 points)**

Now he wants to take a bolder step by eliminating register indirect addressing completely. The new load and store instructions have the following format:

\[
\begin{align*}
\text{LW R1, imm16} &\; ; \; R1 \leftarrow M[{\text{imm16,00}}_{2}] \\
\text{SW R1, imm16} &\; ; \; M[{\text{imm16,00}}_{2}] \leftarrow R1
\end{align*}
\]
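As a concrete reading of the notation \( \{\text{imm16}, 00\}_2 \), the following minimal C sketch forms the 18-bit byte address by appending two zero bits to the 16-bit immediate. The function name `absolute_word_address` is hypothetical, and the sketch models only the address calculation, not the memory access.

```c
#include <stdint.h>

/* Hypothetical helper: the byte address selected by the absolute-mode
   LW/SW above, i.e. the 16-bit immediate followed by two zero bits. */
uint32_t absolute_word_address(uint16_t imm16) {
    return (uint32_t)imm16 << 2;   /* {imm16, 00}: always word-aligned */
}
```

For example, `imm16 = 0xFFFF` selects byte address `0x3FFFC`, the last word of the 18-bit (256 KiB) address space.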

Can you still write the same program as before? If so, please translate the following load instruction into an instruction sequence in the new ISA. If not, explain why.

(Don’t worry about branches and jumps for this question.)

\[
\text{LW R1, } 16(R2)
\]
**Question 3 (8 points)**

Ben wonders whether a subroutine can be implemented using only absolute addressing. He changes the original ISA so that all branches and jumps take a 16-bit absolute address, and `jr` and `jalr` are no longer supported.

With the new ISA, he decides to rewrite a subroutine from an old project. Here is the original C code he wrote.

```c
int b; // a global variable

void multiplyByB(int a) {
    int i, result = 0;  // result starts at 0, as in the assembly translation
    for (i = 0; i < b; i++) {
        result = result + a;
    }
}
```

The C code above translates into the following instruction sequence on our original MIPS ISA. Assume that, upon entry, R1 and R2 contain `b` and `a`, respectively. R3 is used for `i`, and R4 for `result`. By calling convention, the 16-bit word-aligned return address is passed in R31.

```
Subroutine: xor  R4, R4, R4    ; result = 0
            xor  R3, R3, R3    ; i = 0
loop:       slt  R5, R3, R1
            bnez R5, L1        ; if (i < b) goto L1
return:     jr   R31           ; return to the caller
L1:         add  R4, R4, R2    ; result += a
            addi R3, R3, #1    ; i++
            j    loop
```
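To make the register assignment explicit, here is a hedged C sketch that mirrors the assembly above line by line. The function name `multiply_by_b_regs` and its parameters are hypothetical, and the sketch returns the value left in R4 instead of modeling the `jr R31` return mechanism.

```c
/* Hypothetical C rendering of the assembly above; r1_b, r2_a, r3 and r4
   stand for R1 (b), R2 (a), R3 (i) and R4 (result). */
int multiply_by_b_regs(int r1_b, int r2_a) {
    int r4 = 0;              /* xor R4, R4, R4  ; result = 0 */
    int r3 = 0;              /* xor R3, R3, R3  ; i = 0 */
loop:
    if (r3 < r1_b) {         /* slt R5, R3, R1 ; bnez R5, L1 */
        r4 = r4 + r2_a;      /* L1: add R4, R4, R2 */
        r3 = r3 + 1;         /* addi R3, R3, #1 */
        goto loop;           /* j loop */
    }
    return r4;               /* return: jr R31 (result is left in R4) */
}
```

Because `slt` is a signed comparison, a negative `b` skips the loop entirely, just as the C `for` loop does.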

If you can, please rewrite the assembly code so that the subroutine returns without using a `jr` instruction (which is a register indirect jump). If you cannot, explain why.

**Part B: Microprogramming (12 points)**

In this question, we ask you to implement a special return instruction, *return on zero* (`retz`), which uses the same encoding as a MIPS conditional branch instruction:

\[
\begin{array}{cccc}
6 & 5 & 5 & 16 \\
\text{\texttt{retz}} & \text{Rs} & \text{Rt} & \text{unused}
\end{array}
\]

The `retz` instruction provides a fast return from a subroutine call, using `Rt` as the stack pointer. It first tests the value of register `Rs`. If `Rs` is *not* zero, execution simply proceeds to the next instruction at `PC+4`. If `Rs` is zero, the instruction (1) reads the return address from memory at the address held in `Rt`, (2) increments `Rt` by 4, and (3) jumps to the return address.
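The behavior described above can be summarized with a small C model of the architectural state. This is a sketch of *what* `retz` does, not of how the microcode implements it; the `Machine` struct, its word-addressed memory array, the memory size, and `exec_retz` are hypothetical, and the model assumes `Rt` is not R0.

```c
#include <stdint.h>

#define NUM_REGS  32
#define MEM_WORDS 1024   /* hypothetical toy memory size */

/* Hypothetical architectural state for the sketch. */
typedef struct {
    uint32_t pc;
    uint32_t reg[NUM_REGS];
    uint32_t mem[MEM_WORDS];   /* word-addressed: byte address A is mem[A >> 2] */
} Machine;

/* Architectural effect of "retz Rs, Rt" (assumes Rt is not R0). */
void exec_retz(Machine *m, unsigned rs, unsigned rt) {
    if (m->reg[rs] != 0) {
        m->pc += 4;                        /* Rs != 0: fall through to PC+4 */
        return;
    }
    uint32_t sp = m->reg[rt];
    /* (1) load the return address; the modulo keeps the toy memory in bounds */
    uint32_t ret = m->mem[(sp >> 2) % MEM_WORDS];
    m->reg[rt] = sp + 4;                   /* (2) pop: Rt += 4 */
    m->pc = ret;                           /* (3) jump to the return address */
}
```

For example, with `Rs` = 0 and `Rt` pointing at a stack slot that holds 400, the call sets the PC to 400 and advances `Rt` by 4.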

For reference, we have included the actual bus-based datapath in Appendix A (Page 14) and a MIPS instruction table in Appendix B (Page 15). You do not need this information if you remember the bus-based architecture from the online material. Please detach the last two pages from the exam and use them as a reference while you answer this question.

**Question 4 (12 points)**

Fill out Worksheet 1 for the `retz` instruction. Optimize your implementation for the minimum number of cycles and set as many signals as possible to don’t-cares. You do not have to worry about the busy signal. You may not need all the lines in the table for your solution.

You may introduce at most one new μBr target (Next State) for J (jump) or Z (branch-if-zero), other than `FETCH0`.
<table>
<thead>
<tr>
<th>State</th>
<th>PseudoCode</th>
<th>ld IR</th>
<th>Reg Sel</th>
<th>Reg W</th>
<th>en Reg</th>
<th>ld A</th>
<th>ld B</th>
<th>ALUOp</th>
<th>en ALU</th>
<th>ld MA</th>
<th>Mem W</th>
<th>en Mem</th>
<th>Ex Sel</th>
<th>en Imm</th>
<th>µBr</th>
<th>Next State</th>
</tr>
</thead>
<tbody>
<tr>
<td>FETCH0:</td>
<td>MA &lt;- PC; A &lt;- PC</td>
<td>*</td>
<td>PC</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>*</td>
<td>*</td>
<td>0</td>
<td>1</td>
<td>*</td>
<td>0</td>
<td>*</td>
<td>0</td>
<td>N</td>
<td>*</td>
</tr>
<tr>
<td></td>
<td>IR &lt;- Mem</td>
<td>1</td>
<td>*</td>
<td>*</td>
<td>0</td>
<td>0</td>
<td>*</td>
<td>*</td>
<td>0</td>
<td>*</td>
<td>0</td>
<td>1</td>
<td>*</td>
<td>0</td>
<td>N</td>
<td>*</td>
</tr>
<tr>
<td></td>
<td>PC &lt;- A+4; B &lt;- A+4</td>
<td>0</td>
<td>PC</td>
<td>1</td>
<td>1</td>
<td>*</td>
<td>1</td>
<td>INC_A_4</td>
<td>1</td>
<td>*</td>
<td>*</td>
<td>0</td>
<td>*</td>
<td>0</td>
<td>D</td>
<td>*</td>
</tr>
<tr>
<td>...</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>NOP0:</td>
<td>microbranch back to FETCH0</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>0</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>0</td>
<td>*</td>
<td>*</td>
<td>0</td>
<td>*</td>
<td>0</td>
<td>J</td>
<td>FETCH0</td>
</tr>
<tr>
<td>retz0:</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Worksheet 1 for Question 4

**Part C: Fully-Bypassed Simple 5-stage Pipeline (12 points)**

In Lecture 6, we introduced a fully bypassed 5-stage MIPS pipeline. The pipeline diagram and the symbol definitions used in its stall and bypass conditions are reproduced below.

Subscripts D, E, M, and W denote instruction decode, execute, memory, and write back stages, respectively.

\[
\begin{align*}
ws &= \text{Case opcode} \\
    &\begin{cases}
    \text{ALU} &\Rightarrow rd \\
    \text{ALUi}, \text{LW} &\Rightarrow rt \\
    \text{JAL}, \text{JALR} &\Rightarrow R31
    \end{cases} \\
we &= \text{Case opcode} \\
    &\begin{cases}
    \text{ALU}, \text{ALUi}, \text{LW} &\Rightarrow (ws \neq 0) \\
    \text{JAL}, \text{JALR} &\Rightarrow \text{on}
    \end{cases} \\
re1 &= \text{Case opcode} \\
    &\begin{cases}
    \text{ALU}, \text{ALUi}, \text{LW} &\Rightarrow \text{on} \\
    \text{JAL}, \text{JALR} &\Rightarrow \text{off}
    \end{cases} \\
re2 &= \text{Case opcode} \\
    &\begin{cases}
    \text{ALU}, \text{SW} &\Rightarrow \text{on}
    \end{cases} \\
we\text{-}bypass_E &= \text{Case opcode}_E \\
    &\begin{cases}
    \text{ALU}, \text{ALUi} &\Rightarrow (ws \neq 0)
    \end{cases} \\
we\text{-}stall_E &= \text{Case opcode}_E \\
    &\begin{cases}
    \text{LW} &\Rightarrow (ws \neq 0) \\
    \text{JAL}, \text{JALR} &\Rightarrow \text{on}
    \end{cases}
\end{align*}
\]
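The `ws`, `we`, `we-bypass_E`, and `we-stall_E` definitions above translate directly into C. In this sketch the `OpClass` enum, the function names, and the rule that unlisted opcodes default to "off" (with `ws` = 0) are assumptions; `re1` and `re2` are omitted.

```c
#include <stdbool.h>

/* Hypothetical opcode classes. OP_OTHER stands for every opcode the case
   tables do not list; treating those as "off" is an assumption. */
typedef enum { OP_ALU, OP_ALUI, OP_LW, OP_SW, OP_JAL, OP_JALR, OP_OTHER } OpClass;

/* ws: destination register specifier (0 when there is none, an assumption). */
unsigned ws(OpClass op, unsigned rd, unsigned rt) {
    switch (op) {
    case OP_ALU:                return rd;
    case OP_ALUI: case OP_LW:   return rt;
    case OP_JAL:  case OP_JALR: return 31;
    default:                    return 0;
    }
}

/* we: register-file write enable, given this instruction's ws. */
bool we(OpClass op, unsigned ws_val) {
    switch (op) {
    case OP_ALU: case OP_ALUI: case OP_LW: return ws_val != 0;
    case OP_JAL: case OP_JALR:             return true;
    default:                               return false;
    }
}

/* we-bypass_E: the instruction in EX writes a register with an ALU result. */
bool we_bypass_E(OpClass op_E, unsigned ws_E) {
    return (op_E == OP_ALU || op_E == OP_ALUI) && ws_E != 0;
}

/* we-stall_E: the instruction in EX writes a register (LW, JAL, JALR). */
bool we_stall_E(OpClass op_E, unsigned ws_E) {
    if (op_E == OP_LW)
        return ws_E != 0;
    return op_E == OP_JAL || op_E == OP_JALR;
}
```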
**Question 5 (8 points)**

In Lecture 6, we gave an example of a bypass signal (ASrc) for the path from the EX stage to the ID stage. In the fully bypassed pipeline, however, the mux control signals become more complex because the muxes in the ID stage have more inputs.

Write down the bypass condition for the path from the M (Memory) stage to the D (Decode) stage into register B. (The path is shown with a dotted line in the figure.)

Bypass *MEM->ID(B)* =

**Question 6 (4 points)**

Please write down an instruction sequence (of fewer than 5 instructions) that activates the bypass logic in Question 5.

**Part D: Princeton Architecture (32 points)**

Unlike Harvard-style architectures (separate instruction and data memories), Princeton-style machines have a single memory shared by instructions and data. To reduce memory cost, Ben Bitdiddle has proposed the following two-stage Princeton-style MIPS pipeline to replace the single-cycle Harvard-style pipeline from our lectures.

Every instruction takes exactly two cycles to execute (i.e. instruction fetch and execute), and there is no overlap between two sequential instructions; that is, fetching an instruction occurs in the cycle following the previous instruction’s execution (no pipelining).

Assume that the new pipeline does not contain a branch delay slot. Also, don’t worry about self-modifying code for now.

Figure D-1. Baseline Two-stage Princeton-style MIPS Pipeline

**Question 7 (8 points)**

Please complete the following control signals. You are allowed to use any internal signals (e.g. OpCode, PC, IR, zero?, rd1, data, etc.) but not other control signals (ExtSel, IRSrc, PCSrc, etc.).

*Example syntax:* \( \text{PCEn} = (\text{OpCode} == \text{ALUOp}) \text{ or } ((\text{ALU.zero?}) \text{ and } (\text{not} (\text{PC} == 17))) \)

You may also use the variable \( S \), which indicates the pipeline’s operating phase at a given time.

\[
S := \text{I-Fetch} \mid \text{Execute} \text{ (toggles every cycle)}
\]

\[
\text{PCEn} =
\]

\[
\text{IREn} =
\]

\[
\text{AddrSrc} = \text{Case } \underline{\qquad\qquad\qquad}
\]

\[
\underline{\qquad\qquad\qquad} \Rightarrow \text{PC}
\]

\[
\underline{\qquad\qquad\qquad} \Rightarrow \text{ALU}
\]
**Question 8 (8 points)**

After implementing his proposed architecture, Ben observes that much of the datapath sits idle, because only one phase (either I-Fetch or Execute) is active at any given time. He therefore decides to fetch the next instruction during the Execute phase of the previous instruction.

Figure D-2. Modified Two-stage Princeton-style MIPS Pipeline

Do we need to stall this pipeline? If so, for each cause (1) write down the cause in one sentence, and (2) give an example instruction sequence. If not, explain why. (Remember there is no delay slot.)

**Question 9 (8 points)**

Please complete the following control signals for the modified pipeline (Question 8). As before, you may use any internal signals (e.g., OpCode, PC, IR, zero?, rd1, data, etc.) but not other control signals (ExtSel, IRSrc, PCSrc, etc.).

PCEnable =

<table>
<thead>
<tr>
<th>AddrSrc = Case</th>
<th>=&gt; PC</th>
<th>=&gt; ALU</th>
</tr>
</thead>
<tbody>
<tr>
<td>_____________</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>IRSrc = Case</th>
<th>=&gt; nop</th>
<th>=&gt; Mem</th>
</tr>
</thead>
<tbody>
<tr>
<td>___________</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>


**Question 10 (8 points)**

Suppose we allow self-modifying code to execute, i.e., store instructions can write to the portion of memory that contains executable code. Does the two-stage Princeton pipeline need to be modified to support such self-modifying code? If so, please indicate how. You may use the diagram below to draw modifications to the datapath. If you think no modifications are required, explain why.

## Appendix A. A Cheat Sheet for the Bus-based MIPS Implementation

Remember that you can use the following ALU operations:

<table>
<thead>
<tr>
<th>ALUOp</th>
<th>ALU Result Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPY_A</td>
<td>A</td>
</tr>
<tr>
<td>COPY_B</td>
<td>B</td>
</tr>
<tr>
<td>INC_A_1</td>
<td>A+1</td>
</tr>
<tr>
<td>DEC_A_1</td>
<td>A-1</td>
</tr>
<tr>
<td>INC_A_4</td>
<td>A+4</td>
</tr>
<tr>
<td>DEC_A_4</td>
<td>A-4</td>
</tr>
<tr>
<td>ADD</td>
<td>A+B</td>
</tr>
<tr>
<td>SUB</td>
<td>A-B</td>
</tr>
</tbody>
</table>

Table H5-2: Available ALU operations
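The table can be read as a single function from `ALUOp` to the ALU output. This C sketch is an illustration only: the enum mirrors the table's operation names, and 32-bit unsigned (wraparound) arithmetic is an assumption.

```c
#include <stdint.h>

typedef enum { COPY_A, COPY_B, INC_A_1, DEC_A_1, INC_A_4, DEC_A_4, ADD, SUB } ALUOp;

/* ALU result output for each operation in Table H5-2
   (32-bit wraparound arithmetic is an assumption of this sketch). */
uint32_t alu(ALUOp op, uint32_t a, uint32_t b) {
    switch (op) {
    case COPY_A:  return a;
    case COPY_B:  return b;
    case INC_A_1: return a + 1;
    case DEC_A_1: return a - 1;
    case INC_A_4: return a + 4;
    case DEC_A_4: return a - 4;
    case ADD:     return a + b;
    case SUB:     return a - b;
    }
    return 0;   /* unreachable for valid ALUOp values */
}
```

For instance, `alu(INC_A_4, pc, 0)` is the `A+4` value that the fetch microcode writes back to the PC.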

Also remember that the \( \mu \text{Br} \) (microbranch) column in Table H5-3 represents a 2-bit field with four possible values: N, J, Z, and D. If \( \mu \text{Br} \) is N (next), the next state is simply \((\text{current state} + 1)\). If it is J (jump), the next state is unconditionally the state specified in the Next State column (i.e., an unconditional microbranch). If it is Z (branch-if-zero), the next state depends on the ALU’s zero output signal (i.e., a conditional microbranch): if zero is asserted (\( == 1 \)), the next state is the one specified in the Next State column; otherwise, it is \((\text{current state} + 1)\). If \( \mu \text{Br} \) is D (dispatch), the FSM looks at the opcode and function fields in the IR and goes into the corresponding state.
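The four microbranch cases above amount to a small next-state function. In this C sketch the enum encoding and the function name are hypothetical, and the dispatch logic is modeled as an input (`dispatch_state`) rather than as a lookup on the IR.

```c
/* Hypothetical encoding of the 2-bit uBr field. */
typedef enum { UBR_N, UBR_J, UBR_Z, UBR_D } MicroBranch;

/* Next microcode state. dispatch_state stands in for the state that the
   dispatch logic selects from the IR's opcode and function fields. */
int next_state(MicroBranch ubr, int current, int next_field,
               int alu_zero, int dispatch_state) {
    switch (ubr) {
    case UBR_N: return current + 1;                          /* next */
    case UBR_J: return next_field;                           /* unconditional */
    case UBR_Z: return alu_zero ? next_field : current + 1;  /* branch-if-zero */
    case UBR_D: return dispatch_state;                       /* dispatch */
    }
    return current + 1;   /* unreachable for valid encodings */
}
```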
## Appendix B. 6.823 MIPS Instruction Table

<table>
<thead>
<tr>
<th>Category</th>
<th>Instruction</th>
<th>Usage (Example)</th>
<th>Meaning</th>
<th>Encoding Format*</th>
</tr>
</thead>
<tbody>
<tr>
<td>Arithmetic</td>
<td>add</td>
<td>add Rd, Rs, Rt</td>
<td>Rd = Rs + Rt</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>subtract</td>
<td>sub Rd, Rs, Rt</td>
<td>Rd = Rs - Rt</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>add immediate</td>
<td>(addi Rt, Rs, 1)</td>
<td>(Rt = Rs + 1)</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>add unsigned</td>
<td>addu Rd, Rs, Rt</td>
<td>Rd = Rs + Rt</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>subtract unsigned</td>
<td>subu Rd, Rs, Rt</td>
<td>Rd = Rs - Rt</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>add immed unsigned</td>
<td>(addiu Rt, Rs, 1)</td>
<td>(Rt = Rs + 1)</td>
<td>I-format</td>
</tr>
<tr>
<td>Logical</td>
<td>and</td>
<td>and Rd, Rs, Rt</td>
<td>Rd = Rs &amp; Rt</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>or</td>
<td>or Rd, Rs, Rt</td>
<td>Rd = Rs | Rt</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>and immed</td>
<td>(andi Rt, Rs, 100)</td>
<td>(Rt = Rs &amp; 100)</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>or immed</td>
<td>(ori Rt, Rs, 100)</td>
<td>(Rt = Rs | 100)</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>shift left logical**</td>
<td>(sll Rt, Rs, 10)</td>
<td>(Rt = Rs&lt;&lt;10)</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>shift right logical**</td>
<td>(srl Rt, Rs, 10)</td>
<td>(Rt = Rs&gt;&gt;10)</td>
<td>I-format</td>
</tr>
<tr>
<td>Data transfer</td>
<td>load word</td>
<td>lw Rt, 100(Rs)</td>
<td>Rt = Mem[Rs+100]</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>store word</td>
<td>sw Rt, 100(Rs)</td>
<td>Mem[Rs+100] = Rt</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>load upper immed</td>
<td>lui Rt, 100</td>
<td>Rt = 100*2^16</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>branch on equal</td>
<td>(beq Rs, Rt, 25)</td>
<td>if(Rs==Rt)goto PC+4+(25&lt;&lt;2)</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>branch on not equal</td>
<td>(bne Rs, Rt, 25)</td>
<td>if(Rs!=Rt)goto PC+4+(25&lt;&lt;2)</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>branch on zero</td>
<td>(beqz Rs, 25)</td>
<td>if(Rs==0)goto PC+4+(25&lt;&lt;2)</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>branch on not zero</td>
<td>(bnez Rs, 25)</td>
<td>if(Rs!=0)goto PC+4+(25&lt;&lt;2)</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>set on less than</td>
<td>slt Rd, Rs, Rt</td>
<td>Rd=(Rs&lt;Rt) ? 1:0</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>set less than immed</td>
<td>(slti Rt, Rs, 100)</td>
<td>Rt=(Rs&lt;100) ? 1:0</td>
<td>I-format</td>
</tr>
<tr>
<td></td>
<td>set less than unsigned</td>
<td>sltu Rd, Rs, Rt</td>
<td>Rd=(Rs&lt;Rt) ? 1:0</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>set less than immed unsigned</td>
<td>(sltiu Rt, Rs, 100)</td>
<td>Rt=(Rs&lt;100) ? 1:0</td>
<td>I-format</td>
</tr>
<tr>
<td>Uncond. jump</td>
<td>jump</td>
<td>j 2500</td>
<td>goto (2500&lt;&lt;2)</td>
<td>J-format</td>
</tr>
<tr>
<td></td>
<td>jump register</td>
<td>jr Rs</td>
<td>goto Rs</td>
<td>R-format</td>
</tr>
<tr>
<td></td>
<td>jump and link</td>
<td>jal 2500</td>
<td>R31 = PC+4; goto (2500&lt;&lt;2)</td>
<td>J-format</td>
</tr>
</tbody>
</table>

* See the table below.
** Slightly different from the original MIPS encoding
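The branch and jump rows of the table compute their targets as sketched below in C. The helper names are hypothetical; sign-extending the 16-bit offset follows standard MIPS and is an assumption here, and the jump target follows the table's simplified `goto (target<<2)` rather than the full MIPS rule that also keeps the upper PC bits.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate without implementation-defined casts. */
static int32_t sign_extend16(uint16_t imm16) {
    return (imm16 & 0x8000u) ? (int32_t)imm16 - 0x10000 : (int32_t)imm16;
}

/* Taken-branch target: PC+4+(offset<<2), offset sign-extended (assumption). */
uint32_t branch_target(uint32_t pc, uint16_t imm16) {
    return pc + 4u + (uint32_t)(sign_extend16(imm16) * 4);
}

/* Jump target as written in the table: goto (target<<2). */
uint32_t jump_target(uint32_t target26) {
    return (target26 & 0x03FFFFFFu) << 2;
}
```

With `pc = 100`, `beq ..., 25` branches to 204, and an offset of `0xFFFF` (that is, -1) branches back to 100.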

### MIPS instruction encoding formats

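<table>
<thead>
<tr>
<th>Name</th>
<th colspan="6">Fields</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>Field size</td>
<td>6 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>6 bits</td>
<td>All MIPS instructions are 32 bits</td>
</tr>
<tr>
<td>R-format</td>
<td>opcode</td>
<td>rs</td>
<td>rt</td>
<td>rd</td>
<td>shamt</td>
<td>funct</td>
<td>Register-register instructions</td>
</tr>
<tr>
<td>I-format</td>
<td>opcode</td>
<td>rs</td>
<td>rt</td>
<td colspan="3">immediate (16 bits)</td>
<td>Immediate, load/store, and branch instructions</td>
</tr>
<tr>
<td>J-format</td>
<td>opcode</td>
<td colspan="5">target address (26 bits)</td>
<td>Jumps with an immediate target</td>
</tr>
</tbody>
</table>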
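The field boundaries in the table can be checked with a small C decoder that slices a 32-bit instruction word. `Fields` and `decode` are hypothetical names; the sketch extracts every field unconditionally, and the caller must pick the ones that match the format (note that this document encodes `sll`/`srl` as I-format, unlike standard MIPS).

```c
#include <stdint.h>

/* Hypothetical container for every field an instruction word can carry.
   Which fields are meaningful depends on the format (R, I, or J). */
typedef struct {
    unsigned opcode, rs, rt, rd, shamt, funct;
    uint16_t imm16;   /* I-format immediate */
    uint32_t target;  /* J-format 26-bit target */
} Fields;

Fields decode(uint32_t instr) {
    Fields f;
    f.opcode = (instr >> 26) & 0x3Fu;   /* bits 31..26 */
    f.rs     = (instr >> 21) & 0x1Fu;   /* bits 25..21 */
    f.rt     = (instr >> 16) & 0x1Fu;   /* bits 20..16 */
    f.rd     = (instr >> 11) & 0x1Fu;   /* bits 15..11 */
    f.shamt  = (instr >> 6)  & 0x1Fu;   /* bits 10..6  */
    f.funct  = instr & 0x3Fu;           /* bits 5..0   */
    f.imm16  = (uint16_t)(instr & 0xFFFFu);
    f.target = instr & 0x03FFFFFFu;     /* bits 25..0  */
    return f;
}
```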