1 00:00:00,500 --> 00:00:05,390 Let's start by redrawing and simplifying the Beta data path so that it will be easier to 2 00:00:05,390 --> 00:00:08,340 reason about when we add pipelining. 3 00:00:08,340 --> 00:00:13,690 The first simplification is to focus on sequential execution and so leave out the branch addressing 4 00:00:13,690 --> 00:00:15,660 and PC mux logic. 5 00:00:15,660 --> 00:00:20,560 Our simplified Beta always executes the next instruction from PC+4. 6 00:00:20,560 --> 00:00:25,960 We'll add back the branch and jump logic when we discuss control hazards. 7 00:00:25,960 --> 00:00:31,180 The second simplification is to have the register file appear twice in the diagram so that we 8 00:00:31,180 --> 00:00:35,800 can tease apart the read and write operations that occur at different stages of instruction 9 00:00:35,800 --> 00:00:37,710 execution. 10 00:00:37,710 --> 00:00:42,830 The top Register File shows the combinational read ports, used to when reading the register 11 00:00:42,830 --> 00:00:45,239 operands in the RF stage. 12 00:00:45,239 --> 00:00:49,440 The bottom Register File shows the clocked write port, used to write the result into 13 00:00:49,440 --> 00:00:53,280 the destination register at the end of the WB stage. 14 00:00:53,280 --> 00:00:58,719 Physically, there's only one set of 32 registers, we've just drawn the read and write circuity 15 00:00:58,719 --> 00:01:01,430 as separate components in the diagram. 16 00:01:01,430 --> 00:01:06,790 If we add pipeline registers to the simplified diagram, we see that execution proceeds through 17 00:01:06,790 --> 00:01:09,750 the five stages from top to bottom. 18 00:01:09,750 --> 00:01:15,000 If we consider execution of instruction sequences with no data hazards, information is flowing 19 00:01:15,000 --> 00:01:20,250 down the pipeline and the pipeline will correctly overlap the execution of all the instructions 20 00:01:20,250 --> 00:01:22,220 in the pipeline. 21 00:01:22,220 --> 00:01:27,470 The diagram shows the components needed to implement each of the five stages. 22 00:01:27,470 --> 00:01:32,270 The IF stage contains the program counter and the main memory interface for fetching 23 00:01:32,270 --> 00:01:33,270 instructions. 24 00:01:33,270 --> 00:01:37,610 The RF stage has the register file and operand multiplexers. 25 00:01:37,610 --> 00:01:42,060 The ALU stage uses the operands and computes the result. 26 00:01:42,060 --> 00:01:47,119 The MEM stage handles the memory access for load and store operations. 27 00:01:47,119 --> 00:01:52,520 And the WB stage writes the result into the destination register. 28 00:01:52,520 --> 00:01:57,189 In each clock cycle, each stage does its part in the execution of a particular instruction. 29 00:01:57,189 --> 00:02:02,469 In a given clock cycle, there are five instructions in the pipeline. 30 00:02:02,469 --> 00:02:06,610 Note that data accesses to main memory span almost two clock cycles. 31 00:02:06,610 --> 00:02:11,390 Data accesses are initiated at the beginning of the MEM stage and returning data is only 32 00:02:11,390 --> 00:02:14,730 needed just before the end of the WB stage. 33 00:02:14,730 --> 00:02:20,530 The memory is itself pipelined and can simultaneously finish the access from an earlier instruction 34 00:02:20,530 --> 00:02:24,510 while starting an access for the next instruction. 35 00:02:24,510 --> 00:02:28,640 This simplified diagram isn't showing how the control logic is split across the pipeline 36 00:02:28,640 --> 00:02:29,730 stages. 37 00:02:29,730 --> 00:02:31,200 How does that work? 38 00:02:31,200 --> 00:02:36,040 Note that we've included instruction registers as part of each pipeline stage, so that each 39 00:02:36,040 --> 00:02:40,790 stage can compute the control signals it needs from its instruction register. 40 00:02:40,790 --> 00:02:45,940 The encoded instruction is simply passed from one stage to the next as the instruction flows 41 00:02:45,940 --> 00:02:48,220 through the pipeline.. 42 00:02:48,220 --> 00:02:52,829 Each stage computes its control signals from the opcode field of its instruction register. 43 00:02:52,829 --> 00:02:57,640 The RF stage needs the RA, RB, and literal fields from its instruction register. 44 00:02:57,640 --> 00:03:03,010 And the WB stage needs the RC field from its instruction register. 45 00:03:03,010 --> 00:03:07,459 The required logic is very similar to the unpipelined implementation, it's just been 46 00:03:07,459 --> 00:03:11,530 split up and moved to the appropriate pipeline stage. 47 00:03:11,530 --> 00:03:16,370 We'll see that we will have to add some additional control logic to deal correctly with pipeline 48 00:03:16,370 --> 00:03:18,459 hazards. 49 00:03:18,459 --> 00:03:21,050 Our simplified diagram isn't so simple anymore! 50 00:03:21,050 --> 00:03:26,540 To see how the pipeline works, let's follow along as it executes this sequence of six 51 00:03:26,540 --> 00:03:27,540 instructions. 52 00:03:27,540 --> 00:03:31,320 Note that the instructions are reading and writing from different registers, so there 53 00:03:31,320 --> 00:03:34,020 are no potential data hazards. 54 00:03:34,020 --> 00:03:39,519 And there are no branches and jumps, so there are no potential control hazards. 55 00:03:39,519 --> 00:03:44,129 Since there are no potential hazards, the instruction executions can be overlapped and 56 00:03:44,129 --> 00:03:47,659 their overlapped execution in the pipeline will work correctly. 57 00:03:47,659 --> 00:03:49,940 Okay, here we go! 58 00:03:49,940 --> 00:03:55,620 During cycle 1, the IF stage sends the value from the program counter to main memory to 59 00:03:55,620 --> 00:04:00,160 fetch the first instruction (the green LD instruction), which will be stored in the 60 00:04:00,160 --> 00:04:04,070 RF-stage instruction register at the end of the cycle. 61 00:04:04,070 --> 00:04:10,769 Meanwhile, it's also computing PC+4, which will be the next value of the program counter. 62 00:04:10,769 --> 00:04:16,019 We've colored the next value blue to indicate that it's the address of the blue instruction 63 00:04:16,019 --> 00:04:17,660 in the sequence. 64 00:04:17,660 --> 00:04:21,839 We'll add the appropriately colored label on the right of each pipeline stage to indicate 65 00:04:21,839 --> 00:04:25,500 which instruction the stage is processing. 66 00:04:25,500 --> 00:04:31,530 At the start of cycle 2, we see that values in the PC and instruction registers for the 67 00:04:31,530 --> 00:04:35,650 RF stage now correspond to the green instruction. 68 00:04:35,650 --> 00:04:40,090 During the cycle the register file will be reading the register operands, in this case 69 00:04:40,090 --> 00:04:42,950 R1, which is needed for the green instruction. 70 00:04:42,950 --> 00:04:49,210 Since the green instruction is a LD, ASEL is 0 and BSEL is 1, selecting the appropriate 71 00:04:49,210 --> 00:04:54,710 values to be written into the A and B operand registers at the end of the cycle. 72 00:04:54,710 --> 00:05:00,440 Concurrently, the IF stage is fetching the blue instruction from main memory and computing 73 00:05:00,440 --> 00:05:04,789 an updated PC value for the next cycle. 74 00:05:04,789 --> 00:05:10,760 In cycle 3, the green instruction is now in the ALU stage, where the ALU is adding the 75 00:05:10,760 --> 00:05:16,410 values in its operand registers (in this case the value of R1 and the constant 4) and the 76 00:05:16,410 --> 00:05:21,740 result will be stored in Y_MEM register at the end of the cycle. 77 00:05:21,740 --> 00:05:26,780 In cycle 4, we're overlapping execution of four instructions. 78 00:05:26,780 --> 00:05:31,449 The MEM stage initiates a memory read for the green LD instruction. 79 00:05:31,449 --> 00:05:36,330 Note that the read data will first become available in the WB stage - it's not available 80 00:05:36,330 --> 00:05:40,310 to CPU in the current clock cycle. 81 00:05:40,310 --> 00:05:45,210 In cycle 5, the results of the main memory read initiated in cycle 4 are available for 82 00:05:45,210 --> 00:05:49,180 writing to the register file in the WB stage. 83 00:05:49,180 --> 00:05:54,150 So execution of the green LD instruction will be complete when the memory data is written 84 00:05:54,150 --> 00:05:56,960 to R2 at the end of cycle 5. 85 00:05:56,960 --> 00:06:02,569 Meanwhile, the MEM stage is initiating a memory read for the blue LD instruction. 86 00:06:02,569 --> 00:06:07,840 The pipeline continues to complete successive instructions in successive clock cycles. 87 00:06:07,840 --> 00:06:11,550 The latency for a particular instruction is 5 clock cycles. 88 00:06:11,550 --> 00:06:16,070 The throughput of the pipelined CPU is 1 instruction/cycle. 89 00:06:16,070 --> 00:06:21,600 This is the same as the unpipelined implementation, except that the clock period is shorter because 90 00:06:21,600 --> 00:06:25,710 each pipeline stage has fewer components. 91 00:06:25,710 --> 00:06:31,210 Note that the effects of the green LD, i.e., filling R2 with a new value, don't happen 92 00:06:31,210 --> 00:06:35,280 until the rising edge of the clock at the end of cycle 5. 93 00:06:35,280 --> 00:06:40,330 In other words, the results of the green LD aren't available to other instructions until 94 00:06:40,330 --> 00:06:42,580 cycle 6. 95 00:06:42,580 --> 00:06:47,520 If there were instructions in the pipeline that read R2 before cycle 6, they would have 96 00:06:47,520 --> 00:06:50,270 gotten an old value! 97 00:06:50,270 --> 00:06:51,820 This is an example of a data hazard. 98 00:06:51,820 --> 00:06:58,009 Not a problem for us, since our instruction sequence didn't trigger this data hazard. 99 00:06:58,009 --> 00:07:00,580 Tackling data hazards is our next task.