1 00:00:01,709 --> 00:00:07,740 For this problem, assume that you have a fully functioning 5-stage pipelined beta with full 2 00:00:07,740 --> 00:00:13,500 bypassing and annulment of branch delay slots as presented in lecture. 3 00:00:13,500 --> 00:00:17,500 This beta has been running the program shown here for a while. 4 00:00:17,500 --> 00:00:22,539 The actual functionality of this program is not so important for this problem, but lets 5 00:00:22,539 --> 00:00:24,949 just review it quickly. 6 00:00:24,949 --> 00:00:30,380 This program begins by initializing R1 to 0 before entering the loop. 7 00:00:30,380 --> 00:00:35,370 R1 represents the index of the array element currently being accessed. 8 00:00:35,370 --> 00:00:40,770 Within the loop, the value of that array element is loaded into R0. 9 00:00:40,770 --> 00:00:47,170 R1 is then incremented by 4 in order to point to the next element in the array. 10 00:00:47,170 --> 00:00:52,489 We then compare the array element that was just loaded into R0 with the updated index 11 00:00:52,489 --> 00:00:57,010 in R1 and if they are equal, then we repeat the loop. 12 00:00:57,010 --> 00:01:02,909 If they are not equal, then we store the current value of R1 into a memory location called 13 00:01:02,909 --> 00:01:09,050 index to remember which index value satisfied the compare instruction. 14 00:01:09,050 --> 00:01:13,070 We want to understand how this program would run on our beta. 15 00:01:13,070 --> 00:01:19,910 In order to do this, we will create a pipeline diagram showing the execution of this program. 16 00:01:19,910 --> 00:01:26,600 A pipeline diagram demonstrates which instruction is currently being executed in each of the 17 00:01:26,600 --> 00:01:29,079 5 pipeline stages. 18 00:01:29,079 --> 00:01:33,590 Our rows indicate the pipeline stage that the instruction is in. 19 00:01:33,590 --> 00:01:36,030 There are five pipeline stages. 20 00:01:36,030 --> 00:01:42,780 The first is IF, or instruction fetch, which fetches the next instruction from memory. 21 00:01:42,780 --> 00:01:51,340 The second is RF, or register file stage which reads the source operands of the instruction. 22 00:01:51,340 --> 00:01:59,190 Next comes the ALU stage where all required arithmetic and logic unit operations are executed. 23 00:01:59,190 --> 00:02:04,070 The fourth stage is the MEM stage where we can begin accessing memory for a load or store 24 00:02:04,070 --> 00:02:10,050 operation because the address of the memory location was computed in the ALU stage. 25 00:02:10,050 --> 00:02:17,230 Finally, the last stage is WB, or the write back stage where the results are written back 26 00:02:17,230 --> 00:02:19,860 into the register file. 27 00:02:19,860 --> 00:02:25,850 The columns in a pipeline diagram represent the execution cycles. 28 00:02:25,850 --> 00:02:32,490 Our loop begins with a LD operation, so we see our LD instruction in the IF stage in 29 00:02:32,490 --> 00:02:34,220 cycle 1001. 30 00:02:34,220 --> 00:02:42,690 The LD operation then proceeds down the 5 stages of the pipelined beta. 31 00:02:42,690 --> 00:02:45,510 Next comes the ADDC instruction. 32 00:02:45,510 --> 00:02:51,140 Since there is no dependency between the LD and the ADDC instruction, the ADDC instruction 33 00:02:51,140 --> 00:03:00,060 begins in cycle 1002 and proceeds through all the 5 stages of the beta pipeline as well. 34 00:03:00,060 --> 00:03:02,980 Next comes the CMPEQ instruction. 35 00:03:02,980 --> 00:03:08,200 When we reach the CMPEQ instruction, we are met with our first data hazard caused by the 36 00:03:08,200 --> 00:03:15,350 fact that the LD is updating R0, and the CMPEQ wants to read this new value of R0. 37 00:03:15,350 --> 00:03:22,120 Recall, that a LD does not produce its value until the WB stage of the pipeline. 38 00:03:22,120 --> 00:03:28,310 This means that even with full bypassing logic, the CMPEQ instruction cannot read register 39 00:03:28,310 --> 00:03:32,690 R0 until the LD is in the WB stage. 40 00:03:32,690 --> 00:03:39,380 So we must initiate a stall of the pipeline in cycle 1004. 41 00:03:39,380 --> 00:03:46,270 The stall can be seen in our pipeline diagram in cycle 1005 where the CMPEQ has remained 42 00:03:46,270 --> 00:03:52,510 in the RF stage and we have inserted a NOP in place of the CMPEQ that was coming down 43 00:03:52,510 --> 00:03:54,970 the pipe one cycle earlier. 44 00:03:54,970 --> 00:03:59,590 The instruction that follows the CMPEQ is the BNE. 45 00:03:59,590 --> 00:04:06,690 Notice that it entered the IF stage in cycle 1004, but it too was stalled by the CMPEQ, 46 00:04:06,690 --> 00:04:15,020 so the BNE remains in the IF stage while the CMPEQ is stuck in the RF stage. 47 00:04:15,020 --> 00:04:21,939 In cycle 1005, the CMPEQ is able to complete the read of its operands by using the bypass 48 00:04:21,939 --> 00:04:28,710 path from the WB stage to read the updated value of R0, and by using the bypass path 49 00:04:28,710 --> 00:04:37,099 from the MEM stage to read the updated value of R1 produced by the ADDC instruction. 50 00:04:37,099 --> 00:04:45,620 In cycle 1006, the CMPEQ instruction moves on to the ALU stage and the BNE can move on 51 00:04:45,620 --> 00:04:47,800 to the RF stage. 52 00:04:47,800 --> 00:04:53,669 Since the CMPEQ is going to update the value of R2 which is the register that the BNE is 53 00:04:53,669 --> 00:05:00,509 trying to read, we need to make use of the bypass path from the ALU stage to the RF stage 54 00:05:00,509 --> 00:05:08,360 in order to provide the BNE with the result of the CMPEQ instruction in cycle 1006. 55 00:05:08,360 --> 00:05:12,610 The RF stage is also the stage when Z is generated. 56 00:05:12,610 --> 00:05:17,819 The Z signal tells the beta whether or not a register is equal to zero. 57 00:05:17,819 --> 00:05:24,270 This means that by the end of the RF stage in cycle 1006, the BNE will know whether it 58 00:05:24,270 --> 00:05:26,789 is repeating the loop or not. 59 00:05:26,789 --> 00:05:32,150 We now illustrate what happens to the pipeline diagram if the loop is repeated. 60 00:05:32,150 --> 00:05:38,860 In cycle 1006, the ST instruction enters the IF stage of the pipeline because until we 61 00:05:38,860 --> 00:05:43,990 resolve whether a branch is taken or not, we assume that we should continue fetching 62 00:05:43,990 --> 00:05:46,199 the next instruction. 63 00:05:46,199 --> 00:05:51,300 If the BNE determines that it should branch back to LOOP, then this ST instruction which 64 00:05:51,300 --> 00:05:57,520 was just fetched must be annulled by inserting a NOP in its place. 65 00:05:57,520 --> 00:06:05,020 The annulment is initiated in cycle 1006 and shows up as a NOP in the RF stage in cycle 66 00:06:05,020 --> 00:06:07,220 1007. 67 00:06:07,220 --> 00:06:13,860 In cycle 1007, we also see that we now fetch the first instruction of the loop which is 68 00:06:13,860 --> 00:06:20,610 the LD instruction so that we can repeat the loop. 69 00:06:20,610 --> 00:06:25,620 Here is a complete pipeline diagram showing repeated execution of the loop in our sample 70 00:06:25,620 --> 00:06:31,260 code together with the bypass paths being used as well as the initiation of stalls and 71 00:06:31,260 --> 00:06:34,069 annulment of branch delay slots. 72 00:06:34,069 --> 00:06:41,840 We are now ready to answer a few questions about the execution of this loop on our beta. 73 00:06:41,840 --> 00:06:49,169 The first question we want to consider is which of the registers R0, R1, and/or R2 were 74 00:06:49,169 --> 00:06:54,370 read at least once directly from the register file rather than through a bypass path? 75 00:06:54,370 --> 00:07:00,759 Looking back at our completed pipeline diagram, we see that the LD and ADDC instructions did 76 00:07:00,759 --> 00:07:04,270 not get their operands through bypass paths. 77 00:07:04,270 --> 00:07:09,909 Since both of those instructions read R1, that means that register R1 was read at least 78 00:07:09,909 --> 00:07:12,849 once directly from the register file. 79 00:07:12,849 --> 00:07:18,900 R0 which is only read by the CMPEQ always comes from a bypass path. 80 00:07:18,900 --> 00:07:25,509 Similarly, R2, which is only read by the BNE, always comes from a bypass path as well. 81 00:07:25,509 --> 00:07:32,759 Next, we want to identify the cycle in which stall was set to 1 in the pipelined beta hardware. 82 00:07:32,759 --> 00:07:38,639 This occurs in the cycle where the stall is initiated which was in cycle 1004. 83 00:07:38,639 --> 00:07:43,520 At the end of that cycle the instructions that are currently in the IF and RF stage 84 00:07:43,520 --> 00:07:49,800 are stalled by not allowing a load of a new value into the instruction registers of that 85 00:07:49,800 --> 00:07:51,400 pipeline stage. 86 00:07:51,400 --> 00:07:58,340 Next, we want to determine in which cycle was ANNUL_IF != 0? 87 00:07:58,340 --> 00:08:05,550 Recall that the ANNUL_STAGE control signals specify when an annulment is initiated in 88 00:08:05,550 --> 00:08:07,819 that particular stage. 89 00:08:07,819 --> 00:08:13,419 In order to initiate an annulment, then the instruction that is currently in the IF stage 90 00:08:13,419 --> 00:08:15,550 is replaced with a NOP. 91 00:08:15,550 --> 00:08:21,229 This occurs in the IF stage when we need to annul a branch delay slot. 92 00:08:21,229 --> 00:08:26,020 In our example this occurs in cycle 1006. 93 00:08:26,020 --> 00:08:30,740 In which cycle was ANNUL_RF != 0? 94 00:08:30,740 --> 00:08:36,590 This question is asking when an annulment was initiated in the RF stage. 95 00:08:36,590 --> 00:08:42,039 This occurred when the CMPEQ instruction needed to be stalled in the RF stage. 96 00:08:42,039 --> 00:08:47,460 In order to fill the pipeline bubbles, a NOP is inserted into the pipeline in place of 97 00:08:47,460 --> 00:08:54,130 the CMPEQ instruction that was in the RF stage in cycle 1004. 98 00:08:54,130 --> 00:09:03,660 The stall and thus the setting of ANNUL_RF != 0 occurs in cycle 1004. 99 00:09:03,660 --> 00:09:07,150 In which cycle was ANNUL_ALU != 0? 100 00:09:07,150 --> 00:09:12,430 In other words, in which cycle did we initiate the replacement of an instruction in the ALU 101 00:09:12,430 --> 00:09:15,180 stage with a NOP? 102 00:09:15,180 --> 00:09:18,440 This does not occur in our example. 103 00:09:18,440 --> 00:09:24,010 Next, we want to consider our bypass paths. 104 00:09:24,010 --> 00:09:30,790 In which cycle was either bypass coming from the ALU stage? 105 00:09:30,790 --> 00:09:41,210 In cycle 1006, the BNE reads the result of the CMPEQ instruction from the ALU stage. 106 00:09:41,210 --> 00:09:46,720 In which cycle was either bypass coming from the MEM stage? 107 00:09:46,720 --> 00:09:54,720 In cycle 1005, the CMPEQ reads the result of the ADDC instruction from the MEM stage. 108 00:09:54,720 --> 00:10:01,570 Finally, in which cycle was either bypass coming from the WB stage? 109 00:10:01,570 --> 00:10:08,750 In cycle 1005, the CMPEQ reads the result of the LD instruction from the WB stage.