1 00:00:00,500 --> 00:00:02,890 For this problem, assume that you 2 00:00:02,890 --> 00:00:05,310 have discovered a room full of discarded 3 00:00:05,310 --> 00:00:07,460 5-stage pipelined betas. 4 00:00:07,460 --> 00:00:10,320 These betas fall into four categories. 5 00:00:10,320 --> 00:00:13,460 The first is completely functional 5-stage Betas 6 00:00:13,460 --> 00:00:16,490 with full bypass and annulment logic. 7 00:00:16,490 --> 00:00:19,960 The second are betas with a bad register file. 8 00:00:19,960 --> 00:00:23,560 In these betas, all data read directly from the register file 9 00:00:23,560 --> 00:00:24,960 is zero. 10 00:00:24,960 --> 00:00:27,840 Note that if the data is read from a bypass path, 11 00:00:27,840 --> 00:00:30,730 then the correct value will be read. 12 00:00:30,730 --> 00:00:34,400 The third set are betas without bypass paths. 13 00:00:34,400 --> 00:00:37,260 And finally, the fourth are betas without annulment 14 00:00:37,260 --> 00:00:39,330 of branch delay slots. 15 00:00:39,330 --> 00:00:41,710 The problem is that the betas are not labeled, 16 00:00:41,710 --> 00:00:46,450 so we do not know which falls into which category. 17 00:00:46,450 --> 00:00:49,140 You come up with the test program shown here. 18 00:00:49,140 --> 00:00:51,760 Your plan is to single step through the program 19 00:00:51,760 --> 00:00:54,570 using each Beta chip, carefully noting 20 00:00:54,570 --> 00:00:58,340 the address that the final JMP loads into the PC. 21 00:00:58,340 --> 00:01:01,090 Your goal is to determine which of the four classes 22 00:01:01,090 --> 00:01:05,080 each chip falls into via this JMP address. 23 00:01:05,080 --> 00:01:07,610 Notice that on a fully functional beta, 24 00:01:07,610 --> 00:01:11,060 this code would execute the instructions sequentially 25 00:01:11,060 --> 00:01:12,670 skipping the MULC instruction. 
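The test program appears only on the slide, so the listing below is a reconstruction from the narration, not the original source. The operands, the address of label X (12), and the tuple encoding are assumptions. It is a minimal sketch of a plain sequential (non-pipelined) interpreter that shows the instruction order a fully functional Beta is expected to follow.

```python
# Reconstructed listing (an assumption based on the narration).
# address: (opcode, operands...)
PROGRAM = {
    0:  ("ADDC", 31, 4, 0),   # R0 <- R31 + 4
    4:  ("BEQ", 31, 12, 2),   # R2 <- PC + 4; branch to X (address 12) if R31 == 0
    8:  ("MULC", 2, 2, 2),    # R2 <- R2 * 2 (skipped when the branch is taken)
    12: ("SUBC", 2, 4, 2),    # X: R2 <- R2 - 4
    16: ("ADD", 0, 2, 3),     # R3 <- R0 + R2
    20: ("JMP", 3),           # PC <- R3
}

def run_reference():
    """Execute the program one instruction at a time, with no pipeline effects."""
    regs = [0] * 32           # all registers assumed to start at 0
    pc = 0
    trace = []
    while True:
        op, *args = PROGRAM[pc]
        trace.append(op)
        next_pc = pc + 4
        if op == "ADDC":
            ra, const, rc = args
            regs[rc] = regs[ra] + const
        elif op == "SUBC":
            ra, const, rc = args
            regs[rc] = regs[ra] - const
        elif op == "MULC":
            ra, const, rc = args
            regs[rc] = regs[ra] * const
        elif op == "ADD":
            ra, rb, rc = args
            regs[rc] = regs[ra] + regs[rb]
        elif op == "BEQ":
            ra, target, rc = args
            taken = regs[ra] == 0   # test the source before writing the link register
            regs[rc] = pc + 4
            if taken:
                next_pc = target
        elif op == "JMP":
            return regs[args[0]], trace
        regs[31] = 0          # R31 always reads as 0
        pc = next_pc

if __name__ == "__main__":
    address, trace = run_reference()
    print(trace, "-> JMP target", address)
```

In this sketch the trace never contains MULC, which matches the statement that a fully functional Beta skips it.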
26 00:01:15,560 --> 00:01:17,880 Here we see a pipeline diagram showing 27 00:01:17,880 --> 00:01:21,040 execution of this program on a fully functional beta 28 00:01:21,040 --> 00:01:23,610 from category C1. 29 00:01:23,610 --> 00:01:26,620 It shows that although the MULC instruction is fetched 30 00:01:26,620 --> 00:01:31,730 in cycle 2, it gets annulled when the BEQ is in the RF stage 31 00:01:31,730 --> 00:01:35,280 and it determines that the branch to label X, or the SUBC 32 00:01:35,280 --> 00:01:37,870 instruction, will be taken. 33 00:01:37,870 --> 00:01:43,280 The MULC is annulled by inserting a NOP in its place. 34 00:01:43,280 --> 00:01:45,500 The ADDC and BEQ instructions read R31 35 00:01:45,500 --> 00:01:48,440 from the register file. 36 00:01:48,440 --> 00:01:50,840 The SUBC, however, gets the value 37 00:01:50,840 --> 00:01:54,550 of R2 via the bypass path from the BEQ instruction 38 00:01:54,550 --> 00:01:57,120 which is in the MEM stage. 39 00:01:57,120 --> 00:02:00,840 The ADD then reads R0 and R2. 40 00:02:00,840 --> 00:02:03,390 R0 has already made it back to the register file 41 00:02:03,390 --> 00:02:05,690 because the ADDC instruction completed 42 00:02:05,690 --> 00:02:07,980 by the end of cycle 4. 43 00:02:07,980 --> 00:02:10,910 R2, however, is read via the bypass path 44 00:02:10,910 --> 00:02:15,290 from the SUBC instruction which is in the ALU stage. 45 00:02:15,290 --> 00:02:17,460 Finally, the JMP reads the value 46 00:02:17,460 --> 00:02:21,400 of R3 via the bypass path from the ADD instruction 47 00:02:21,400 --> 00:02:25,540 which is in the ALU stage in cycle 6. 48 00:02:25,540 --> 00:02:29,040 When run on a fully functional beta with bypass paths 49 00:02:29,040 --> 00:02:31,200 and annulment of branch delay slots, 50 00:02:31,200 --> 00:02:34,380 the code behaves as follows: 51 00:02:34,380 --> 00:02:37,530 The ADDC sets R0 = 4. 52 00:02:37,530 --> 00:02:40,610 The BEQ stores PC + 4 into R2. 
53 00:02:40,610 --> 00:02:46,220 Since the ADDC is at address 0, the BEQ is at address 4, 54 00:02:46,220 --> 00:02:51,520 so PC + 4 = 8 is stored into R2, and the program branches 55 00:02:51,520 --> 00:02:53,430 to label X. 56 00:02:53,430 --> 00:02:57,390 Next, the SUBC subtracts 4 from the latest value of R2 57 00:02:57,390 --> 00:03:00,940 and stores the result, which is 4, back into R2. 58 00:03:00,940 --> 00:03:08,240 The ADD adds R0 and R2, or 4 and 4, and stores the result, 59 00:03:08,240 --> 00:03:09,560 which is 8, into R3. 60 00:03:09,560 --> 00:03:17,230 The JMP jumps to the address in R3, which is 8. 61 00:03:17,230 --> 00:03:20,390 When run on C2, which has a bad register file that 62 00:03:20,390 --> 00:03:23,740 always outputs a zero, the behavior of the program 63 00:03:23,740 --> 00:03:26,430 changes a bit. 64 00:03:26,430 --> 00:03:27,940 The ADDC and BEQ instructions, which 65 00:03:27,940 --> 00:03:32,040 use R31, which is 0 anyway, behave in the same way 66 00:03:32,040 --> 00:03:33,810 as before. 67 00:03:33,810 --> 00:03:37,990 The SUBC, which gets the value of R2 from the bypass path, 68 00:03:37,990 --> 00:03:41,790 reads the correct value for R2, which is 8. 69 00:03:41,790 --> 00:03:45,130 Recall that only reads directly from the register file return 70 00:03:45,130 --> 00:03:49,160 a zero, whereas if the data is coming from a bypass path, then 71 00:03:49,160 --> 00:03:50,960 you get the correct value. 72 00:03:50,960 --> 00:03:52,700 So the SUBC produces a 4. 73 00:03:52,700 --> 00:03:56,940 The ADD reads R0 from the register file 74 00:03:56,940 --> 00:03:59,600 and R2 from the bypass path. 75 00:03:59,600 --> 00:04:01,810 The result is that the value of R0 76 00:04:01,810 --> 00:04:07,690 is read as if it were 0, while that of R2 is correct and is 4. 
77 00:04:07,690 --> 00:04:10,930 So R3 gets the value 4 assigned to it, 78 00:04:10,930 --> 00:04:13,830 and that is the address that the JMP instruction jumps 79 00:04:13,830 --> 00:04:20,190 to because it too reads its register from a bypass path. 80 00:04:20,190 --> 00:04:24,300 When run on C3, which does not have any bypass paths, 81 00:04:24,300 --> 00:04:27,100 some of the instructions will read stale values 82 00:04:27,100 --> 00:04:29,030 of their source operands. 83 00:04:29,030 --> 00:04:31,670 Let’s go through the example in detail. 84 00:04:31,670 --> 00:04:35,400 The ADDC and BEQ instructions read 85 00:04:35,400 --> 00:04:39,810 R31, which is 0, from the register file, so what they ultimately 86 00:04:39,810 --> 00:04:41,820 produce does not change. 87 00:04:41,820 --> 00:04:44,760 However, you must keep in mind that the 88 00:04:44,760 --> 00:04:47,000 value of the destination register 89 00:04:47,000 --> 00:04:51,100 will not get updated until after that instruction completes 90 00:04:51,100 --> 00:04:54,430 the WB stage of the pipeline. 91 00:04:54,430 --> 00:04:58,470 When the SUBC reads R2, it gets a stale value of R2 92 00:04:58,470 --> 00:05:02,030 because the BEQ instruction has not yet completed, 93 00:05:02,030 --> 00:05:05,080 so it assumes that R2 = 0. 94 00:05:05,080 --> 00:05:10,830 It then subtracts 4 from that and tries to write -4 into R2. 95 00:05:10,830 --> 00:05:12,980 Next the ADD runs. 96 00:05:12,980 --> 00:05:14,920 Recall that the ADD would normally 97 00:05:14,920 --> 00:05:18,200 read R0 from the register file because by the time 98 00:05:18,200 --> 00:05:21,990 the ADD is in the RF stage, the ADDC which writes to R0 99 00:05:21,990 --> 00:05:25,160 has completed all the pipeline stages. 100 00:05:25,160 --> 00:05:29,510 The ADD would also normally read R2 from the bypass path. 
101 00:05:29,510 --> 00:05:33,200 However, since C3 does not have bypass paths, 102 00:05:33,200 --> 00:05:36,920 it reads a stale value of R2 from the register file. 103 00:05:36,920 --> 00:05:39,440 To determine which stale value it reads, 104 00:05:39,440 --> 00:05:42,210 we need to examine the pipeline diagram 105 00:05:42,210 --> 00:05:46,880 to see if either the BEQ or the SUBC operations, both of which 106 00:05:46,880 --> 00:05:49,590 eventually update R2, have completed 107 00:05:49,590 --> 00:05:52,090 by the time the ADD reads its source operands. 108 00:05:55,130 --> 00:05:57,500 Looking at our pipeline diagram, we 109 00:05:57,500 --> 00:06:00,210 see that when the ADD is in the RF stage, 110 00:06:00,210 --> 00:06:03,770 neither the BEQ nor the SUBC have completed, 111 00:06:03,770 --> 00:06:08,210 thus the value read for R2 is the initial value of R2 which 112 00:06:08,210 --> 00:06:08,710 is 0. 113 00:06:08,710 --> 00:06:13,670 We can now determine the behavior of the ADD instruction 114 00:06:13,670 --> 00:06:17,510 which is that it assumes R0 = 4 and R2 = 0 115 00:06:17,510 --> 00:06:22,220 and will eventually write a 4 into R3. 116 00:06:22,220 --> 00:06:24,310 Finally, the JMP instruction wants 117 00:06:24,310 --> 00:06:26,780 to read the result of the ADD, however, 118 00:06:26,780 --> 00:06:28,820 since there are no bypass paths, it 119 00:06:28,820 --> 00:06:31,940 reads the original value of R3 which was 0 120 00:06:31,940 --> 00:06:34,440 and jumps to address 0. 
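The stale-value reasoning for C3 comes down to comparing the cycle in which a writer finishes WB with the cycle in which a reader is in RF. The sketch below encodes that check. The fetch cycles are taken from the walkthrough (the MULC is fetched in cycle 2 and annulled, so it is left out). The helper names, and the rule that a value written during WB is readable only from the following cycle, are assumptions chosen to match the narration.

```python
# Cycle in which each executed instruction is fetched (IF stage).
IF_CYCLE = {"ADDC": 0, "BEQ": 1, "SUBC": 3, "ADD": 4, "JMP": 5}

def rf_cycle(name):
    """Cycle in which the instruction reads its operands (IF -> RF)."""
    return IF_CYCLE[name] + 1

def wb_cycle(name):
    """Cycle in which the instruction is in WB (IF, RF, ALU, MEM, WB)."""
    return IF_CYCLE[name] + 4

def regfile_has_result(writer, reader):
    """True if the writer finished WB before the reader's RF cycle."""
    return wb_cycle(writer) < rf_cycle(reader)

if __name__ == "__main__":
    pairs = [("BEQ", "SUBC"), ("ADDC", "ADD"), ("BEQ", "ADD"),
             ("SUBC", "ADD"), ("ADD", "JMP")]
    for writer, reader in pairs:
        status = "fresh" if regfile_has_result(writer, reader) else "stale"
        print(f"{reader} reading the result of {writer}: {status}")
```

Under these assumptions only the ADD's read of R0 (written by the ADDC) sees a fresh value. Every other read is stale on a Beta without bypass paths.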
121 00:06:34,440 --> 00:06:37,530 Of course, had we looked at our code a little more closely, 122 00:06:37,530 --> 00:06:39,450 we could have determined this without all 123 00:06:39,450 --> 00:06:43,180 the intermediate steps because the ADD is the only instruction 124 00:06:43,180 --> 00:06:45,850 that tries to update R3, and you know 125 00:06:45,850 --> 00:06:49,400 that the JMP would normally get R3 from the bypass path which 126 00:06:49,400 --> 00:06:53,200 is not available, therefore it must read the original value 127 00:06:53,200 --> 00:06:57,520 of R3 which is 0. 128 00:06:57,520 --> 00:07:01,550 In category C4, the betas do not annul instructions that were 129 00:07:01,550 --> 00:07:04,430 fetched after a branch instruction but aren’t supposed 130 00:07:04,430 --> 00:07:05,990 to get executed. 131 00:07:05,990 --> 00:07:07,910 This means that the MULC which is 132 00:07:07,910 --> 00:07:10,640 fetched after the BEQ is actually 133 00:07:10,640 --> 00:07:14,850 executed in the pipeline and affects the value of R2. 134 00:07:14,850 --> 00:07:19,230 Let’s take a close look at what happens in each instruction. 135 00:07:19,230 --> 00:07:22,570 Once again we begin with the ADDC setting R0 to 4 136 00:07:22,570 --> 00:07:25,870 and the BEQ setting R2 to 8. 137 00:07:25,870 --> 00:07:28,130 Since our bypass paths are now working, 138 00:07:28,130 --> 00:07:29,990 we can assume that we can immediately 139 00:07:29,990 --> 00:07:32,260 get the updated value. 140 00:07:32,260 --> 00:07:36,170 Next, we execute the MULC which takes the latest value of R2 141 00:07:36,170 --> 00:07:40,070 from the bypass path and multiplies it by 2. 142 00:07:40,070 --> 00:07:42,300 So it sets R2 = 16. 143 00:07:42,300 --> 00:07:48,440 The SUBC now uses this value for R2 from which it subtracts 4 144 00:07:48,440 --> 00:07:52,060 to produce R2 = 12. 
145 00:07:52,060 --> 00:07:56,740 The ADD then reads R0 = 4 and adds to it R2 = 12 146 00:07:56,740 --> 00:08:02,300 to produce R3 = 16. 147 00:08:02,300 --> 00:08:04,750 Finally, the JMP jumps to address 16. 148 00:08:04,750 --> 00:08:08,480 Since each of the four categories 149 00:08:08,480 --> 00:08:11,790 produces a unique jump address, this program 150 00:08:11,790 --> 00:08:16,630 can be used to sort out all the betas into the four categories.
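The four walkthroughs can be cross-checked with a small timing model. This is a sketch under stated assumptions, not the course's simulator: the program operands are reconstructed from the narration, all registers start at 0, a result is readable from the register file only in the cycle after its WB stage, and the function and flag names are hypothetical.

```python
WB_DELAY = 3  # stages between RF and WB: RF -> ALU -> MEM -> WB

def read(reg, cycle, done, bypass, good_regfile):
    """Value seen by an instruction that is in RF during `cycle`."""
    if reg == 31:
        return 0                                   # R31 is hardwired to 0
    in_flight = [v for (t, d, v) in done if d == reg and t + WB_DELAY >= cycle]
    if bypass and in_flight:
        return in_flight[-1]                       # newest result, via bypass
    if not good_regfile:
        return 0                                   # bad register file reads 0
    written = [v for (t, d, v) in done if d == reg and t + WB_DELAY < cycle]
    return written[-1] if written else 0           # registers assumed to start at 0

def jmp_address(bypass=True, good_regfile=True, annul=True):
    """Address the final JMP loads into the PC for one Beta variant."""
    # (name, RF cycle, source registers, destination, operation)
    program = [
        ("ADDC", 1, (31,), 0, lambda a: a + 4),
        ("BEQ", 2, (31,), 2, lambda a: 8),         # writes PC + 4 = 8; branch taken
        ("MULC", 3, (2,), 2, lambda a: a * 2),     # instruction after the branch
        ("SUBC", 4, (2,), 2, lambda a: a - 4),
        ("ADD", 5, (0, 2), 3, lambda a, b: a + b),
        ("JMP", 6, (3,), None, lambda a: a),
    ]
    done = []                                      # (RF cycle, dest, result)
    for name, cycle, srcs, dest, op in program:
        if name == "MULC" and annul:
            continue                               # replaced by a NOP
        values = [read(r, cycle, done, bypass, good_regfile) for r in srcs]
        result = op(*values)
        if name == "JMP":
            return result
        done.append((cycle, dest, result))

CATEGORY_BY_ADDRESS = {8: "C1", 4: "C2", 0: "C3", 16: "C4"}

if __name__ == "__main__":
    variants = {
        "fully functional": {},
        "bad register file": {"good_regfile": False},
        "no bypass paths": {"bypass": False},
        "no annulment": {"annul": False},
    }
    for label, flags in variants.items():
        address = jmp_address(**flags)
        print(f"{label}: JMP to {address} -> {CATEGORY_BY_ADDRESS[address]}")
```

With these assumptions the model reproduces the jump addresses derived above: 8, 4, 0, and 16 for C1 through C4.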