1 00:00:00,500 --> 00:00:03,770 Today we’re going to describe the datapath and control logic 2 00:00:03,770 --> 00:00:06,380 needed to execute Beta instructions. 3 00:00:06,380 --> 00:00:09,050 In an upcoming lab assignment, we’ll ask you to build 4 00:00:09,050 --> 00:00:12,560 a working implementation using our standard cell library. 5 00:00:12,560 --> 00:00:15,940 When you’re done, you’ll have designed and debugged a 32-bit 6 00:00:15,940 --> 00:00:17,940 reduced-instruction set computer! 7 00:00:17,940 --> 00:00:20,260 Not bad… 8 00:00:20,260 --> 00:00:23,550 Before tackling a design task, it’s useful to understand 9 00:00:23,550 --> 00:00:25,420 the goals for the design. 10 00:00:25,420 --> 00:00:27,580 Functionality, of course; in our case 11 00:00:27,580 --> 00:00:30,780 the correct execution of instructions from the Beta ISA. 12 00:00:30,780 --> 00:00:35,550 But there are other goals we should think about. 13 00:00:35,550 --> 00:00:38,280 An obvious goal is to maximize performance, 14 00:00:38,280 --> 00:00:40,360 as measured by the number of instructions 15 00:00:40,360 --> 00:00:42,540 executed per second. 16 00:00:42,540 --> 00:00:44,640 This is usually expressed in MIPS, 17 00:00:44,640 --> 00:00:48,140 an acronym for “Millions of Instructions Per Second”. 18 00:00:48,140 --> 00:00:51,530 When the Intel 8080 was introduced in 1974, 19 00:00:51,530 --> 00:00:57,330 it executed instructions at 0.29 MIPS or 290,000 instructions 20 00:00:57,330 --> 00:01:01,520 per second as measured by the Dhrystone benchmark. 21 00:01:01,520 --> 00:01:04,150 Modern multi-core processors are rated between 10,000 22 00:01:04,150 --> 00:01:08,290 and 100,000 MIPS. 23 00:01:08,290 --> 00:01:10,880 Another goal might be to minimize the manufacturing 24 00:01:10,880 --> 00:01:13,840 cost, which in integrated circuit manufacturing 25 00:01:13,840 --> 00:01:17,820 is proportional to the size of the circuit. 26 00:01:17,820 --> 00:01:22,400 Or we might want have the best performance for a given price. 27 00:01:22,400 --> 00:01:24,060 In our increasingly mobile world, 28 00:01:24,060 --> 00:01:28,570 the best performance per watt might be an important goal. 29 00:01:28,570 --> 00:01:30,160 One of the interesting challenges 30 00:01:30,160 --> 00:01:31,860 in computer engineering is deciding 31 00:01:31,860 --> 00:01:34,660 exactly how to balance performance against cost 32 00:01:34,660 --> 00:01:36,110 and power efficiency. 33 00:01:36,110 --> 00:01:37,940 Clearly the designers of the Apple Watch 34 00:01:37,940 --> 00:01:39,890 have a different set of design goals 35 00:01:39,890 --> 00:01:44,040 then the designers of high-end desktop computers. 36 00:01:44,040 --> 00:01:46,320 The performance of a processor is inversely 37 00:01:46,320 --> 00:01:50,080 proportional to the length of time it takes to run a program. 38 00:01:50,080 --> 00:01:53,970 The shorter the execution time, the higher the performance. 39 00:01:53,970 --> 00:01:56,960 The execution time is determined by three factors. 40 00:01:56,960 --> 00:02:00,430 First, the number of instructions in the program. 41 00:02:00,430 --> 00:02:03,570 Second, the number of clock cycles our sequential circuit 42 00:02:03,570 --> 00:02:06,950 requires to execute a particular instruction. 43 00:02:06,950 --> 00:02:09,820 Complex instructions, e.g., adding two values 44 00:02:09,820 --> 00:02:12,910 from main memory, may make a program shorter, 45 00:02:12,910 --> 00:02:15,370 but may also require many clock cycles 46 00:02:15,370 --> 00:02:19,720 to perform the necessary memory and datapath operations. 47 00:02:19,720 --> 00:02:23,000 Third, the amount of time needed for each clock cycle, 48 00:02:23,000 --> 00:02:26,180 as determined by the propagation delay of the digital logic 49 00:02:26,180 --> 00:02:28,340 in the datapath. 50 00:02:28,340 --> 00:02:30,370 So to increase the performance we 51 00:02:30,370 --> 00:02:33,620 could reduce the number of instructions to be executed. 52 00:02:33,620 --> 00:02:36,900 Or we can try to minimize the number of clock cycles needed 53 00:02:36,900 --> 00:02:40,210 on the average to execute our instructions. 54 00:02:40,210 --> 00:02:42,800 There’s obviously a bit of a tradeoff between these first 55 00:02:42,800 --> 00:02:44,060 two options: 56 00:02:44,060 --> 00:02:46,500 more computation per instruction usually 57 00:02:46,500 --> 00:02:50,060 means it will take more time to execute the instruction. 58 00:02:50,060 --> 00:02:53,020 Or we can try to keep our logic simple, minimizing 59 00:02:53,020 --> 00:02:56,310 its propagation delay in the hopes of having a short clock 60 00:02:56,310 --> 00:02:58,350 period. 61 00:02:58,350 --> 00:03:02,210 Today we’ll focus on an implementation for the Beta ISA 62 00:03:02,210 --> 00:03:06,230 that executes one instruction every clock cycle. 63 00:03:06,230 --> 00:03:09,000 The combinational paths in our circuit will be fairly long, 64 00:03:09,000 --> 00:03:11,450 but, as we learned in Part 1 of the course, 65 00:03:11,450 --> 00:03:15,260 this gives us the opportunity to use pipelining to increase 66 00:03:15,260 --> 00:03:17,620 our implementation’s throughput. 67 00:03:17,620 --> 00:03:19,950 We’ll talk about the implementation of a pipelined 68 00:03:19,950 --> 00:03:23,040 processor in some upcoming lectures. 69 00:03:23,040 --> 00:03:26,090 Here’s a quick refresher on the Beta ISA. 70 00:03:26,090 --> 00:03:28,600 The Beta has thirty-two 32-bit registers 71 00:03:28,600 --> 00:03:31,400 that hold values for use by the datapath. 72 00:03:31,400 --> 00:03:33,340 The first class of ALU instructions, 73 00:03:33,340 --> 00:03:37,560 which have 0b10 as the top 2 bits of the opcode field, 74 00:03:37,560 --> 00:03:41,610 perform an operation on two register operands (Ra and Rb), 75 00:03:41,610 --> 00:03:44,460 storing the result back into a specified destination 76 00:03:44,460 --> 00:03:46,690 register (Rc). 77 00:03:46,690 --> 00:03:50,820 There’s a 6-bit opcode field to specify the operation and three 78 00:03:50,820 --> 00:03:54,530 5-bit register fields to specify the registers to use as source 79 00:03:54,530 --> 00:03:56,490 and destination. 80 00:03:56,490 --> 00:03:58,540 The second class of ALU instructions, 81 00:03:58,540 --> 00:04:01,660 which have 0b11 in the top 2 bits of the opcode, 82 00:04:01,660 --> 00:04:04,920 perform the same set of operations where the second 83 00:04:04,920 --> 00:04:09,110 operand is a constant in the range -32768 to +32767. 84 00:04:09,110 --> 00:04:14,940 The operations include arithmetic operations, 85 00:04:14,940 --> 00:04:19,260 comparisons, boolean operations, and shifts. 86 00:04:19,260 --> 00:04:22,010 In assembly language, we use a “C” suffix added 87 00:04:22,010 --> 00:04:24,720 to the mnemonics shown here to indicate that the second 88 00:04:24,720 --> 00:04:27,720 operand is a constant. 89 00:04:27,720 --> 00:04:29,910 This second instruction format is also 90 00:04:29,910 --> 00:04:33,090 used by the instructions that access memory and change 91 00:04:33,090 --> 00:04:35,990 the normally sequential execution order. 92 00:04:35,990 --> 00:04:38,010 The use of just two instruction formats 93 00:04:38,010 --> 00:04:40,100 will make it very easy to build the logic 94 00:04:40,100 --> 00:04:43,330 responsible for translating the encoded instructions 95 00:04:43,330 --> 00:04:45,530 into the signals needed to control 96 00:04:45,530 --> 00:04:47,800 the operation of the datapath. 97 00:04:47,800 --> 00:04:51,180 In fact, we’ll be able to use many of the instruction bits 98 00:04:51,180 --> 00:04:52,820 as-is! 99 00:04:52,820 --> 00:04:54,830 We’ll build our datapath incrementally, 100 00:04:54,830 --> 00:04:57,460 starting with the logic needed to perform the ALU 101 00:04:57,460 --> 00:05:00,610 instructions, then add additional logic to execute 102 00:05:00,610 --> 00:05:02,990 the memory and branch instructions. 103 00:05:02,990 --> 00:05:06,400 Finally, we’ll need to add logic to handle what happens when 104 00:05:06,400 --> 00:05:09,740 an exception occurs and execution has to be suspended 105 00:05:09,740 --> 00:05:14,310 because the current instruction cannot be executed correctly. 106 00:05:14,310 --> 00:05:17,220 We’ll be using the digital logic gates we learned about in Part 107 00:05:17,220 --> 00:05:18,640 1 of the course. 108 00:05:18,640 --> 00:05:21,300 In particular, we’ll need multi-bit registers to hold 109 00:05:21,300 --> 00:05:25,150 state information from one instruction to the next. 110 00:05:25,150 --> 00:05:26,880 Recall that these memory elements 111 00:05:26,880 --> 00:05:29,590 load new values at the rising edge of the clock signal, 112 00:05:29,590 --> 00:05:33,840 then store that value until the next rising clock edge. 113 00:05:33,840 --> 00:05:37,100 We’ll use a lot of multiplexers in our design to select between 114 00:05:37,100 --> 00:05:40,720 alternative values in the datapath. 115 00:05:40,720 --> 00:05:42,810 The actual computations will be performed 116 00:05:42,810 --> 00:05:44,570 by the arithmetic and logic unit (ALU) 117 00:05:44,570 --> 00:05:47,530 that we designed at the end of Part 1. 118 00:05:47,530 --> 00:05:51,250 It has logic to perform the arithmetic, comparison, boolean 119 00:05:51,250 --> 00:05:54,640 and shift operations listed on the previous slide. 120 00:05:54,640 --> 00:06:00,240 It takes in two 32-bit operands and produces a 32-bit result. 121 00:06:00,240 --> 00:06:03,080 And, finally, we’ll use several different memory components 122 00:06:03,080 --> 00:06:07,040 to implement register storage in the datapath and also for main 123 00:06:07,040 --> 00:06:10,570 memory, where instructions and data are stored. 124 00:06:10,570 --> 00:06:12,790 You might find it useful to review the chapters 125 00:06:12,790 --> 00:06:16,760 on combinational and sequential logic in Part 1 of the course. 126 00:06:16,760 --> 00:06:20,460 The Beta ISA specifies thirty-two 32-bit registers 127 00:06:20,460 --> 00:06:22,420 as part of the datapath. 128 00:06:22,420 --> 00:06:25,280 These are shown as the magenta rectangles in the diagram 129 00:06:25,280 --> 00:06:26,810 below. 130 00:06:26,810 --> 00:06:29,520 These are implemented as load-enabled registers, which 131 00:06:29,520 --> 00:06:32,920 have an EN signal that controls when the register is 132 00:06:32,920 --> 00:06:35,030 loaded with a new value. 133 00:06:35,030 --> 00:06:38,440 If EN is 1, the register will be loaded from the D input 134 00:06:38,440 --> 00:06:41,020 at the next rising clock edge. 135 00:06:41,020 --> 00:06:45,010 If EN is 0, the register is reloaded with its current value 136 00:06:45,010 --> 00:06:46,880 and hence its value is unchanged. 137 00:06:46,880 --> 00:06:50,870 It might seem easier to add enabling logic to the clock 138 00:06:50,870 --> 00:06:53,890 signal, but this is almost never a good idea 139 00:06:53,890 --> 00:06:57,320 since any glitches in that logic might generate false edges that 140 00:06:57,320 --> 00:07:01,400 would cause the register to load a new value at the wrong time. 141 00:07:01,400 --> 00:07:05,400 Always remember the mantra: NO GATED CLOCKS! 142 00:07:05,400 --> 00:07:07,960 There are multiplexers (shown underneath the registers) 143 00:07:07,960 --> 00:07:12,140 that let us select a value from any of the 32 registers. 144 00:07:12,140 --> 00:07:14,930 Since we need two operands for the datapath logic, 145 00:07:14,930 --> 00:07:17,520 there are two such multiplexers. 146 00:07:17,520 --> 00:07:20,460 Their select inputs (RA1 and RA2) 147 00:07:20,460 --> 00:07:23,280 function as addresses, determining 148 00:07:23,280 --> 00:07:27,170 which register values will be selected as operands. 149 00:07:27,170 --> 00:07:30,990 And, finally, there’s a decoder that determines which of the 32 150 00:07:30,990 --> 00:07:37,200 register load enables will be 1 based on the 5-bit WA input. 151 00:07:37,200 --> 00:07:39,920 For convenience, we’ll package all this functionality up 152 00:07:39,920 --> 00:07:43,430 into a single component called a “register file”. 153 00:07:43,430 --> 00:07:45,730 The register file has two read ports, 154 00:07:45,730 --> 00:07:48,190 which, given a 5-bit address input, 155 00:07:48,190 --> 00:07:53,490 deliver the selected register value on the read-data ports. 156 00:07:53,490 --> 00:07:56,417 The two read ports operate independently. 157 00:07:56,417 --> 00:07:58,000 They can read from different registers 158 00:07:58,000 --> 00:08:00,040 or, if the addresses are the same, 159 00:08:00,040 --> 00:08:03,340 read from the same register. 160 00:08:03,340 --> 00:08:05,260 The signals on the left of the register file 161 00:08:05,260 --> 00:08:08,300 include a 5-bit value (WA) that selects a register 162 00:08:08,300 --> 00:08:12,990 to be written with the specified 32-bit write data (WD). 163 00:08:12,990 --> 00:08:15,190 If the write enable signal (WE) is 164 00:08:15,190 --> 00:08:17,360 1 at the rising edge of the clock (CLK) signal, 165 00:08:17,360 --> 00:08:20,400 the selected register will be loaded with the supplied write 166 00:08:20,400 --> 00:08:22,140 data. 167 00:08:22,140 --> 00:08:25,640 Note that in the BETA ISA, reading from register address 168 00:08:25,640 --> 00:08:28,360 31 should always produce a zero value. 169 00:08:28,360 --> 00:08:32,390 The register file has internal logic to ensure that happens. 170 00:08:32,390 --> 00:08:35,049 Here’s a timing diagram that shows the register file 171 00:08:35,049 --> 00:08:36,760 in operation. 172 00:08:36,760 --> 00:08:38,230 To read a value from the register 173 00:08:38,230 --> 00:08:41,360 file, supply a stable address input (RA) 174 00:08:41,360 --> 00:08:43,429 on one of read ports. 175 00:08:43,429 --> 00:08:46,050 After the register file’s propagation delay, 176 00:08:46,050 --> 00:08:48,590 the value of the selected register will appear 177 00:08:48,590 --> 00:08:50,760 on the corresponding read data port (RD). 178 00:08:50,760 --> 00:08:51,910 [CLICK] 179 00:08:51,910 --> 00:08:53,790 Not surprisingly, the register file 180 00:08:53,790 --> 00:08:56,000 write operation is very similar to writing 181 00:08:56,000 --> 00:08:57,980 an ordinary D-register. 182 00:08:57,980 --> 00:09:01,740 The write address (WA), write data (WD) 183 00:09:01,740 --> 00:09:05,750 and write enable (WE) signals must all be valid and stable 184 00:09:05,750 --> 00:09:08,530 for some specified setup time before the rising 185 00:09:08,530 --> 00:09:10,310 edge of the clock. 186 00:09:10,310 --> 00:09:14,380 And must remain stable and valid for the specified hold time 187 00:09:14,380 --> 00:09:17,410 after the rising clock edge. 188 00:09:17,410 --> 00:09:19,130 If those timing constraints are met, 189 00:09:19,130 --> 00:09:21,990 the register file will reliably update the value 190 00:09:21,990 --> 00:09:24,080 of the selected register. 191 00:09:24,080 --> 00:09:25,120 [CLICK] 192 00:09:25,120 --> 00:09:28,710 When a register value is written at the rising clock edge, 193 00:09:28,710 --> 00:09:31,200 if that value is selected by a read address, 194 00:09:31,200 --> 00:09:34,040 the new data will appear after the propagation delay 195 00:09:34,040 --> 00:09:36,540 on the corresponding data port. 196 00:09:36,540 --> 00:09:38,480 In other words, the read data value 197 00:09:38,480 --> 00:09:42,320 changes if either the read address changes or the value 198 00:09:42,320 --> 00:09:46,170 of the selected register changes. 199 00:09:46,170 --> 00:09:48,250 Can we read and write the same register 200 00:09:48,250 --> 00:09:50,480 in a single clock cycle? 201 00:09:50,480 --> 00:09:51,240 Yes! 202 00:09:51,240 --> 00:09:53,820 If the read address becomes valid at the beginning 203 00:09:53,820 --> 00:09:56,490 of the cycle, the old value of the register 204 00:09:56,490 --> 00:10:00,110 will be appear on the data port for the rest of the cycle. 205 00:10:00,110 --> 00:10:04,090 Then, the write occurs at the *end* of the cycle and the new 206 00:10:04,090 --> 00:10:08,120 register value will be available in the next clock cycle. 207 00:10:08,120 --> 00:10:10,740 Okay, that’s a brief run-though of the components we’ll be 208 00:10:10,740 --> 00:10:11,470 using. 209 00:10:11,470 --> 00:10:13,970 Let’s get started on the design!