Let's summarize what we've learned about controlling pipelined systems.

The most straightforward approach is to use a pipeline with the system clock chosen to accommodate the worst-case processing time. These systems are easy to design, but they can't produce higher throughputs if the processing stages might run more quickly for some data values.

We saw that we could use a simple handshake protocol to move data through the system (sketched in code at the end of this summary). All communication still happens on the rising edge of the system clock, but the specific clock edge used to transfer data is determined by the stages themselves.

It's tempting to wonder if we might adjust the global clock period to take advantage of data-dependent processing speedups. But the necessary timing generators can be very complicated in large systems. It's usually much easier to use local communication between modules to determine system timing than to try to figure out all the constraints at the system level. So this approach isn't usually a good one.

But what about locally timed asynchronous systems like the example we just saw? Each generation of engineers has heard the siren call of asynchronous logic. Sadly, it usually proves too hard to produce a provably reliable design for a large system, say, a modern computer. But there are special cases, such as the logic for integer division, where the data-dependent speedups make the extra work worthwhile.

We characterized the performance of our systems by measuring their latency and throughput. For combinational circuits, the latency is simply the propagation delay of the circuit, and its throughput is just 1/latency.

We introduced a systematic strategy for designing K-pipelines, where there's a register on the outputs of each stage, and there are exactly K registers on every path from input to output. The period of the system clock, t_CLK, is determined by the propagation delay of the slowest pipeline stage. The throughput of a pipelined system is 1/t_CLK, and its latency is K times t_CLK.
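To make those formulas concrete, here is a minimal Python sketch (not part of the lecture; the stage delays are made-up values) comparing the latency and throughput of an unpipelined circuit with those of the same logic split into a K-pipeline. It uses the simplified timing model from the summary, so it ignores register overhead such as setup time.

```python
# Minimal sketch with illustrative, made-up stage delays: timing
# metrics for a K-pipeline versus the same logic left combinational.

def combinational_metrics(stage_delays_ns):
    """Unpipelined circuit: latency is the total propagation delay,
    and throughput is just 1/latency."""
    latency = sum(stage_delays_ns)
    return latency, 1.0 / latency

def pipeline_metrics(stage_delays_ns):
    """K-pipeline: t_CLK is set by the slowest stage, throughput is
    1/t_CLK, and latency is K * t_CLK (register overhead ignored,
    as in the summary's formulas)."""
    k = len(stage_delays_ns)
    t_clk = max(stage_delays_ns)
    return k * t_clk, 1.0 / t_clk

# Hypothetical per-stage propagation delays in nanoseconds.
delays = [3.0, 5.0, 2.0]

comb_latency, comb_tput = combinational_metrics(delays)
pipe_latency, pipe_tput = pipeline_metrics(delays)

print(f"combinational: latency={comb_latency} ns, throughput={comb_tput:.3f}/ns")
print(f"3-pipeline:    latency={pipe_latency} ns, throughput={pipe_tput:.3f}/ns")
```

Running this shows the classic trade-off: the 3-pipeline doubles throughput (one result every 5 ns instead of every 10 ns) while the latency grows from 10 ns to 15 ns.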
Pipelining is the key to increasing the throughput of most high-performance digital systems.
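Finally, to illustrate the handshake idea mentioned earlier, here is a minimal simulation sketch. The valid/ready signal names, the two-stage structure, and the data-dependent cycle counts are assumptions for illustration, not the lecture's design; the point is only that transfers happen on a clock edge chosen by the stages themselves, so fast values are not held to the worst-case time.

```python
# Minimal sketch (assumed valid/ready naming, made-up delays): a
# clock-edge handshake between two pipeline stages where stage 1's
# processing time depends on the data value.

def cycles_needed(value):
    # Hypothetical data-dependent processing time: small values
    # finish in 1 cycle, large ones take 3.
    return 1 if value < 10 else 3

def run(values, n_cycles=20):
    pending = list(values)   # inputs waiting to enter stage 1
    stage1 = None            # value currently being processed
    remaining = 0            # processing cycles left in stage 1
    accepted = []            # values transferred into stage 2

    for cycle in range(n_cycles):
        # Everything below happens "on a rising clock edge"; which
        # edge actually transfers data is decided by the stages.
        if stage1 is not None and remaining > 0:
            remaining -= 1                 # stage 1 still computing

        valid = stage1 is not None and remaining == 0
        ready = True                       # stage 2 always accepts here

        if valid and ready:                # handshake: transfer on this edge
            accepted.append(stage1)
            stage1 = None

        if stage1 is None and pending:     # stage 1 takes its next input
            stage1 = pending.pop(0)
            remaining = cycles_needed(stage1)

    return accepted

# Fast, slow, fast: 3 and 7 each move after 1 cycle, 42 after 3.
print(run([3, 42, 7]))
```

With a worst-case global clock, every value would cost 3 cycles; with the handshake, the fast values move through sooner, which is exactly the data-dependent speedup the lecture describes.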