1 00:00:01,030 --> 00:00:05,939 Here's another approach to improving the latency of our adder, this time focusing just on the 2 00:00:05,939 --> 00:00:07,190 carry logic. 3 00:00:07,190 --> 00:00:12,089 Early on in the course, we learned that by going from a chain of logic gates to a tree 4 00:00:12,089 --> 00:00:17,500 of logic gates, we could go from a linear latency to a logarithmic latency. 5 00:00:17,500 --> 00:00:20,090 Let's try to do that here. 6 00:00:20,090 --> 00:00:24,680 We'll start by rewriting the equations for the carry-out from the full adder module. 7 00:00:24,680 --> 00:00:28,360 The final form of the rewritten equation has two terms. 8 00:00:28,360 --> 00:00:33,370 The G, or generate, term is true when the inputs will cause the module to generate a 9 00:00:33,370 --> 00:00:38,410 carry-out right away, without having to wait for the carry-in to arrive. 10 00:00:38,410 --> 00:00:43,070 The P, or propagate, term is true if the module will generate a carry-out only if there's 11 00:00:43,070 --> 00:00:45,569 a carry-in. 12 00:00:45,569 --> 00:00:50,350 So there are only two ways to get a carry-out from the module: it's either generated by 13 00:00:50,350 --> 00:00:55,329 the current module or the carry-in is propagated from the previous module. 14 00:00:55,329 --> 00:01:01,909 Actually, it's usual to change the logic for the P term from "A OR B" to "A XOR B". 15 00:01:01,909 --> 00:01:05,950 This doesn't change the truth table for the carry-out but will allow us to express the 16 00:01:05,950 --> 00:01:09,789 sum output as "P XOR carry-in". 17 00:01:09,789 --> 00:01:13,450 Here's the schematic for the reorganized full adder module. 18 00:01:13,450 --> 00:01:18,201 The little sum-of-products circuit for the carry-out can be implemented using three 2-input 19 00:01:18,201 --> 00:01:23,130 NAND gates, which is a bit more compact than the implementation for the three product terms 20 00:01:23,130 --> 00:01:25,420 we suggested in Lab 2.
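As a quick check of the generate/propagate rewrite described above, here is a minimal Python sketch. The function name `full_adder_gp` and the 0/1 integer encoding of bits are assumptions made for illustration; they are not from the lecture. It confirms that the carry-out is G OR (P AND carry-in), that the sum is P XOR carry-in, and that the OR and XOR forms of P give the same carry-out.

```python
from itertools import product


def full_adder_gp(a, b, cin):
    """Full adder organized around generate/propagate (bits are 0 or 1)."""
    g = a & b             # generate: carry-out regardless of the carry-in
    p = a ^ b             # propagate (XOR form): carry-out only if carry-in is 1
    cout = g | (p & cin)  # the two ways to get a carry-out
    s = p ^ cin           # the sum output reuses P
    return s, cout, g, p


# Exhaustively compare against ordinary arithmetic on the three input bits.
for a, b, cin in product((0, 1), repeat=3):
    s, cout, g, p = full_adder_gp(a, b, cin)
    total = a + b + cin
    assert s == total % 2
    assert cout == total // 2
    # Using "A OR B" for P instead of "A XOR B" leaves the carry-out unchanged.
    assert cout == (g | ((a | b) & cin))
```

The two forms of P differ only when A and B are both 1, and in that case G is already 1, so the carry-out is the same either way.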
21 00:01:25,420 --> 00:01:28,610 Time to update your full adder circuit! 22 00:01:28,610 --> 00:01:33,030 Now consider two adjacent adder modules in a larger adder circuit: 23 00:01:33,030 --> 00:01:39,000 we'll use the label H to refer to the high-order module and the label L to refer to the low-order 24 00:01:39,000 --> 00:01:40,070 module. 25 00:01:40,070 --> 00:01:44,649 We can use the generate and propagate information from each of the modules to develop equations 26 00:01:44,649 --> 00:01:49,009 for the carry-out from the pair of modules treated as a single block. 27 00:01:49,009 --> 00:01:54,950 We'll generate a carry-out from the block when a carry-out is generated by the H module, 28 00:01:54,950 --> 00:02:01,460 or when a carry-out is generated by the L module and propagated by the H module. 29 00:02:01,460 --> 00:02:06,399 And we'll propagate the carry-in through the block only if the L module propagates its 30 00:02:06,399 --> 00:02:12,970 carry-in to the intermediate carry-out and the H module propagates that to the final carry-out. 31 00:02:12,970 --> 00:02:19,870 So we have two simple equations requiring only a couple of logic gates to implement. 32 00:02:19,870 --> 00:02:26,450 Let's use these equations to build a generate-propagate (GP) module and hook it to the H and L modules 33 00:02:26,450 --> 00:02:27,980 as shown. 34 00:02:27,980 --> 00:02:32,790 The G and P outputs of the GP module tell us under what conditions we'll get a carry-out 35 00:02:32,790 --> 00:02:38,610 from the two individual modules treated as a single, larger block. 36 00:02:38,610 --> 00:02:43,120 We can use additional layers of GP modules to build a tree of logic that computes the 37 00:02:43,120 --> 00:02:48,099 generate and propagate logic for adders with any number of inputs. 38 00:02:48,099 --> 00:02:53,780 For an adder with N inputs, the tree will contain a total of N-1 GP modules and have 39 00:02:53,780 --> 00:02:56,909 a latency that's order log(N).
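The block equations just described (G = G_H OR (P_H AND G_L), P = P_H AND P_L) can be sketched in Python as follows. The names `gp_module`, `bit_gp` and `gp_tree` are hypothetical, and the sketch assumes the number of bits is a power of two. It models the logic only, counting modules and tree levels rather than measuring any real gate delay.

```python
def gp_module(g_h, p_h, g_l, p_l):
    """Combine the G/P signals of a high (H) and low (L) block into one block."""
    g = g_h | (p_h & g_l)  # generated by H, or generated by L and propagated by H
    p = p_h & p_l          # the carry-in must be propagated by both L and H
    return g, p


def bit_gp(a, b, n):
    """Per-bit (G, P) pairs for two n-bit integers; index 0 is the low-order bit."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]


def gp_tree(pairs):
    """Reduce (G, P) pairs pairwise, layer by layer (length must be a power of two).

    Returns (G, P, modules_used, layers) for the whole block.
    """
    level = list(pairs)
    modules = layers = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            (g_l, p_l), (g_h, p_h) = level[i], level[i + 1]
            nxt.append(gp_module(g_h, p_h, g_l, p_l))
            modules += 1
        level = nxt
        layers += 1
    g, p = level[0]
    return g, p, modules, layers


# 8-bit example: 255 + 1 generates a carry out of the whole block.
g, p, modules, layers = gp_tree(bit_gp(255, 1, 8))
assert (g, p) == (1, 0)
assert modules == 7 and layers == 3  # N-1 modules, log2(N) layers for N = 8
```

For any carry-in `cin`, the carry-out of the whole block is then `g | (p & cin)`, the same form as the single-module equation.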
40 00:02:56,909 --> 00:03:00,659 In the next step, we'll see how to use the generate and propagate information to quickly 41 00:03:00,659 --> 00:03:06,569 compute the carry-in for each of the original full adder modules. 42 00:03:06,569 --> 00:03:11,379 Once we're given the carry-in C_0 for the low-order bit, we can hierarchically compute 43 00:03:11,379 --> 00:03:14,819 the carry-in for each full adder module. 44 00:03:14,819 --> 00:03:19,410 Given the carry-in to a block of adders, we simply pass it along as the carry-in to the 45 00:03:19,410 --> 00:03:21,200 low half of the block. 46 00:03:21,200 --> 00:03:25,239 The carry-in for the high half of the block is computed using the generate and propagate 47 00:03:25,239 --> 00:03:28,280 information from the low half of the block. 48 00:03:28,280 --> 00:03:33,900 We can use these equations to build a C module and arrange the C modules in a tree as shown 49 00:03:33,900 --> 00:03:40,140 to use the C_0 carry-in to hierarchically compute the carry-in to each layer of successively 50 00:03:40,140 --> 00:03:44,980 smaller blocks, until we finally reach the full adder modules. 51 00:03:44,980 --> 00:03:51,180 For example, these equations show how C_4 is computed from C_0, and C_6 is computed from 52 00:03:51,180 --> 00:03:53,280 C_4. 53 00:03:53,280 --> 00:03:58,269 Again, the total propagation delay from the arrival of the C_0 input to the carry-ins 54 00:03:58,269 --> 00:04:02,379 for each full adder is order log(N). 55 00:04:02,379 --> 00:04:09,209 Notice that the G_L and P_L inputs to a particular C module are the same as two of the inputs 56 00:04:09,209 --> 00:04:13,110 to the GP module in the same position in the GP tree. 57 00:04:13,110 --> 00:04:18,668 We can combine the GP module and C module to form a single carry-lookahead module that 58 00:04:18,668 --> 00:04:23,940 passes generate and propagate information up the tree and carry-in information down 59 00:04:23,940 --> 00:04:25,130 the tree.
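To make the C module concrete, here is a small Python sketch of the C_4 and C_6 example for an 8-bit adder. The names `c_module`, `block_gp`, `true_carry` and `carries_c4_c6` are assumptions for illustration. The block G and P values are computed arithmetically here as a stand-in for the GP tree, so the sketch checks the carry equations rather than reproducing the circuit.

```python
def c_module(g_l, p_l, c_in):
    """Return (carry-in for the low half, carry-in for the high half)."""
    return c_in, g_l | (p_l & c_in)


def block_gp(a, b, lo, width):
    """Reference G and P for the block of `width` bits starting at bit `lo`."""
    mask = (1 << width) - 1
    x, y = (a >> lo) & mask, (b >> lo) & mask
    g = (x + y) >> width      # the block produces a carry even with carry-in 0
    p = int((x ^ y) == mask)  # every bit position in the block propagates
    return g, p


def true_carry(a, b, c0, k):
    """Carry into bit k, computed directly from the low k bits."""
    mask = (1 << k) - 1
    return ((a & mask) + (b & mask) + c0) >> k


def carries_c4_c6(a, b, c0):
    """C_4 from C_0 (via bits 3..0), then C_6 from C_4 (via bits 5..4)."""
    _, c4 = c_module(*block_gp(a, b, 0, 4), c0)
    _, c6 = c_module(*block_gp(a, b, 4, 2), c4)
    return c4, c6


# 0b00001111 + 1 carries out of bits 3..0 but not out of bits 5..4.
assert carries_c4_c6(0b00001111, 1, 0) == (1, 0)
```

Each step down the tree costs one C module, so reaching any individual full adder from C_0 takes a number of steps equal to the depth of the tree.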
60 00:04:25,130 --> 00:04:30,430 The schematic at the top shows how to wire up the tree of carry-lookahead modules. 61 00:04:30,430 --> 00:04:33,970 And now we get to the payoff for all this hard work! 62 00:04:33,970 --> 00:04:39,060 The combined propagation delay to hierarchically compute the generate and propagate information 63 00:04:39,060 --> 00:04:45,370 on the way up and the carry-in information on the way down is order log(N), 64 00:04:45,370 --> 00:04:50,690 which is then the latency for the entire adder since computing the sum outputs only takes 65 00:04:50,690 --> 00:04:53,680 one additional XOR delay. 66 00:04:53,680 --> 00:04:59,020 This is a considerable improvement over the order N latency of the ripple-carry adder. 67 00:04:59,020 --> 00:05:04,639 A final design note: we no longer need the carry-out circuitry in the full adder module, 68 00:05:04,639 --> 00:05:07,169 so it can be removed. 69 00:05:07,169 --> 00:05:12,740 Variations on this generate-propagate strategy form the basis for the fastest-known adder 70 00:05:12,740 --> 00:05:13,870 circuits. 71 00:05:13,870 --> 00:05:18,000 If you'd like to learn more, look up "Kogge-Stone adders" on Wikipedia.
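Putting the pieces together, the following Python sketch models the whole carry-lookahead adder: G and P flow up the tree, carry-ins flow down, and each sum bit is P XOR carry-in. The function name `cla_add` and its parameters are hypothetical, the width `n` is assumed to be a power of two, and the recursion runs sequentially, so this checks the logic only and says nothing about hardware latency.

```python
def cla_add(a, b, n, c0=0):
    """Add two n-bit integers with a recursive carry-lookahead tree.

    Returns (n-bit sum, carry-out). n is assumed to be a power of two.
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # per-bit propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # per-bit generate
    gp = {}  # (lo, width) -> (G, P) for the block of `width` bits starting at `lo`

    def up(lo, width):
        """Upward pass: compute and record G/P for every block."""
        if width == 1:
            gp[lo, width] = (g[lo], p[lo])
        else:
            half = width // 2
            g_l, p_l = up(lo, half)
            g_h, p_h = up(lo + half, half)
            gp[lo, width] = (g_h | (p_h & g_l), p_h & p_l)
        return gp[lo, width]

    carry = [0] * n  # carry-in for each full adder

    def down(lo, width, c_in):
        """Downward pass: hand each half of a block its carry-in."""
        if width == 1:
            carry[lo] = c_in
            return
        half = width // 2
        g_l, p_l = gp[lo, half]
        down(lo, half, c_in)                           # low half: pass along
        down(lo + half, half, g_l | (p_l & c_in))      # high half: G_L + P_L*C_in

    top_g, top_p = up(0, n)
    down(0, n, c0)
    total = sum((p[i] ^ carry[i]) << i for i in range(n))  # sum bit = P XOR carry-in
    return total, top_g | (top_p & c0)


s, c_out = cla_add(200, 100, 8)
assert s + (c_out << 8) == 300
```

Note that the per-bit carry-out is never computed in this sketch, which matches the design note that the carry-out circuitry in each full adder is no longer needed.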