1
00:00:01,030 --> 00:00:05,939
Here's another approach to improving the latency
of our adder, this time focusing just on the

2
00:00:05,939 --> 00:00:07,190
carry logic.

3
00:00:07,190 --> 00:00:12,089
Early on in the course, we learned that by
going from a chain of logic gates to a tree

4
00:00:12,089 --> 00:00:17,500
of logic gates, we could go from a linear
latency to a logarithmic latency.

5
00:00:17,500 --> 00:00:20,090
Let's try to do that here.

6
00:00:20,090 --> 00:00:24,680
We'll start by rewriting the equations for
the carry-out from the full adder module.

7
00:00:24,680 --> 00:00:28,360
The final form of the rewritten equation has
two terms.

8
00:00:28,360 --> 00:00:33,370
The G, or generate, term is true when the
inputs will cause the module to generate a

9
00:00:33,370 --> 00:00:38,410
carry-out right away, without having to wait
for the carry-in to arrive.

10
00:00:38,410 --> 00:00:43,070
The P, or propagate, term is true if the module
will generate a carry-out only if there's

11
00:00:43,070 --> 00:00:45,569
a carry-in.

12
00:00:45,569 --> 00:00:50,350
So there are only two ways to get a carry-out
from the module: it's either generated by

13
00:00:50,350 --> 00:00:55,329
the current module or the carry-in is propagated
from the previous module.

14
00:00:55,329 --> 00:01:01,909
Actually, it's usual to change the logic for
the P term from "A OR B" to "A XOR B".

15
00:01:01,909 --> 00:01:05,950
This doesn't change the truth table for the
carry-out but will allow us to express the

16
00:01:05,950 --> 00:01:09,789
sum output as "P XOR carry-in".
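
Here's a minimal Python sketch of the reorganized full adder, just to check the claims above. It models each signal as a 0/1 integer; the names full_adder_gp and check_all are made up for this illustration and aren't from the lecture or the lab.

```python
from itertools import product


def full_adder_gp(a: int, b: int, cin: int):
    """One-bit full adder written in generate/propagate form.

    Returns (sum, carry_out, g, p), all 0/1 integers.
    """
    g = a & b             # generate: carry-out regardless of carry-in
    p = a ^ b             # propagate, using the XOR form
    s = p ^ cin           # sum output is "P XOR carry-in"
    cout = g | (p & cin)  # carry-out is generated or propagated
    return s, cout, g, p


def check_all() -> bool:
    """Exhaustively compare against ordinary arithmetic."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout, g, _ = full_adder_gp(a, b, cin)
        # The two output bits must encode a + b + cin.
        assert 2 * cout + s == a + b + cin
        # Using "A OR B" for the propagate term gives the same carry-out.
        assert cout == (g | ((a | b) & cin))
    return True
```

Running check_all() walks all eight input combinations, so it confirms both the carry-out equation and the claim that swapping OR for XOR leaves the carry-out truth table unchanged.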

17
00:01:09,789 --> 00:01:13,450
Here's the schematic for the reorganized full
adder module.

18
00:01:13,450 --> 00:01:18,201
The little sum-of-products circuit for the
carry-out can be implemented using three 2-input

19
00:01:18,201 --> 00:01:23,130
NAND gates, which is a bit more compact than
the implementation for the three product terms

20
00:01:23,130 --> 00:01:25,420
we suggested in Lab 2.

21
00:01:25,420 --> 00:01:28,610
Time to update your full adder circuit!

22
00:01:28,610 --> 00:01:33,030
Now consider two adjacent adder modules in
a larger adder circuit:

23
00:01:33,030 --> 00:01:39,000
we'll use the label H to refer to the high-order
module and the label L to refer to the low-order

24
00:01:39,000 --> 00:01:40,070
module.

25
00:01:40,070 --> 00:01:44,649
We can use the generate and propagate information
from each of the modules to develop equations

26
00:01:44,649 --> 00:01:49,009
for the carry-out from the pair of modules
treated as a single block.

27
00:01:49,009 --> 00:01:54,950
We'll generate a carry-out from the block
when a carry-out is generated by the H module,

28
00:01:54,950 --> 00:02:01,460
or when a carry-out is generated by the L
module and propagated by the H module.

29
00:02:01,460 --> 00:02:06,399
And we'll propagate the carry-in through the
block only if the L module propagates its

30
00:02:06,399 --> 00:02:12,970
carry-in to the intermediate carry-out and
the H module propagates that to the final carry-out.

31
00:02:12,970 --> 00:02:19,870
So we have two simple equations requiring
only a couple of logic gates to implement.
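
As a rough sketch, the two block equations can be written out and checked in Python. This assumes the XOR form of the propagate term and 0/1 integers for signals; gp_combine, bit_gp and check_two_bit_block are hypothetical names chosen for this example.

```python
from itertools import product


def bit_gp(a: int, b: int):
    """(G, P) for a single bit position: G = A AND B, P = A XOR B."""
    return a & b, a ^ b


def gp_combine(g_h: int, p_h: int, g_l: int, p_l: int):
    """GP module: merge the high and low halves into one block."""
    g = g_h | (p_h & g_l)  # generated by H, or generated by L and propagated by H
    p = p_h & p_l          # both halves must propagate the carry-in
    return g, p


def check_two_bit_block() -> bool:
    """Check the block equations against 2-bit integer addition."""
    for a, b in product(range(4), repeat=2):
        g_l, p_l = bit_gp(a & 1, b & 1)
        g_h, p_h = bit_gp(a >> 1, b >> 1)
        g, p = gp_combine(g_h, p_h, g_l, p_l)
        for cin in (0, 1):
            # The carry-out of the 2-bit block is bit 2 of the true sum.
            assert (g | (p & cin)) == (a + b + cin) >> 2
    return True
```

The check treats the H and L modules as a 2-bit adder and confirms that the block's G and P predict its carry-out for every input.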

32
00:02:19,870 --> 00:02:26,450
Let's use these equations to build a generate-propagate
(GP) module and hook it to the H and L modules

33
00:02:26,450 --> 00:02:27,980
as shown.

34
00:02:27,980 --> 00:02:32,790
The G and P outputs of the GP module tell
us under what conditions we'll get a carry-out

35
00:02:32,790 --> 00:02:38,610
from the two individual modules treated as
a single, larger block.

36
00:02:38,610 --> 00:02:43,120
We can use additional layers of GP modules
to build a tree of logic that computes the

37
00:02:43,120 --> 00:02:48,099
generate and propagate logic for adders with
any number of inputs.

38
00:02:48,099 --> 00:02:53,780
For an adder with N inputs, the tree will
contain a total of N-1 GP modules and have

39
00:02:53,780 --> 00:02:56,909
a latency that's order log(N).
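
To make the module count and depth concrete, here's a small Python sketch that builds the tree level by level. It assumes N is a power of two and that the leaves are the per-bit G and P signals; leaves_for and gp_tree are illustrative names, not part of the course materials.

```python
def leaves_for(a: int, b: int, n: int):
    """Per-bit (G, P) pairs for two n-bit integers, low-order bit first."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]


def gp_tree(leaves):
    """Reduce (G, P) pairs pairwise until one pair remains.

    Assumes len(leaves) is a power of two.
    Returns (G, P, modules_used, levels).
    """
    level = list(leaves)
    modules = 0
    levels = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            (g_l, p_l), (g_h, p_h) = level[i], level[i + 1]
            nxt.append((g_h | (p_h & g_l), p_h & p_l))  # one GP module
            modules += 1
        level = nxt
        levels += 1
    g, p = level[0]
    return g, p, modules, levels
```

For an 8-bit adder this uses 7 GP modules arranged in 3 levels, matching the N-1 modules and log2(N) depth described above.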

40
00:02:56,909 --> 00:03:00,659
In the next step, we'll see how to use the
generate and propagate information to quickly

41
00:03:00,659 --> 00:03:06,569
compute the carry-in for each of the original
full adder modules.

42
00:03:06,569 --> 00:03:11,379
Once we're given the carry-in C_0 for the
low-order bit, we can hierarchically compute

43
00:03:11,379 --> 00:03:14,819
the carry-in for each full adder module.

44
00:03:14,819 --> 00:03:19,410
Given the carry-in to a block of adders, we
simply pass it along as the carry-in to the

45
00:03:19,410 --> 00:03:21,200
low-half of the block.

46
00:03:21,200 --> 00:03:25,239
The carry-in for the high-half of the block
is computed using the generate and propagate

47
00:03:25,239 --> 00:03:28,280
information from the low-half of the block.

48
00:03:28,280 --> 00:03:33,900
We can use these equations to build a C module
and arrange the C modules in a tree as shown

49
00:03:33,900 --> 00:03:40,140
to use the C_0 carry-in to hierarchically
compute the carry-in to each layer of successively

50
00:03:40,140 --> 00:03:44,980
smaller blocks, until we finally reach the
full adder modules.

51
00:03:44,980 --> 00:03:51,180
For example, these equations show how C4 is
computed from C0, and C6 is computed from

52
00:03:51,180 --> 00:03:53,280
C4.
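
Here's a hedged Python sketch of the C module applied to that example. For brevity, block_gp folds the per-bit signals from low to high rather than using the GP tree, and all the function names (block_gp, c_module, true_carry) are assumptions made for this illustration.

```python
def block_gp(a: int, b: int, lo: int, hi: int):
    """(G, P) for the block covering bits lo..hi inclusive."""
    g, p = 0, 1  # identity: an empty block generates nothing, propagates everything
    for i in range(lo, hi + 1):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g_i, p_i = ai & bi, ai ^ bi
        g, p = g_i | (p_i & g), p_i & p  # bit i acts as the high half
    return g, p


def c_module(g_l: int, p_l: int, c_in: int):
    """C module: return (carry-in to low half, carry-in to high half)."""
    return c_in, g_l | (p_l & c_in)


def true_carry(a: int, b: int, c0: int, i: int) -> int:
    """Reference value: the carry into bit i, from integer arithmetic."""
    mask = (1 << i) - 1
    return ((a & mask) + (b & mask) + c0) >> i


def carries_c4_c6(a: int, b: int, c0: int):
    """Compute C4 from C0, then C6 from C4."""
    _, c4 = c_module(*block_gp(a, b, 0, 3), c0)  # uses G and P for bits 3:0
    _, c6 = c_module(*block_gp(a, b, 4, 5), c4)  # uses G and P for bits 5:4
    return c4, c6
```

The low half always receives the block's carry-in unchanged, so only the high half's carry-in costs any logic.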

53
00:03:53,280 --> 00:03:58,269
Again, the total propagation delay from the
arrival of the C_0 input to the carry-ins

54
00:03:58,269 --> 00:04:02,379
for each full adder is order log(N).

55
00:04:02,379 --> 00:04:09,209
Notice that the G_L and P_L inputs to a particular
C module are the same as two of the inputs

56
00:04:09,209 --> 00:04:13,110
to the GP module in the same position in the
GP tree.

57
00:04:13,110 --> 00:04:18,668
We can combine the GP module and C module
to form a single carry-lookahead module that

58
00:04:18,668 --> 00:04:23,940
passes generate and propagate information
up the tree and carry-in information down

59
00:04:23,940 --> 00:04:25,130
the tree.

60
00:04:25,130 --> 00:04:30,430
The schematic at the top shows how to wire
up the tree of carry-lookahead modules.

61
00:04:30,430 --> 00:04:33,970
And now we get to the payoff for all this
hard work!

62
00:04:33,970 --> 00:04:39,060
The combined propagation delay to hierarchically
compute the generate and propagate information

63
00:04:39,060 --> 00:04:45,370
on the way up and the carry-in information
on the way down is order log(N),

64
00:04:45,370 --> 00:04:50,690
which is then the latency for the entire adder
since computing the sum outputs only takes

65
00:04:50,690 --> 00:04:53,680
one additional XOR delay.
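
Putting the pieces together, here's a minimal Python sketch of the whole scheme: generate and propagate information goes up the tree, carry-ins come down, and each sum bit is one XOR. It's a software model under simplifying assumptions (0/1 integers for signals, recursion standing in for the wiring), and cla_add is a name invented for this example.

```python
def cla_add(a: int, b: int, c0: int, n: int):
    """Add two n-bit integers with a carry-lookahead tree.

    Returns (n-bit sum, carry-out).
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # per-bit propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # per-bit generate
    gp = {}           # (lo, hi) -> (G, P) for the block of bits lo..hi-1
    carry = [0] * n   # carry-in for each full adder

    def up(lo: int, hi: int):
        """Pass generate/propagate information up the tree."""
        if hi - lo == 1:
            gp[lo, hi] = (g[lo], p[lo])
        else:
            mid = (lo + hi) // 2
            g_l, p_l = up(lo, mid)
            g_h, p_h = up(mid, hi)
            gp[lo, hi] = (g_h | (p_h & g_l), p_h & p_l)
        return gp[lo, hi]

    def down(lo: int, hi: int, c_in: int):
        """Pass carry-in information down the tree."""
        if hi - lo == 1:
            carry[lo] = c_in
            return
        mid = (lo + hi) // 2
        g_l, p_l = gp[lo, mid]
        down(lo, mid, c_in)                  # low half gets the carry-in as is
        down(mid, hi, g_l | (p_l & c_in))    # high half uses the low half's G and P

    g_all, p_all = up(0, n)
    down(0, n, c0)
    total = sum((p[i] ^ carry[i]) << i for i in range(n))  # sum = P XOR carry-in
    return total, g_all | (p_all & c0)
```

Note that the per-bit carry-out never appears in this model: every carry-in comes from the tree, and the sum needs only P and the carry-in.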

66
00:04:53,680 --> 00:04:59,020
This is a considerable improvement over the
order N latency of the ripple-carry adder.

67
00:04:59,020 --> 00:05:04,639
A final design note: we no longer need the
carry-out circuitry in the full adder module,

68
00:05:04,639 --> 00:05:07,169
so it can be removed.

69
00:05:07,169 --> 00:05:12,740
Variations on this generate-propagate strategy
form the basis for the fastest-known adder

70
00:05:12,740 --> 00:05:13,870
circuits.

71
00:05:13,870 --> 00:05:18,000
If you'd like to learn more, look up "Kogge-Stone
adders" on Wikipedia.