## **MASSACHUSETTS INSTITUTE OF TECHNOLOGY** Department of Electrical Engineering and Computer Science

## 6.374: Analysis and Design of Digital Integrated Circuits Problem Set # 5 Solutions

Fall 2003

Issued: 11/13/03

For these problems, you can use the process parameters for the 0.25 µm technology; see the Process Parameters file in the assignments section.

## **Problem 1: Transmission Gate Register Design**

Having mastered the art of register and latch design, you are faced with the following problem. Your manager asks you to design a "Reduced Clock Load Transmission Gate Register". You look up your 6.374 Bible, and Bingo! You have it right there in Handout #7, Slide 24. Good thing you took the class; at least you have the schematic to begin with :)



a) What type of register is this? Briefly explain how it works.

#### Solution

This is a positive edge triggered register. When CLK=0, the first stage (master) is transparent and the second stage (slave) is in the hold mode. When CLK=1 the situation is reversed. Therefore, data is sampled on the positive edge of the clock.

b) Your problem is to *size* the inverters and transmission gates. Assume a supply voltage of 2.5V and fully static inverters. Simulate the circuit in HSPICE with the input waveforms shown in the figure. Assume negligible rise and fall times for the CLK and  $\overline{CLK}$  signals and no skew between them. To begin with, assume all NMOS devices to be minimum sized and all PMOS devices to be 3 times the size of the NMOS devices. Assume Q is 0V initially. Does your circuit work? (There goes my raise!) Turn in plots showing the input waveforms along with the D and Q signals.

## Solution

The circuit does NOT work with identically sized subckt elements (i.e., where all PMOS devices are 3x the minimum sized NMOS devices within each inverter/transmission gate). It fails because when X1 tries to write a value to node 1 that differs from the value currently stored there, X1 fights with X3. For X1 to win, the PDN and PUN of X3 must be weak enough that node 1 can be pulled up/down past the switching threshold of X2. The same consideration holds when X2 tries to write a value to node 3 that differs from its stored value: once again, X5 must be made "weaker" than X2.



c) Resize the transistors so that the circuit is functional. Point out the changes you have made and explain clearly. Turn in the same plots as in (b) but simulated with modified sizes.

**Hint**: Do not HSPICE the circuit to death. It would be better if you used .SUBCKT macros and tweaked only those sizes that you think will matter. Think before you simulate!

## Solution

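The original and modified sizes are as follows. Instead of considering all transistors, we consider the relative sizes of the transmission gates and inverters. Each of these elements has its PMOS device 3 times the size of the NMOS device. (Minimum sized element => PMOS:  $9\lambda/2\lambda$  and NMOS:  $3\lambda/2\lambda$ .) In the modified design, every element except the feedback inverters X3 and X5 is doubled, so that X1 (through TX1) and X2 (through TX2) can overpower the feedback when writing a new value.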

|                          | X1 | X2 | X3 | X4 | X5 | TX1 | TX2 |
|--------------------------|----|----|----|----|----|-----|-----|
| Original size (relative) | 1  | 1  | 1  | 1  | 1  | 1   | 1   |
| Modified size (relative) | 2  | 2  | 1  | 2  | 1  | 2   | 2   |



## Problem 2: Edge Triggered Register

Consider the following edge-triggered register. Assume that the clock inputs CLK and  $\overline{CLK}$  have a 0V to  $V_{DD}$  swing. Also assume (for parts a-c) that there is no skew between CLK and  $\overline{CLK}$  (i.e., the inverter delay to derive  $\overline{CLK}$  from CLK is zero). Assume that the rise/fall times on all signals are zero.

a) What type of register is this? (Positive Edge-Triggered Register or Negative Edge-Triggered Register). Explain.

## Solution

Negative Edge-Triggered. Master is transparent and slave is holding when CLK=1. Slave is transparent and master is holding when CLK=0.

b) Assume that the propagation delay of each clocked inverter (e.g.,  $M_1$ - $M_4$ ) is  $T_{CK\_INV}$  and the delay of inverters  $I_1$  and  $I_2$  is  $T_{INV}$ . Derive expressions for the set-up time  $(t_{su})$ , the propagation delay  $(t_{c-q})$  and the hold time  $(t_h)$  in terms of the above parameters. Explain your results.

#### Solution

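Set-up time: data must pass through the first clocked inverter and  $I_1$  before the clock edge, so  $t_{su} = T_{CK\_INV} + T_{INV}$ . Propagation delay: Q becomes valid once the data passes through the second clocked inverter, so  $t_{c-q} = T_{CK\_INV}$ . Hold time: when CLK goes from 1 to 0, the first clocked inverter turns off at that same instant (there is no clock skew and the rise/fall times are zero), so later changes on D cannot reach the master and  $t_h = 0$ .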
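The three expressions can be collected in a small helper so they can be evaluated for particular delay values. This is only a sketch: `RegisterTiming` and `edge_register_timing` are hypothetical names, and the delays in the usage line are placeholders, not values extracted from the 0.25 µm process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterTiming:
    t_su: float  # set-up time
    t_cq: float  # clock-to-Q propagation delay
    t_h: float   # hold time


def edge_register_timing(t_ck_inv: float, t_inv: float) -> RegisterTiming:
    """Timing of the clocked-inverter register from part (b); any consistent time unit."""
    return RegisterTiming(
        t_su=t_ck_inv + t_inv,  # D crosses the first clocked inverter and I1 before the edge
        t_cq=t_ck_inv,          # after the edge, data only crosses the second clocked inverter
        t_h=0.0,                # master input turns off at the edge (no skew, zero rise/fall)
    )


# Hypothetical delays in ps, for illustration only.
print(edge_register_timing(t_ck_inv=60.0, t_inv=40.0))
```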

c) What is the function of transistors  $M_5$ - $M_8$  and  $M_{13}$ - $M_{16}$ ? Is this circuit Ratioed?

## Solution

These FETs implement two clocked inverters. Each clocked inverter is on when its respective stage is holding, so they complete a back-to-back inverter pair that makes the circuit static. The circuit is not ratioed because the inverters turn off during the sample operation, so there is never a fight.



d) Consider the following variation of the circuit in the figure below. If there is a clock overlap, is there a potential problem? If so, explain the problem and describe the condition under which it happens.

#### Solution

During a 1-1 overlap, D can race through and change Q. This is a problem because the register is supposed to be holding at that time. During a 0-0 overlap, D can also race through and change Q, but this can be fixed by imposing a hold-time constraint.



## **Problem 3: True Single Phase Flip-Flop.**

Consider the True Single Phase Flip-Flop shown here:



Simulate the circuit in HSPICE. The sizes of the devices are given in terms of lambda. Make sure you initialize node B and that you use the stimuli given below.

```
.ic nB=pvdd
* nB is the node labeled B on the schematic
Vclk clk 0 pulse (0 pvdd 10n 0.5n 0.5n 10n 20n)
Vd d 0 pwl(0n 0v 25n 0v 25.5n pvdd 45n pvdd 45.5n 0)
```
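To see which rising edge of CLK should capture each transition of D, the two stimuli can be evaluated directly. This sketch mimics the usual SPICE PULSE and PWL semantics in plain Python; it assumes `pvdd` = 2.5 V (the supply used in Problem 1), works in nanoseconds, and `pulse`, `pwl`, `vclk`, and `vd` are hypothetical helper names.

```python
import math

PVDD = 2.5  # assumed value of the pvdd parameter (2.5 V supply), in volts


def pulse(t, v1, v2, td, tr, tf, pw, per):
    """SPICE-style PULSE(v1 v2 td tr tf pw per) evaluated at time t (all times in ns)."""
    if t < td:
        return v1
    tp = math.fmod(t - td, per)  # time since the start of the current period
    if tp < tr:                  # rising ramp
        return v1 + (v2 - v1) * tp / tr
    if tp < tr + pw:             # flat top
        return v2
    if tp < tr + pw + tf:        # falling ramp
        return v2 + (v1 - v2) * (tp - tr - pw) / tf
    return v1                    # low for the rest of the period


def pwl(t, points):
    """Piecewise-linear source through [(t_ns, volts), ...]; holds the end values."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def vclk(t):
    return pulse(t, 0.0, PVDD, td=10.0, tr=0.5, tf=0.5, pw=10.0, per=20.0)


D_POINTS = [(0.0, 0.0), (25.0, 0.0), (25.5, PVDD), (45.0, PVDD), (45.5, 0.0)]


def vd(t):
    return pwl(t, D_POINTS)


# Rising CLK edges begin at td + k * per = 10, 30, 50 ns.
for k in range(3):
    edge = 10.0 + 20.0 * k
    print(f"CLK starts rising at {edge:4.1f} ns; D = {vd(edge):.2f} V")
```

With these stimuli, D is low at the 10 ns edge, high at the 30 ns edge, and low again at the 50 ns edge, and each D transition occurs at least 5 ns before the next rising edge.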

Do you see the glitching at the output? Explain what happens. Change the sizes of 2 transistors and fix the glitching. Turn in a table with the new sizes and a SPICE plot showing the new glitch-free flip-flop output. For the corrected flip-flop, measure the setup time using HSPICE and report it in the table as well. As a reminder: AS = AD = W (in µm) × 0.625 µm, PS = PD = W (in µm) + 1.5 µm.
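The geometry rule above is easy to get wrong by hand when resizing devices, so a small helper can generate the parameters for each MOSFET card. This is a sketch: the function names are hypothetical, and `LAMBDA_UM = 0.125` is an assumption (it matches L = 2λ = 0.25 µm) used only to convert the λ-based widths.

```python
LAMBDA_UM = 0.125  # assumed lambda for the 0.25 um process (L = 2 lambda = 0.25 um)


def diffusion_geometry(w_um: float) -> dict:
    """AS/AD in um^2 and PS/PD in um, using the rule given in the problem statement."""
    area = w_um * 0.625   # AS = AD = W x 0.625 um
    perim = w_um + 1.5    # PS = PD = W + 1.5 um
    return {"AS": area, "AD": area, "PS": perim, "PD": perim}


def spice_geometry(w_um: float) -> str:
    """Format the geometry for a MOSFET card: 'p' = 1e-12 (m^2), 'u' = 1e-6 (m)."""
    g = diffusion_geometry(w_um)
    return (f"AS={g['AS']:.4g}p AD={g['AD']:.4g}p "
            f"PS={g['PS']:.4g}u PD={g['PD']:.4g}u")


# Example: a device with W = 3 lambda, such as a resized M9.
print(spice_geometry(3 * LAMBDA_UM))
```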

## Solution

The glitching is caused by a race condition inherent in the given True Single Phase Flip-Flop. To see this race condition, consider what happens when D is low. While CLK is low, A is precharged high through M2. When CLK transitions from low to high, M6 turns on, and A begins discharging through M4 and M6. However, M9 and M10 are also both on while A is discharging, so B initially discharges as well, which in turn causes the output to glitch. Once A has been discharged, B is pulled back high through M7. To correct this problem, A must discharge much more quickly than B: with the current sizing, the pull-down path through M9 and M10 is much stronger than the path through M4 and M6, which allows the glitch.

Our approach, used for the graphs provided in the solutions, is to weaken the M9 and M10 pull-down path by decreasing their widths. Another option is to speed up the M4 and M6 pull-down path by increasing their widths.

| Technique                                  | Old sizes (W/L, in λ) | New sizes (W/L, in λ) |
|--------------------------------------------|-----------------------|-----------------------|
| Slow down pull-down path through M9 & M10  | M9=8/2, M10=8/2       | M9=3/2, M10=3/2       |
| Speed up pull-down path through M4 & M6    | M4=4/2, M6=4/2        | M4=12/2, M6=12/2      |



For the corrected flip-flop, the setup time was measured to be 200 ps. Note that the setup time must be measured for both values of D, and the worst case quoted. The setup time for D=1 can even be negative, because the data can still be sampled correctly when it arrives after the rising edge of CLK.
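One way to automate the setup-time measurement is to bisect on the offset between the D transition and the sampling clock edge. The sketch assumes a hypothetical `captures(offset_ps)` predicate that would, in practice, wrap one HSPICE run and check whether Q settled to the correct value (or whether the clock-to-Q delay stayed within some chosen limit); it also assumes the pass/fail outcome is monotonic in the offset. `toy_captures` is a stand-in only, and its 150 ps threshold is not a simulation result.

```python
from typing import Callable


def bisect_setup_time(captures: Callable[[float], bool],
                      lo_ps: float, hi_ps: float, tol_ps: float = 1.0) -> float:
    """Smallest D-to-clock-edge offset (ps) at which `captures` still succeeds.

    offset > 0: D settles before the sampling edge; offset < 0: D changes after it.
    Needs captures(hi_ps) == True and captures(lo_ps) == False, and assumes the
    pass/fail outcome is monotonic in the offset.
    """
    if not captures(hi_ps) or captures(lo_ps):
        raise ValueError("bracket must satisfy captures(hi_ps) and not captures(lo_ps)")
    while hi_ps - lo_ps > tol_ps:
        mid = 0.5 * (lo_ps + hi_ps)
        if captures(mid):
            hi_ps = mid  # still captured: the setup time is at most mid
        else:
            lo_ps = mid  # missed: the setup time is larger than mid
    return hi_ps


def toy_captures(offset_ps: float) -> bool:
    """Stand-in for one HSPICE run; pretends capture works for offsets >= 150 ps."""
    return offset_ps >= 150.0


# In practice, run this once with D rising and once with D falling, and quote the larger.
print(bisect_setup_time(toy_captures, lo_ps=-500.0, hi_ps=1000.0))
```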



## **Problem 4: Sequential Circuit**

Consider the following sequential circuit. Assume that there is no delay between D and  $\overline{D}$  (i.e., the inverter delay to obtain  $\overline{D}$  from D is 0). Assume that the output is statically held using circuits not shown here (i.e., ignore leakage effects for this problem). Assume that the rise/fall times on all signals are zero.



a) Complete the following timing diagram for X and Q.

Assume that the inverter delay is much smaller than the clock period and that appropriate set-up/hold times are met. Assume that each gate  $(I_1, I_2, I_3, \text{NOR}, M_1 - M_4 \text{ and } M_7 - M_{10})$  takes 1 time unit for a low to high or high to low transition. Also assume that it takes 1 time unit to charge node X through  $M_5$  or  $M_6$ . Both the propagation and contamination (i.e., minimum) delay are equal to 1.



b) What is the set-up time for this circuit if glitches on the output Q are acceptable? Explain.

#### Solution

The circuit is a pulse register. For the circuit to function properly, the value of D to be sampled must be able to propagate to X, and then to Q, during the window in which CLK and CLKDB are both high.

To sample D correctly, X must charge/discharge 1 time unit before CLKDB = 0, and D must be ready 2 time units before CLKDB = 0. Therefore,  $t_{su} = -1$ .
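The arithmetic behind  $t_{su} = -1$  can be written out explicitly. This sketch assumes (it is not stated in the text above) that CLKDB is CLK inverted and delayed by the three inverters  $I_1$  to  $I_3$ , so that, with CLK rising at  $t = 0$ , CLKDB falls 3 time units later:

$$
\begin{aligned}
t_{CLKDB\downarrow} &= 3 \times 1 = 3 \\
t_{D,latest} &= t_{CLKDB\downarrow} - 2 = 1 \\
t_{su} &= 0 - t_{D,latest} = -1
\end{aligned}
$$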

## Problem 5: DEC StrongARM Low Power Edge-Triggered Flip-Flop



The flip-flop shown in the above figure is used in the StrongARM microprocessor developed by Digital Equipment Corporation for the Portable Electronic Device (PED) market. Note that it is fully differential. The following questions will help you understand the operation of this flip-flop. No calculations are necessary for this problem.

a) When clock is low is the flop holding or is it transparent? Why? (2 sentences)

## Solution

The flip-flop is in the hold state while the clock is low as the two PMOS pull-ups are turned on and this pulls the inputs to the cross-coupled NAND gates high, which in turn causes them to hold their previous state.

b) What is the purpose of the shorting transistor connecting nodes L3, L4? (2 sentences)

## Solution

The shorting transistor provides a DC leakage path from either node L3 or L4 to ground. This is necessary when the inputs change value after the positive edge of CLK has occurred, leaving either L3 or L4 in a high-impedance state with a logic-low voltage stored on the node. Without the leakage path, this node would be susceptible to being charged by leakage currents through the corresponding PMOS device onto either L1 or L2; as a result, the latch could actually change state before the next rising edge of CLK! This is best demonstrated by the example shown in the following figure.



c) What is the main advantage of this flip-flop from a low power perspective? (1 sentence)

## Solution

The main advantage is that the clock is only connected to 3 MOS devices (2 PMOS + 1 NMOS).

d) What determines the setup time for this flip-flop? Draw a timing diagram showing the timing relationship between the data and the clock. (1 sentence)

#### Solution

This register has no setup time: the master only samples at the clock edge. The hold time is determined by the amount of time it takes the input sense-amplifier structure to discharge either L3 or L4 (L3 if IN = 1, L4 if IN = 0).

e) From a system perspective, where should this flip-flop be used (i.e., in datapaths for pipelining, as receivers at the end of long buses, as state bits for FSMs, etc.)? Why? (1 sentence)

## Solution

The sensitivity of the input sense amplifier and its very short setup time make this flip-flop best suited for use as a receiver at the end of a long bus.

## **Problem 6: Submicron Interconnect Effects.**

Consider the following interconnect circuit.  $C_L$  is the lumped capacitance of each line to ground and  $C_I$  is the inter-wire capacitance. The driver (inverter) is modeled using resistors and an ideal switch, which is connected either to the top resistor or to the bottom resistor.  $R_I$  is the effective resistance of the interconnect. For this problem, let  $\lambda = \frac{C_I}{C_L}$ .



a) Assume that the initial voltage on line *i* (where i = 0 or 1) is  $V_i^{OLD}$  and the final value after all the transients have settled is  $V_i^{NEW}$ , where  $V_i^{OLD}, V_i^{NEW} \in \{0, V_{DD}\}$ . Derive an expression for the energy drawn from  $V_{DD}$  through *driver* 0 for an arbitrary transition of the two-bit bus.

#### Solution

$$
\begin{aligned}
E_{0}^{drawn} &= \int \left( V_{0}^{New} \cdot i \right) dt \\
&= V_{0}^{New} \left( \int C_{L} \frac{dV_{0}}{dt}\, dt + \int C_{I} \frac{d}{dt} (V_{0} - V_{1})\, dt \right) \\
&= V_{0}^{New} \left( C_{L} \left( V_{0}^{New} - V_{0}^{Old} \right) + C_{I} \left( V_{0}^{New} - V_{0}^{Old} \right) - C_{I} \left( V_{1}^{New} - V_{1}^{Old} \right) \right)
\end{aligned}
$$

Note 1: If  $V_0^{New} = 0$ , no energy is drawn from  $V_{DD}$  through driver 0.

Note 2:  $E_0^{drawn}$  can be negative if current flows back into the power supply. The total energy drawn must still be non-negative, so if  $E_0^{drawn} < 0$  then  $E_1^{drawn} > 0$ . Part (c) shows an example.

b) Assume that  $\lambda = 0$  for this part. The total energy (i.e., including both drivers) drawn from the power supply for a sequence can be written as  $\eta \cdot C_L \cdot V_{DD}^2$ . Estimate the value of  $\eta$  for the following two sequences.

Sequence A:  $00 \rightarrow 01 \rightarrow 11 \rightarrow 10$ 

#### Solution

00 --> 01:  $E_1 = 0$ ,  $E_0 = C_L V_{DD}^2$  
01 --> 11:  $E_1 = C_L V_{DD}^2$ ,  $E_0 = 0$  
11 --> 10:  $E_1 = 0$ ,  $E_0 = 0$  
 $E_{total} = 2C_L V_{DD}^2$ , so  $\eta = 2$ 

Sequence B:  $00 \rightarrow 11 \rightarrow 00 \rightarrow 11$ 

## Solution

00 --> 11:  $E_1 = E_0 = C_L V_{DD}^2$  
11 --> 00:  $E_1 = E_0 = 0$  
00 --> 11:  $E_1 = E_0 = C_L V_{DD}^2$  
 $E_{total} = 4C_L V_{DD}^2$ , so  $\eta = 4$ 

c) Assume that  $\lambda = 3$  for this part. The total energy (i.e., including both drivers) drawn from the power supply for a sequence can be written as  $\eta \cdot C_L \cdot V_{DD}^2$ . Estimate the value of  $\eta$  for the following two sequences.

Sequence A:  $00 \rightarrow 01 \rightarrow 11 \rightarrow 10$ 

## Solution

00 --> 01:  $E_1 = 0$ ,  $E_0 = C_L V_{DD}^2 + C_I V_{DD}^2$  
01 --> 11:  $E_1 = C_L V_{DD}^2 + C_I V_{DD}^2$ ,  $E_0 = -C_I V_{DD}^2$  
11 --> 10:  $E_1 = C_I V_{DD}^2$ ,  $E_0 = 0$  
 $E_{total} = 2(C_L + C_I) V_{DD}^2 = 8C_L V_{DD}^2$  (since  $C_I = 3C_L$ ), so  $\eta = 8$ 

Sequence B:  $00 \rightarrow 11 \rightarrow 00 \rightarrow 11$ 

#### Solution

Sequence B is exactly the same as in (b): both lines always switch together, so the voltage across  $C_I$  never changes and  $C_I$  is never charged or discharged. Hence  $\eta = 4$ .

Explain any differences from part (b).

Sequence A in (c) draws more energy than in (b) because  $C_I$  is charged or discharged on every transition.
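The per-driver expression from part (a) can be checked numerically against the values of  $\eta$  in parts (b) and (c). This is a sketch with hypothetical names (`driver_energy`, `eta`); it works in units where  $C_L = V_{DD} = 1$ , and it assumes the bit order used in the solutions above, where the left character of a word such as "01" is line 1 and the right character is line 0.

```python
def driver_energy(v_old, v_new, c_l, c_i):
    """Energy drawn from VDD through drivers 0 and 1 for one bus transition.

    v_old, v_new: (V0, V1) line voltages before and after the transition.
    Returns (E0, E1). A negative value means charge flows back into the supply.
    """
    def drawn(i, j):
        dv_i = v_new[i] - v_old[i]
        dv_j = v_new[j] - v_old[j]
        # Part (a): E_i = V_i^New * (C_L*dV_i + C_I*dV_i - C_I*dV_j)
        return v_new[i] * (c_l * dv_i + c_i * dv_i - c_i * dv_j)

    return drawn(0, 1), drawn(1, 0)


def eta(sequence, lam, vdd=1.0, c_l=1.0):
    """Total energy of a bus word sequence, in units of C_L * VDD^2."""
    c_i = lam * c_l  # lambda = C_I / C_L

    def volts(word):
        # Assumed bit order: the left character is line 1, the right is line 0.
        return (vdd * int(word[1]), vdd * int(word[0]))

    total = 0.0
    for old, new in zip(sequence, sequence[1:]):
        total += sum(driver_energy(volts(old), volts(new), c_l, c_i))
    return total / (c_l * vdd ** 2)


SEQ_A = ["00", "01", "11", "10"]
SEQ_B = ["00", "11", "00", "11"]
for lam in (0, 3):
    print(f"lambda = {lam}: eta_A = {eta(SEQ_A, lam):g}, eta_B = {eta(SEQ_B, lam):g}")
```

The same `driver_energy` function also covers part (d): for the transition 01 to 10 it gives  $E_0 = 0$  and  $E_1 = (C_L + 2C_I)V_{DD}^2$ .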

d) For the transition of the bus from 01 to 10, compute the total energy dissipated in the resistors.

## Solution

Using the result of (a):

$$E_{1} = V_{DD}\left(C_{L}V_{DD} + C_{I}V_{DD} - C_{I}(0 - V_{DD})\right) = (C_{L} + 2C_{I})V_{DD}^{2}$$

$$E_{0} = 0$$

The energy stored in the circuit is the same before and after the transition.

So,  $E_{dissipated} = E_{drawn} - \Delta E_{stored} = (C_L + 2C_I)V_{DD}^2$ .
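The statement that the stored energy does not change can be checked term by term: in both 01 and 10, exactly one line's  $C_L$  is charged to  $V_{DD}$ , and  $C_I$  has  $V_{DD}$  across it (only its polarity flips). All of the energy drawn is therefore dissipated in the resistors:

$$
\begin{aligned}
E_{stored}^{01} &= \tfrac{1}{2}C_L V_{DD}^2 + \tfrac{1}{2}C_I V_{DD}^2 = E_{stored}^{10} \\
E_{dissipated} &= E_0 + E_1 = (C_L + 2C_I)V_{DD}^2
\end{aligned}
$$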

## Problem 7: Data-Dependent Logic Swing Internal Bus Architecture (DDL Bus)<sup>1</sup>

Consider the DDL bus architecture for an N bit bus. (Refer to the figures in the lecture notes.)

a) Assuming M bits switched to 0, by what factor do we save power compared to the conventional full swing bus? (remember that there are always 2 bits switching to 0 to provide the "0" and "1" references)

#### Solution

The savings can be computed using the following analysis:

$$E_{conventional} = M \cdot C \cdot (V_{DD})^{2}$$

$$E_{DDL} = (M+2) \cdot C \cdot V_{DD} \cdot V_{swing} = C \cdot \frac{M+2}{M+3} \cdot (V_{DD})^{2} \qquad V_{swing} = \frac{1}{M+3} \cdot V_{DD}$$

$$\therefore \frac{E_{conventional}}{E_{DDL}} = \frac{M+3}{M+2} \cdot M$$
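A quick numerical look at the closed-form ratio above (C and  $V_{DD}$  cancel, so only M matters). This is a sketch; `ddl_saving_factor` is a hypothetical name.

```python
def ddl_saving_factor(m: int) -> float:
    """E_conventional / E_DDL = M (M + 3) / (M + 2) for M data bits switching to 0.

    C and VDD cancel, so the ratio depends only on M.
    """
    return m * (m + 3) / (m + 2)


for m in (1, 2, 4, 8, 16):
    print(f"M = {m:2d}: saving factor = {ddl_saving_factor(m):.2f}")
```

Since  $M(M+3)/(M+2) = M + M/(M+2)$ , the saving factor approaches  $M + 1$  as M grows.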


b) What is the range of "0" ref? Why are there two "0" refs?

#### Solution

The range of "0" ref is defined by the two boundary cases when no bits switch to 0, and when all bits (i.e., N) switch to 0. The resulting range is thus:

$$\frac{2}{3}V_{DD} \le V_{0ref} \le \frac{N+2}{N+3} \cdot V_{DD}$$
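The endpoints follow from part (a), assuming (as the range above implies) that each line switching to 0 falls from its precharge level  $V_{DD}$  by  $V_{swing} = V_{DD}/(M+3)$ :

$$V_{0ref} = V_{DD} - \frac{V_{DD}}{M+3} = \frac{M+2}{M+3}\,V_{DD}, \qquad M = 0:\ \frac{2}{3}V_{DD}, \qquad M = N:\ \frac{N+2}{N+3}V_{DD}$$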

The reason for having two "0" refs is to generate the "1" ref. The worst-case "1" that can occur on the DDL bus is when the two adjacent bus wires switch to 0. This results in a voltage drop on the precharged "1" because of the coupling capacitance between the wires. The circuit reproduces this worst case by always switching two bits to 0 ("0" ref and "0" ref+) and placing a dummy wire ("1") between them (consult the figure in the lecture notes for further reference).

c) What is the reason for having M2 and M3? Wouldn't it be enough just to have M1 and M4 charge up the nodes A and B to turn the inverter off during precharge?

#### Solution

If we only have M1 and M4, the cross-coupled inverters will be turned off by M1 and M4, but the internal nodes out and  $\overline{out}$  won't be at the same voltage level. During the evaluation phase, this voltage difference immediately forces the cross-coupled inverters into a steady state that is independent of the current pulled down from nodes A and B, and hence independent of the input data. Therefore, the internal nodes out and  $\overline{out}$  must be precharged, and M2 and M3 are needed.

<sup>1.</sup>M. Hiraki, H. Kojima, H.Misawa, T. Akazawa, Y. Hatano, "Data-Dependent Logic Swing Internal Bus Architecture for Ultralow-Power LSI's", IEEE Journal of Solid State Circuits, vol.30, no.4, April 1995 pp. 397-402

The following figure shows the receiver for the DDL bus architecture. (Dual Reference Sense Amplifying Receiver)



Dual reference sense amplifying receiver.