

# EECS 151/251A Final Review Session

# Topics in Scope

- Multipliers (array multipliers, Wallace tree, Booth recoding)
- Flip-flop and latch circuits
- Timing (setup/hold margins, skew, jitter)
- SRAM (read-stability, write-ability, read/write times, cell sizing)
- Caches (direct mapped, N-way set associative, fully associative)
- DRAM, FIFOs
- H-trees, clock distribution
- Parallelism and pipelining (performance, power, area tradeoffs)
  
- And everything before MT2

# Timing

# Fa14 Final #3

3. [12pts (16pts EE241A)] Timing.

In this problem we will be examining the pipeline shown below. The minimum and maximum delays through the logic are annotated on the figure, and the flip-flops have the following properties:  $t_{clk-q} = 40\text{ps}$ ,  $t_{setup} = 60\text{ps}$ , and  $t_{hold} = 50\text{ps}$ .



- (a) (3pts) Determine the minimum cycle time assuming all clocks are ideal ( $\underline{clk1 = clk2 = clk3 = clk}$ ).

$$t_{clk-q} + \underbrace{\max(t_{p,max_{1,2,3}})}_{t_{p,max_3}} + t_{setup} \leq T_{min}$$

$$T_{min} = \underline{1100\text{ps}}$$

- (b) (4pts) What is the worst-case minimum cycle time under clock uncertainty (give an expression and label the uncertainty on each clock that produces this worst case)? Assume each of the clocks has an independent, random uncertainty  $\pm \Delta$ .

$$\Delta_2 + t_{clk-q} + t_{p,max_3} + t_{setup} \leq \Delta_3 + T_{min}$$

$$\Delta_2 - \Delta_3 + t_{clk-q} + t_{p,max_3} + t_{setup} \leq T_{min}$$

The worst case is  $\Delta_2 = +\Delta$  and  $\Delta_3 = -\Delta$ :

$$2\Delta + t_{clk-q} + t_{p,max_3} + t_{setup} \leq T_{min,wc}$$

Worst-case minimum cycle time:

$$T_{min,wc} = 1100\text{ps} + 2\Delta$$
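The setup results for parts (a) and (b) can be checked numerically. The sketch below is only an illustration: the function name `min_cycle_time` is made up here, and the 1000 ps critical-path delay is not quoted in the text (the figure is not reproduced). It is inferred from the 1100 ps answer together with the given $t_{clk-q}$ and $t_{setup}$.

```python
def min_cycle_time(t_clk_q, t_p_max, t_setup, delta=0.0):
    """Setup-limited cycle time in ps.

    delta is the +/- uncertainty on each clock. The worst case launches
    late (+delta) and captures early (-delta), which costs 2*delta.
    """
    return t_clk_q + t_p_max + t_setup + 2 * delta

# Assumed: 1000 ps worst-case logic delay, inferred from T_min = 1100 ps.
T_PMAX_3 = 1000
ideal = min_cycle_time(40, T_PMAX_3, 60)
with_uncertainty = min_cycle_time(40, T_PMAX_3, 60, delta=10)
print(ideal, with_uncertainty)
```

The `delta=10` value is an arbitrary example; any uncertainty adds twice its value to the ideal cycle time.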

- (c) (5pts) Determine the maximum tolerable  $\Delta$  before any hold-time violations occur.



$$-\Delta + t_{clk-q} + \min(t_{p,min_{1+2,3}}) \geq t_{hold} + \Delta$$

$$t_{clk-q} + \min(t_{p,min_{1+2,3}}) - 2\Delta \geq t_{hold}$$

$$\Delta \leq \frac{t_{clk-q} + \min(t_{p,min_{1+2,3}}) - t_{hold}}{2}$$

$$\Delta_{max} = \underline{5\text{ps}}$$

# Sp17 Final #2

## PROBLEM 2: Timing (16 points)

In this problem we will be examining the pipeline shown below. The minimum and maximum delays through the logic are annotated on the figure, and the flip-flops have the following properties:  $t_{clk-q} = 50\text{ps}$ ,  $t_{setup} = 25\text{ps}$ , and  $t_{hold} = 40\text{ps}$ . You can assume that the clock has no jitter, but  $t_{skew1}$  and  $t_{skew2}$  can be either positive or negative.



- a) (4 pts) What is the minimum clock cycle time if  $t_{skew1} = 100\text{ps}$  and  $t_{skew2} = 50\text{ps}$ ?

Notice that  $CL_2$  has  $>100\text{ps}$  more delay than the other two paths, and skew values here mean that  $CL_2$  gets  $100\text{ps} - 50\text{ps}$  less time to propagate than the clock cycle.

$$\text{So: } T_{cyc,min} - (t_{skew1} - t_{skew2}) = t_{clk-q} + t_{p,max,CL_2} + t_{setup}$$

$$T_{cyc,min} - 50\text{ps} = 50\text{ps} + 800\text{ps} + 25\text{ps}$$

$$\boxed{T_{cyc,min} = 925\text{ps}}$$
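As a quick check of part a), the setup constraint on the $CL_2$ path can be evaluated directly. This is a minimal sketch: the function and argument names are illustrative, and only the $CL_2$ path is evaluated because the solution identifies it as the critical one.

```python
def setup_limited_period(t_clk_q, t_p_max, t_setup, launch_skew, capture_skew):
    """Minimum period (ps) for one register-to-register path.

    The launching clock edge arrives launch_skew late and the capturing
    edge arrives capture_skew late, so the path has
    T + capture_skew - launch_skew available to propagate.
    """
    return t_clk_q + t_p_max + t_setup + launch_skew - capture_skew

# CL2 is launched by the clock with t_skew1 and captured by the clock with t_skew2.
t_cyc_min = setup_limited_period(50, 800, 25, launch_skew=100, capture_skew=50)
print(t_cyc_min)  # 925
```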

- b) (4 pts) Assuming  $t_{\text{skew1}}$  is fixed at 100ps, how positive can  $t_{\text{skew2}}$  be before this pipeline (repeated above for your convenience) fails a hold-time constraint?

$$\text{Check } CL_1: \; t_{clk-q} + t_{p,min,CL_1} \geq t_{skew_1} + t_{hold}$$

$$50\text{ps} + 100\text{ps} \geq 100\text{ps} + 40\text{ps}$$

$50\text{ps} \geq 40\text{ps} \rightarrow$  no hold-time violation (this check does not depend on  $t_{skew_2}$ )

$$\text{Check } CL_2: \; t_{skew_1} + t_{clk-q} + t_{p,min,CL_2} \geq t_{skew_2} + t_{hold}$$

$$100\text{ps} + 50\text{ps} + 50\text{ps} \geq t_{skew_2} + 40\text{ps}$$

$$160\text{ps} \geq t_{skew_2}$$

$$\text{Check } CL_3: \; t_{skew_2} + t_{clk-q} + t_{p,min,CL_3} \geq t_{hold}$$

$t_{skew_2} + 50\text{ps} + 75\text{ps} \geq 40\text{ps} \leftarrow$  no hold-time violation for any positive value of  $t_{skew_2}$
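The three hold checks all have the same form, which makes them easy to script. The sketch below uses the minimum logic delays that appear in the checks (100 ps, 50 ps, 75 ps); the helper names `hold_slack` and `max_capture_skew` are hypothetical and chosen only for this illustration.

```python
T_CLK_Q, T_HOLD = 50, 40                        # ps, from the problem statement
T_P_MIN = {"CL1": 100, "CL2": 50, "CL3": 75}    # ps, minimum logic delays used in the checks

def hold_slack(launch_skew, capture_skew, t_p_min):
    """Hold slack in ps; a negative value means a hold violation."""
    return launch_skew + T_CLK_Q + t_p_min - (capture_skew + T_HOLD)

def max_capture_skew(launch_skew, t_p_min):
    """Largest capture-clock skew (ps) that still leaves zero hold slack."""
    return launch_skew + T_CLK_Q + t_p_min - T_HOLD

t_skew1 = 100
cl1_slack = hold_slack(0, t_skew1, T_P_MIN["CL1"])          # unskewed launch, capture at t_skew1
t_skew2_max = max_capture_skew(t_skew1, T_P_MIN["CL2"])     # bound on t_skew2 from CL2
cl3_slack = hold_slack(t_skew2_max, 0, T_P_MIN["CL3"])      # launch at t_skew2, unskewed capture
print(cl1_slack, t_skew2_max, cl3_slack)
```

A positive slack on $CL_1$ and $CL_3$ confirms that only the $CL_2$ check limits $t_{skew_2}$.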

- c) (8 pts) If you could intentionally set the values of  $t_{skew1}$  and  $t_{skew2}$ , what values would you choose in order to minimize the cycle time of this pipeline (again repeated above)? What would be the cycle time in this case?

$CL_2$  has the largest delay (by 100ps), so we want to give it more time by making  $t_{skew_2} > t_{skew_1}$ . However, when  $t_{skew_1}$  was 100ps, we already saw in part b) that we couldn't make  $t_{skew_2} > 160ps$  - in other words, we can give  $CL_2$  at most 60ps of extra time before we run into a hold time violation. Since both  $CL_1$  and  $CL_3$  will still be less critical than  $CL_2$  in this case, we can just choose  $t_{skew_1} = 0ps$  and  $t_{skew_2} = 60ps$ , so that:

$$T_{cyc, \min} + t_{skew_2} = t_{clk-q} + t_{p, \max, CL_2} + t_{setup}$$

$$T_{cyc, \min} = 50ps + 800ps + 25ps - 60ps$$

$$= 815ps$$
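The reasoning in part c) amounts to lending $CL_2$ as much time as its own hold constraint allows. A minimal sketch of that arithmetic follows; the variable names are illustrative, and it assumes (as the solution states) that $CL_1$ and $CL_3$ remain non-critical.

```python
T_CLK_Q, T_SETUP, T_HOLD = 50, 25, 40   # ps
CL2_MAX, CL2_MIN = 800, 50              # ps

# Extra time lent to CL2 is (t_skew2 - t_skew1); the CL2 hold check caps it.
max_borrow = T_CLK_Q + CL2_MIN - T_HOLD
t_skew1, t_skew2 = 0, max_borrow
t_cyc_min = T_CLK_Q + CL2_MAX + T_SETUP - (t_skew2 - t_skew1)
print(max_borrow, t_cyc_min)  # 60 815
```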

# Fa14 Final #7

7. [14pts (18pts EE241A)] SRAM Design.

For this problem we will be designing an  $N_{WL} \times N_{BL}$  SRAM in 32nm CMOS (i.e., each wordline drives  $N_{BL}$  cells, and each bitline has  $N_{WL}$  cells on it), with each cell shown below. The cell layout is  $0.2\mu m$  tall and  $1\mu m$  wide.  $N_{WL} = 64$ ,  $N_{BL} = 128$ ,  $V_{dd} = 1V$ ,  $C_G = C_D = 1fF/\mu m$ , and for the wordline and bitline wires,  $C_w = 0.2fF/\mu m$ .



- (a) (4pts) Draw the block-level diagram of the SRAM (decoder, array, etc.) to enable byte-size reads and writes into the array. Label the address bits.

- (b) (4pts) Draw the equivalent circuit for a read operation, replacing the cell transistors with equivalent resistor values. (You can assume that during the read upset, the bitline behaves as a supply with voltage  $V_{dd}$  due to its large capacitance).

Assuming that the NMOS pull-down transistor in the SRAM cell is  $W_{N2}=120nm$ , determine the maximum size of the NMOS access transistor  $W_{N1}$  so that the cell read upset is not more than 35% of  $V_{dd}$ .

$$\frac{R_2}{R_1 + R_2} V_{dd} \leq 0.35 V_{dd}$$

$$R_1 = \frac{K}{W_{N1}}, \quad R_2 = \frac{K}{W_{N2}}$$

$$\frac{\frac{K}{W_{N2}}}{\frac{K}{W_{N1}} + \frac{K}{W_{N2}}} \leq 0.35 \Rightarrow \frac{W_{N1}}{W_{N1} + W_{N2}} \leq 0.35$$

$$W_{N1} \leq \frac{0.35}{1 - 0.35} W_{N2}$$

$$W_{N1} \leq 64 \text{ nm}$$
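The sizing bound follows directly from the resistive divider, so it is easy to check numerically. The sketch below assumes, as the derivation does, that on-resistance scales as $1/W$ with the same constant for both transistors; the function name is illustrative.

```python
def max_access_width(w_pulldown_nm, max_upset_fraction):
    """Largest access-transistor width (nm) allowed by the resistive divider.

    With R proportional to 1/W, the internal node rises to
    W_N1 / (W_N1 + W_N2) of Vdd during a read.
    """
    return max_upset_fraction / (1.0 - max_upset_fraction) * w_pulldown_nm

w_n1_max = max_access_width(120, 0.35)
upset = w_n1_max / (w_n1_max + 120)   # divider ratio at the bound
print(round(w_n1_max, 1), round(upset, 2))  # 64.6 0.35
```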

- (c) (3pts) Calculate the read delay from the wordline voltage pulled to  $V_{dd}$  to the bitline voltage dropping to  $V_{dd}/2$ . Assume that the bitline is loaded by an inverter read-out circuit with input capacitance of  $1fF$ . Assume that  $W_{N1} = 50nm$  and  $R_N = 2k\Omega \cdot \mu m$ . (Neglect the internal capacitance of the SRAM cell.)

Draw the equivalent RC circuit of the SRAM read path. Label and calculate all capacitances.



$$C_{BL} = C_w + C_D + C_{inv} = 64 \cdot \left( 0.2\,\mu\text{m} \cdot 0.2 \frac{\text{fF}}{\mu\text{m}} + 1 \frac{\text{fF}}{\mu\text{m}} \cdot 0.05\,\mu\text{m} \right) + 1 \text{ fF}$$

$$C_{BL} = 6.76 \text{ fF}$$

$$R_1 + R_2 = R_N \cdot \left( \frac{1}{W_{N1}} + \frac{1}{W_{N2}} \right) = 56.7 \text{ k}\Omega$$

Write the formula for the read-delay and calculate the numeric value below:

$$t_{d-Read} = \ln 2 \cdot (C_{BL} \cdot (R_1 + R_2)) \approx 264 \text{ ps}$$
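The read-delay numbers can be reproduced with a few lines of arithmetic. This is a sketch with illustrative constant names, using only values from the problem statement. With the exact value of $\ln 2$ the delay comes out near 265.5 ps; the 264 ps figure corresponds to rounding $\ln 2$ to 0.69.

```python
import math

N_WL = 64               # cells per bitline
CELL_HEIGHT_UM = 0.2    # bitline wire length contributed by each cell
C_WIRE = 0.2            # fF/um
C_DRAIN = 1.0           # fF/um
W_N1_UM, W_N2_UM = 0.05, 0.12
R_N = 2.0               # kOhm * um
C_INV = 1.0             # fF

c_bl = N_WL * (CELL_HEIGHT_UM * C_WIRE + C_DRAIN * W_N1_UM) + C_INV   # fF
r_path = R_N * (1 / W_N1_UM + 1 / W_N2_UM)                            # kOhm
t_read_ps = math.log(2) * r_path * c_bl                               # kOhm * fF = ps
print(round(c_bl, 2), round(r_path, 1), round(t_read_ps, 1))
```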

# Fa14 Final #8

## 8. [10pts] Cache Design.

Consider the design of a *direct mapped* cache for a computer system with 16-bit data words and 16-bit byte-addresses. The cache stores 256 bytes of data, in 32 blocks.



In the space below draw the circuit for the datapath part of the cache (you don't need to design the controller). Your cache should implement the interface shown above (you don't need to design the DRAM interface). For your design you can use simple logic gates, multiplexers, comparators, and 8x32 SRAM blocks as shown above:



# **SRAM**

# Fa18 Final #4

Consider the 8T SRAM cell shown below, which adds a new port for accessing the standard cross-coupled inverter pair.



The write wordline (WWL) and write bitlines (BLL and BLR) are used for writing into the cell, while the read wordline (RWL) and read bitline (RBL) are used for reading out of the cell. The cell's layout is 1  $\mu\text{m}$  tall and 4  $\mu\text{m}$  wide. Assume that  $V_{DD} = 1V$ ,  $C_G = C_D = 1 \text{ fF}/\mu\text{m}$ , and that bitlines and wordlines have capacitance  $C_w = 0.2 \text{ fF}/\mu\text{m}$ .

- a) Discuss qualitatively the advantage of this design over the standard 6T SRAM cell, and what problem it solves.  
Compared to the 6T SRAM cell, which transistor (or pair of transistors) can be made smaller? **(2 Pts)**

The read and write operations are decoupled, as the read operation can no longer affect the contents of the cell. The write operation stays the same. Thus, the cross-coupled inverters need to be sized for writability only.

Since the internal pull-down transistors no longer need to be stronger than the access transistors for read stability (in a 6T cell  $W_{PDL} > W_{PGL}$  and  $W_{PDR} > W_{PGR}$ ), PDL and PDR can be made as small as possible.

- b) Draw a block diagram for a 1024-bit SRAM implemented using an array of the 8T SRAM cells, showing the array and the necessary periphery required for **read accesses only**. The read/write word size should be 8 bits, and the array should have a square aspect ratio (meaning that the overall array built from tiled SRAM cells has equal width and height). Mark the bit widths on all of the wires, and label the address bits. **(3 Pts)**

Since the cell is  $1 \times 4$ , we want 4 times as many rows as columns.

$1024 = 2^{10} = 2^6 * 2^4$ , so we want  $2^6$  rows and  $2^4$  columns.

Thus there are  $2^6$  read wordlines decoded from 6 bits of address  $A[6:1]$ . There are  $2^4$  bitlines coming out of the SRAM, which need to be muxed down using 8 2:1 muxes driven by address bit  $A[0]$ .
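The array organization above can be derived mechanically from the cell aspect ratio and the word size. The following sketch (variable names are illustrative) solves for a square array and splits the 7-bit word address into row and column bits.

```python
import math

TOTAL_BITS, WORD_BITS = 1024, 8
CELL_HEIGHT_UM, CELL_WIDTH_UM = 1, 4

# Square array: rows * height == cols * width, with rows * cols == TOTAL_BITS.
rows = math.isqrt(TOTAL_BITS * CELL_WIDTH_UM // CELL_HEIGHT_UM)
cols = TOTAL_BITS // rows
row_addr_bits = rows.bit_length() - 1          # wordline decoder inputs
words_per_row = cols // WORD_BITS              # inputs per output mux
col_addr_bits = words_per_row.bit_length() - 1 # mux select bits
print(rows, cols, row_addr_bits, col_addr_bits)  # 64 16 6 1
```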



- c) Determine the energy taken from the supply to precharge the read bitlines (RBL) after every read cycle. For this part, assume that there are 32 rows of cells per bitline, and 32 columns of cells per wordline. The width of the read access transistor (RPG) is 120 nm. Assume that a skewed inverter which inverts when  $V_{in} = 0.8V_{DD}$  is used to read the bitline voltage, and that the bitline is precharged back to  $V_{DD}$  as soon as the inverter flips. Assume that each cell in the SRAM has a 60% probability of holding a 1 on BLRI (the output of the right inverter) and a 40% probability of holding a 0. **(3 Pts)**

$$C_{BL} = 32 * \left( 1\mu m * \frac{0.2fF}{\mu m} + 0.120\mu m * \frac{1fF}{\mu m} \right) = 10.24 fF$$

$$C_{total} = \alpha_1 * 32 * C_{BL} = \alpha_1 * 327.68 fF$$

$$E_{total} = C_{total} * V_{DD} * (V_{DD} - 0.8V_{DD}) = \alpha_1 * 66 fJ$$

$$\alpha_1 = 0.6, E_{total} = 39.6 fJ$$
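The precharge-energy arithmetic can be reproduced as follows. This is a sketch with illustrative names; it takes the activity factor of 0.6 from the solution rather than deriving it. Without intermediate rounding the result is about 39.3 fJ; the 39.6 fJ figure comes from rounding the intermediate 65.5 fJ up to 66 fJ.

```python
ROWS, COLS = 32, 32
CELL_HEIGHT_UM = 1.0
C_WIRE, C_DRAIN = 0.2, 1.0      # fF/um
W_RPG_UM = 0.120
V_DD = 1.0
SWING = V_DD - 0.8 * V_DD       # bitline is restored from 0.8*V_DD to V_DD
ALPHA = 0.6                     # fraction of bitlines that discharge (from the solution)

c_bl = ROWS * (CELL_HEIGHT_UM * C_WIRE + W_RPG_UM * C_DRAIN)   # fF per bitline
e_all = COLS * c_bl * V_DD * SWING                             # fJ if every bitline discharged
e_total = ALPHA * e_all                                        # fJ per read cycle
print(round(c_bl, 2), round(e_all, 1), round(e_total, 1))
```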

# Boolean Logic + Adders + Elmore Delay

# FA18 Final #1

## [PROBLEM 1] Arithmetic (10 Pts)

A brilliant student fresh from having taken EECS 151/251A comes up with a cool circuit to implement the carry path of an adder that sums two 4-bit inputs, A and B. His idea is sketched in the figure shown.

- a) Derive the logic expressions for the internal signals  $U_i$ ,  $V_i$ , and  $X_i$  as a function of the adder input bits  $A_i$  and  $B_i$  (where  $i = 0 \dots 3$ ) so that the circuit indeed does the right job. (3 Pts)

$$U_i = A_i \oplus B_i$$

$$V_i = \overline{A_i \oplus B_i}$$

$$X_i = \overline{A_i \cdot B_i}$$



- b) However, you (as equally bright students) quickly figure out that there is something seriously wrong with this circuit. Explain what is wrong, and add the necessary circuitry to fix the problem directly on the following diagram. (2 Pts)



Problem: the carry nodes are left undefined (floating) under the delete condition ( $A_i = B_i = 0$ ). This is easily addressed by adding one more transistor per stage, driven by:

$$Y_i = \bar{A}_i \cdot \bar{B}_i$$

- c) Assume that every transistor (PMOS and NMOS) has the same on-resistance  $R$ , and the following parasitic capacitances  $C_G = C_D = C_S = C$ . For the circuit from part (a), derive an expression for the propagation delay between input  $C_0$  and output  $C_4$ . Hint: Draw the equivalent RC model of the circuit first. (2 Pts)



$$t_p = \ln 2 (R/2 * 18C + R/2 * 13C + R/2 * 8C + R/2 * 3C) = 0.69 \cdot 21 \cdot RC$$

Note: there are 5 transistors connected to each of the carry nodes (with the exception of the last stage, which has 3). Also, the transmission gate is a parallel combination of an NMOS and a PMOS, so its resistance is  $R/2$ .
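The sum above is the Elmore delay of a four-stage RC ladder, where each resistor is multiplied by all the capacitance downstream of it. A small sketch, with a hypothetical helper `elmore_chain` and values expressed in units of $R$ and $C$:

```python
def elmore_chain(resistances, node_caps):
    """Elmore delay of an RC ladder: each R sees all capacitance downstream of it."""
    total = 0.0
    for i, r in enumerate(resistances):
        total += r * sum(node_caps[i:])
    return total

# Four transmission gates (R/2 each); carry nodes load 5C, 5C, 5C and 3C.
tau = elmore_chain([0.5] * 4, [5, 5, 5, 3])   # in units of RC
t_p = 0.69 * tau                              # propagation delay in units of RC
print(tau, round(t_p, 2))  # 21.0 14.49
```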

- d) A 4-bit adder is kind of boring. What we really want is a 64-bit adder. Determine how the propagation delay would increase if we effectively extend the proposed circuit to 64 bits. Does the delay increase linearly with the word-length? Explain. (2 Pts)



As a distributed RC-line, the delay is **quadratic** in the number of bits.

$$\begin{aligned} t_p &= 0.38 \cdot \left(64 \cdot \frac{R}{2}\right) \cdot (64 \cdot 6C) \\ &= 0.38 \cdot 64^2 \cdot (3RC) \end{aligned}$$

(Note: this expression ignores the termination effect, which is minor.)
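To see the quadratic scaling concretely, the distributed-RC estimate can be evaluated for different word lengths. The sketch below uses the per-stage values from the expression above ($R/2$ and $6C$); the function name is illustrative, and the model is meant for comparing word lengths rather than as an exact delay for the short 4-bit chain.

```python
def distributed_rc_delay(n_bits, r_stage=0.5, c_stage=6.0):
    """Distributed-RC estimate 0.38 * R_total * C_total, in units of RC."""
    return 0.38 * (n_bits * r_stage) * (n_bits * c_stage)

d64 = distributed_rc_delay(64)
d4 = distributed_rc_delay(4)
ratio = d64 / d4   # 16x the bits gives 16^2 = 256x the delay
print(round(d64, 2), round(ratio))
```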

- e) Propose a simple way to minimize the carry delay of the 64-bit adder (without changing the basic idea). A qualitative answer is sufficient. **Hint:** you realize by now that the carry chain looks like a distributed RC chain. (1 Pt)



Repeaters are the best way to break the quadratic dependence of a distributed RC-line on the number of stages. Introducing a buffer, for instance every 4 stages, would help a lot to reduce the delay. The periodicity of the buffer insertion depends upon the driving resistance and extra capacitance of the buffer.

# Clock Distribution + Elmore Delay

# FA18 Final #3

## [PROBLEM 3] Wires and Clocks (8 pts)

You are asked to design the clock distribution network for a  $3\text{mm} \times 4\text{mm}$  microprocessor chip shown below. The clock signal input pin (ClkIn) is located at the bottom left corner and driven by an ideal source, and the clock driver D is feeding BlockA and BlockB through WireA and WireB, respectively. The driver has an output capacitance  $C_{inv} = 60\text{fF}$  and resistance  $R_{inv} = 100\Omega$ . For the wires, assume that  $C_{wpp}=0.1\text{fF}/\mu\text{m}^2$ ,  $C_{wfringe}=0.05\text{fF}/\mu\text{m}/\text{edge}$ , and  $R_{sqw}=0.1\Omega/\text{sq}$ .

- a) Assuming a  $\Pi$  wire model, draw the equivalent circuit and find the delay from ClkIn to the clock input of BlockA. An analytical expression is sufficient – no need to calculate a numerical answer. (3 Pts)



$$t_{clk-A} = \ln 2 \left( R_{inv} (C_{inv} + C_{WA} + C_{inA} + C_{WB} + C_{inB}) + R_{WA} \left( \frac{C_{WA}}{2} + C_{inA} \right) \right)$$



- b) Find an expression for the clock skew between BlockA and BlockB and calculate its numerical value. (2 Pts)



$$C_{WA} = W_A L_A C_{wpp} + 2L_A C_{wfringe} = 360fF$$

$$C_{WB} = W_B L_B C_{wpp} + 2L_B C_{wfringe} = 480fF$$

$$R_{WA} = R_{sqw} \cdot \frac{L_A}{W_A} = 1.5k\Omega$$

$$R_{WB} = R_{sqw} \cdot \frac{L_B}{W_B} = 2k\Omega$$

$$t_{clk-B} = \ln 2 \left( R_{inv} (C_{inv} + C_{WA} + C_{inA} + C_{WB} + C_{inB}) + R_{WB} \left( \frac{C_{WB}}{2} + C_{inB} \right) \right)$$

$$t_{skew} = |t_{clk-A} - t_{clk-B}| = \ln 2 \left( R_{WA} \frac{C_{WA}}{2} - R_{WB} \frac{C_{WB}}{2} + R_{WA} C_{inA} - R_{WB} C_{inB} \right)$$

$$t_{skew} = 138ps$$
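The wire parasitics used in part b) can be recomputed from the wire geometry. The figure is not reproduced here, so the dimensions in the sketch below are assumptions: a 0.2 µm wire width with lengths of 3 mm and 4 mm is the geometry that reproduces the stated 360 fF, 480 fF, 1.5 kΩ and 2 kΩ. The helper name `wire_rc` is illustrative, and $R_{sqw}$ is treated as a sheet resistance per square, consistent with the $R_{sqw} \cdot L/W$ formula.

```python
C_WPP = 0.1       # fF/um^2, parallel-plate capacitance
C_FRINGE = 0.05   # fF/um per edge
R_SQ = 0.1        # ohm per square

def wire_rc(length_um, width_um):
    """Return (capacitance in fF, resistance in ohm) of a rectangular wire."""
    cap = width_um * length_um * C_WPP + 2 * length_um * C_FRINGE
    res = R_SQ * length_um / width_um
    return cap, res

# Assumed geometry (not given in the text): 0.2 um wide, 3 mm and 4 mm long.
c_wa, r_wa = wire_rc(3000, 0.2)
c_wb, r_wb = wire_rc(4000, 0.2)
print(round(c_wa), round(r_wa), round(c_wb), round(r_wb))  # 360 1500 480 2000
```

The block input capacitances $C_{inA}$ and $C_{inB}$ come from the figure, so the skew itself is not evaluated here.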



- c) Now we place the driver somewhere along the diagonal between BlockA and BlockB in order to minimize the total wire length it drives ( $L_A+L_B$ ). Where on the diagonal would you place the clock driver D so that the clock skew is minimized? Provide your answer as a linear distance from BlockA. Determine the resulting clock skew. **(3 Pts)**



Assuming  $L_A=x$  and  $L_B=5-x$  (both in mm):

$$t_{skew} = |t_{clk-A} - t_{clk-B}| = \ln 2 \left( R_{WA} \frac{C_{WA}}{2} - R_{WB} \frac{C_{WB}}{2} + R_{WA} C_{inA} - R_{WB} C_{inB} \right) = 0$$

$$\ln 2 \left( R_{sqw} \cdot \frac{x}{W_A} \left( \frac{W_A}{2} x C_{wpp} + x C_{wfringe} + C_{inA} \right) - R_{sqw} \cdot \frac{(5-x)}{W_B} \left( \frac{W_B}{2} (5-x) C_{wpp} + (5-x) C_{wfringe} + C_{inB} \right) \right) = 0$$

$$x = 2.43mm \Rightarrow L_A = 2.43mm, L_B = 2.57mm$$

$$t_{skew} = 0$$

