

|                                            |                                                                              |                                             |
|--------------------------------------------|------------------------------------------------------------------------------|---------------------------------------------|
| <b>SOLUTION</b>                            | <b>UC Berkeley EECS151</b><br><b>EECS251A</b><br><b>Fall 2021 Final Exam</b> |                                             |
| Your Name ( <i>first last</i> )            |                                                                              | SID                                         |
| ← <i>Name of person on left (or aisle)</i> | Room _____                                                                   | → <i>Name of person on right (or aisle)</i> |
| <hr/> <i>Lab TA name</i> <hr/>             |                                                                              |                                             |

*Fill in the correct circles & squares completely...like this: ● (select ONE), and ■ (select ALL that apply)*

**Fill in your student ID at the top of every page.**

SID : \_\_\_\_\_

## 1) **FSM (15 points, 15 minutes)**

- a) (5 pts) We would like to design a finite-state machine (FSM) that eliminates single 1s and 0s in a bit pattern. For example if the input pattern is 00100110111, it should be replaced with an output pattern: 00000011111.

An example FSM for this function is shown in Figure 1. It assumes that the pattern always starts with a leading 0. The output is delayed by one cycle from the corresponding input, because it takes a cycle to determine that it is not a single 1 or 0.

In the state transition diagram below, all inputs are labeled, but some outputs are unlabeled. Please complete the diagram.



Figure 1

SID : \_\_\_\_\_

- b) (10 pts) Assume that the states are binary, little-endian encoded  $S_0 = 00$ ,  $S_1 = 01$ ,  $S_2 = 10$ ,  $S_3 = 11$ . Derive the minimal logic that computes the new state  $\{s_1, s_0\}$  from the previous state  $\{ps_1, ps_0\}$  and the single-bit input, in.

|    |  | S1                                |    |    |    |   |
|----|--|-----------------------------------|----|----|----|---|
|    |  | 00                                | 01 | 11 | 10 |   |
| in |  | ps <sub>1</sub> , ps <sub>0</sub> | 0  | 0  | 0  | 1 |
|    |  | 1                                 | 0  | 1  | 0  | 1 |

|    |  | S0                                |    |    |    |   |
|----|--|-----------------------------------|----|----|----|---|
|    |  | 00                                | 01 | 11 | 10 |   |
| in |  | ps <sub>1</sub> , ps <sub>0</sub> | 0  | 0  | 0  | 1 |
|    |  | 1                                 | 1  | 0  | 1  | 0 |

$$s_1(ps_1, ps_0, \text{in}) = \frac{\overline{ps_1} \overline{ps_0} \text{in}}{\overline{ps_1} \overline{ps_0} \text{in} + \overline{ps_1} ps_0 \text{in} + \overline{ps_1} \overline{ps_2} \text{in}}$$

$$s_0(ps_1, ps_0, \text{in}) = \frac{ps_1 \overline{ps_0} \text{in}}{\overline{ps_1} \overline{ps_0} \text{in} + ps_1 ps_0 \text{in} + \overline{ps_1} \overline{ps_2} \text{in}}$$

## 2) Verilog (34 points, 35 minutes)

- a) (4 pts) Are the circuits inferred by the two blocks always equivalent? Mark Yes or No. Incorrect answers carry negative credit.

i)

|                                                         |                                                                            |
|---------------------------------------------------------|----------------------------------------------------------------------------|
| wire [9:0] a, b;<br>wire [10:0] c;<br>assign c = a + b; | wire [9:0] a, b;<br>wire [10:0] c;<br>assign c = \$signed(a) + \$signed(b) |
|---------------------------------------------------------|----------------------------------------------------------------------------|

Yes

No

'signed' and 'unsigned' addition infer the same circuit

ii)

|                                                          |                                                                    |
|----------------------------------------------------------|--------------------------------------------------------------------|
| wire [7:0] a, c;<br>wire [2:0] b;<br>assign c = a >>> b; | wire [7:0] a, c;<br>wire [2:0] b;<br>assign c = \$signed(a) >>> b; |
|----------------------------------------------------------|--------------------------------------------------------------------|

Yes

No

Verilog infers arithmetic right shift *only* if the first operand is signed

iii)

|                             |                             |
|-----------------------------|-----------------------------|
| wire a;<br>assign a = 1'bx; | wire a;<br>assign a = 1'b0; |
|-----------------------------|-----------------------------|

Yes

No

Assigning a wire to 'x' allows the synthesis tool to arbitrarily set the value of the wire

iv)

|                                                                                                                                    |                                                                                                                                                                      |
|------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| wire [1:0] a;<br>reg [3:0] b;<br>always @(*) begin<br>case(a)<br>2'b1x: b = 4;<br>2'b01: b = 3;<br>2'b0x: b = 2;<br>endcase<br>end | wire [1:0] a;<br>reg [3:0] b;<br>always @(*) begin<br>b = 3;<br>if (a == 2'b01)<br>b = 2;<br>else if (a == 2'b00)<br>b = 2;<br>else if (a == 2'b10)<br>b = 4;<br>end |
|------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|

Assume that a case item with an "x" is matched when the case expression is 0 or 1. e.g. 3'b11x matches 3'b110 and 3'b111.

Yes

No

There is a mismatch between the 2 blocks when a = 2'b11 (first block sets b = 4 and second block sets b = 3).

SID : \_\_\_\_\_

b) (2 pts) What circuit does this Verilog synthesize to?

```
wire [1:0] a;  
wire clk;  
reg [1:0] b, c, d;  
  
always @(posedge clk) begin  
    b <= a;  
    c = b + 2'b1;  
    d <= c + 2'b2;  
end
```

Answer: The first circuit. c is assigned using blocking assignment, so the 'new' value of c is used to drive the logic that assigns d instead of the output of the c register.



SID : \_\_\_\_\_

- c) (8 pts) We want to find the minimum unsigned entry in an async read RAM, by iterating through one entry per clock cycle. Fill in the values of the blank spaces to implement the circuit.

```
reg [7:0] ram [0:15]; // Assume the RAM is preloaded with data
reg [(a):0] addr;
reg [(b):0] minimum;

always @(posedge clk) begin
    if (rst)
        addr <= (c);
    else
        addr <= (d);
end

always @(posedge clk) begin
    if (rst)
        minimum <= (e);
    else
        minimum <= (ram[addr] (f) minimum) ? (g) : (h);
end
```

|            |                                                                                      |
|------------|--------------------------------------------------------------------------------------|
| <u>(a)</u> | 3 ( $2^4 = 16$ = the number of entries in the RAM) (other answers won't be accepted) |
| <u>(b)</u> | 7 (each RAM entry is 8 bits wide)                                                    |
| <u>(c)</u> | 0 or 1 or 15 (or anything between 0-15)                                              |
| <u>(d)</u> | addr + 1 or addr - 1                                                                 |
| <u>(e)</u> | ram[0] or 8'hff or ram[any index]                                                    |
| <u>(f)</u> | <                                                                                    |
| <u>(g)</u> | ram[addr]                                                                            |
| <u>(h)</u> | minimum                                                                              |

(f) can also be  $>$ , in which case (g) and (h) will be flipped.

SID : \_\_\_\_\_

- d) (4 pts) Say you have a 1 cycle ALU with 2 N-bit inputs and M distinct operations. You can simulate the ALU using RTL simulation at a speed of C Hz (i.e. you can simulate C clock cycles in 1 second).

How long does it take to exhaustively test the ALU? Give your answer in terms of N, M, and C. Include units in your answer.

Answer:  $(2^N * 2^N * M) / C$  seconds

- e) (2 pts) Complete the waveform for the following circuit:

```
wire a, z;
reg b;
always @(*)
  if (z)
    b = a;
```



b should start low from the very first moment (since z is high and thus a propagates to b).

b should stay low until both z and a go high, at which point b goes high.

b should continue to stay high for the rest of the waveform (since when z drops, b holds the value it was last assigned. when a drops later it has no impact on b).

SID : \_\_\_\_\_

- f) (6 pts) Do the following Verilog snippets describe synthesizable synchronous digital logic? Mark yes or no, and if no, give the reason.

```
reg a, b; // assume a and b are the direct outputs of flip-flops
wire clk;
wire k;
reg s;
assign k = (a == s) ? b : a;
always @(posedge clk) begin
    s <= k;
end
```

Yes       No

If you marked “No”, why? \_\_\_\_\_

```
reg a, b; // assume a and b are the direct outputs of flip-flops
wire clk;
wire k;
reg s;
assign k = (k == s) ? b : a;
always @(posedge clk) begin
    s <= k;
end
```

Yes       No

If you marked “No”, why? Assignment to k forms a combinational loop. Answers that mention self-referential logic or infinite toggling or unstable logic are acceptable.

SID : \_\_\_\_\_

g) (8 pts) A decoder converts an N-bit integer to a  $2^N$ -bit one-hot encoded signal.

i) Fill in the behavioral description of a decoder:

```
wire [7:0] addr;
wire [255:0] dec;
assign dec = (a) << (b);
```

|            |        |            |      |
|------------|--------|------------|------|
| <u>(a)</u> | 256'd1 | <u>(b)</u> | addr |
|------------|--------|------------|------|

ii) Let's design a structural decoder. What is the logic for a 1-bit decoder? Fill in the blanks.

```
wire addr;
wire [1:0] dec;
assign dec[0] = (a);
assign dec[1] = (b);
```

|            |       |            |      |
|------------|-------|------------|------|
| <u>(a)</u> | !addr | <u>(b)</u> | addr |
|------------|-------|------------|------|

iii) How can you build a 2-bit decoder using 1-bit decoders (dec1bit)? Fill in the blanks.

```
wire addr[1:0];
wire [3:0] dec;
wire [1:0] dec1, dec2;
dec1bit d1 (.addr(addr[0]), .dec(dec1));
dec1bit d2 (.addr(addr[1]), .dec(dec2));
assign dec[1:0] = dec1 & (a);
assign dec[3:2] = dec2 & (b);
```

|            |               |            |              |
|------------|---------------|------------|--------------|
| <u>(a)</u> | {2{!addr[1]}} | <u>(b)</u> | {2{addr[1]}} |
|------------|---------------|------------|--------------|

SID : \_\_\_\_\_

h) (6 pts) Does the Verilog snippet exhibit a problem (syntax or elaboration error)? If so, what is the problem?

```
wire a;  
assign a = 1'b1;
```

Problem  No Problem

If there's a problem, what is it? \_\_\_\_\_

```
wire a;  
always @(*) a = 1'b1;
```

Problem  No Problem

If there's a problem, what is it? Can't assign a wire inside an always block

```
reg [1:0] a;  
wire b;  
always @(*)  
  a = b ? 2'b1 : 2'b2;
```

Problem  No Problem

If there's a problem, what is it? The literal 2'b2 is illegal since '2' isn't a binary value

```
wire a, c;  
reg b;  
always @(*)  
  b = a + c;  
always @(*)  
  b = !a;
```

Problem  No Problem

If there's a problem, what is it? 'b' is a multi-driven net which is illegal in synthesized Verilog

SID : \_\_\_\_\_

```
module a(input x, output y);
  always @(*)
    y = !x;
endmodule
```

Problem  No Problem

If there's a problem, what is it? Output nets are by default wires, which can't be assigned in always blocks. The declaration should be 'output reg y'

```
wire a, z, b;
always @(z)
  a = z ? b : !b;
```

Problem  No Problem

If there's a problem, what is it? Cannot assign a (wire) inside an always block. Incomplete sensitivity list is also an acceptable problem to mention.

### **3) CMOS (8 points, 10 minutes)**

The circuit below can be programmed to compute various logic functions of two variables, A and B, by setting the appropriate values for R1, R2, R3, R4. The supply is VDD and all transistors have a threshold voltage  $|V_{Th}|$ .



Figure 3.

a) (2 pts) How many different logic functions can this circuit implement?

$$2^4 = 16$$

Number of functions \_\_\_\_\_ 16 \_\_\_\_\_

b) (4 pts) For which logic values of R1, R2, R3, R4 does this circuit implement  $F = A+B$ .

$$X = \overline{F} = \overline{A + B} = \overline{A} \cdot \overline{B}$$

If  $\overline{A}$  and  $\overline{B}$ , then the bottom center and bottom right NMOS will switch on. So, R4 should be 1.

|    |   |
|----|---|
| R1 | 0 |
| R2 | 0 |
| R3 | 0 |
| R4 | 1 |

c) (2 pts) What is the voltage at node X, if R1, R2, R3, R4, A and B are all at  $V_{DD}$ ?

The top 2 NMOS will be switched on. Specifically the one on the right will drive X with  $V_{DD} - V_{Th}$ .

$$V_X = \underline{\hspace{2cm}} V_{DD} - V_{Th} \underline{\hspace{2cm}}$$

SID : \_\_\_\_\_

#### 4) Fast Circuits (14/18 points, 20/25 minutes)

Consider the following logic path.  $C_{in}$ ,  $a$  and  $b$  are the capacitances of the input pins of their respective gates. Equally sized PMOS and NMOS transistors have equal capacitances and on resistances, and the self-loading capacitance per unit width equals the input capacitance ( $\gamma = C_p/C_g = 1$ ).



- a) (4 pts) What is the path effort from In to Out, and the minimum path delay normalized to the unit inverter?

$$B = \prod b_i = 4 \cdot 1 \cdot 1 = 4$$

$$h^* = \sqrt[3]{H} = \sqrt[3]{216} = 6$$

$$F = \frac{C_L}{C_{in}} = 18$$

$$P = \sum p_i = 1 + 3 + 2 = 6$$

$$G = \prod g_i = 1 \cdot 2 \cdot \frac{3}{2} = 3$$

$$D = \frac{t_{p,gate}}{t_{inv}} = \sum h_i + \sum p_i$$

$$H = BFG = 4 \cdot 18 \cdot 3 = 216$$

$$= 3 \cdot 6 + 6 = 24$$

Path effort:  $H = 216$

Path delay ( $t_{p,gate}/t_{inv}$ ) = 24

- b) (4 pts) Size the gates to minimize the delay from In to Out.

$$h^* = b_i f_i g_i \rightarrow \text{stage 1: } G = 4 \cdot \frac{a}{1} \cdot 1 \rightarrow a = \frac{G}{4} = \frac{3}{2}$$

$$\rightarrow \text{stage 3: } G = 1 \cdot \frac{18}{b} \cdot \frac{3}{2} \rightarrow b = \frac{9}{2}$$

$$\text{check: stage 2: } G = 1 \cdot \frac{b}{a} \cdot 2 \rightarrow G = \frac{9/2}{3/2} \cdot 2 \quad \checkmark$$

Gate sizes:  $a = \frac{3}{2}$

$b = \frac{9}{2}$

SID : \_\_\_\_\_

- c) (4 pts) Suppose you can add as many buffers of any size (buffer = 2 inverters) to this path to minimize the path delay. How many buffers should you add?

$$\text{ideal } * \text{ stages approximation} = \log_4(216) = \frac{\ln(216)}{\ln(4)}$$

↑  
test path delays around this \* of stages ( $N$ )

let  $x = *$  of buffers added

$H = 216$  (unchanged from buffers b/c  $BFG=1$  for inverters)

$P = P_{I-3} + 2x = 6 + 2x$  (original parasitic delay + 1 per inverter)

$$D = N \sqrt[N]{H} + P = N \sqrt[4]{216} + 6 + 2x$$

$$* \text{ buffers} = 0 \rightarrow N = 3 \rightarrow D = 24 \quad (\text{same as before})$$

$$= 1 \rightarrow N = 5 \rightarrow D = 5 \sqrt[4]{216} + 6 + 2 \cdot 1 = 22.65 \leftarrow ** \text{ minimum}$$

$$= 2 \rightarrow N = 7 \rightarrow D = 7 \sqrt[4]{216} + 6 + 2 \cdot 2 = 25.09$$

Number of buffers = 1

- d) (2 pts) To achieve the lowest power design for the minimum delay, the buffers from part c) should be added (mark one):

for low power we want to minimize total gate capacitance

$h_i = b_i f_i g_i = g_i \frac{C_{L,i}}{C_i} \rightarrow$  highest  $C_L$  in design is at output, so select gate with lowest  $g$  (inverter) to drive this output

O after the first inverter

O after the NAND gate

■ after the NOR gate

alternatively: recall that logical effort represents how well a gate can drive an output current, so place inverters before largest load ( $C_L$ )

SID : \_\_\_\_\_

- e) (251A only) (4 pts) You decide to size all the gates the same, with size  $C_{in}=a=b=1$ , and fabricate the circuit, only to realize there is a large parasitic capacitance between the second and the third stage of the circuit. The new diagram is shown below. Calculate the delay of the logic path normalized to the unit inverter.



$$\begin{aligned}
 P &= G \text{ (same as before)} \\
 D &= \sum h_i + P \\
 &= 4 + 12 + 27 + 6 \\
 &= 49
 \end{aligned}$$

Path delay ( $t_{p,gate}/T_{INV}$ ) = 49

SID : \_\_\_\_\_

## 5) Timing (18/24 points, 20/30 minutes)

Consider the following circuit diagram. R0, R1, and R2 are rising-edge triggered flip-flops.

The maximum and minimum combinational delays are listed for each path. You can assume that  $t_{clk-q} = 50\text{ps}$ ,  $t_{su} = 75\text{ps}$ , and  $t_{hold} = 60\text{ps}$ , and the multiplexers have negligible delay. SEL is a single control bit that is stable during any given clock cycle. CLK2 and CLK3 are clock signals derived from CLK1 with some amount of delay, as indicated in Figure 4.



Figure 4.

(a) (6 pts) You may start by assuming that the clock source and clock tree are ideal ( $tsk2$  and  $tsk3$  are both 0). Answer the following questions.

(i) What is the minimum clock period,  $T_{min}$ , that this circuit may operate at correctly?

*Note the architectural false path from the mux select; max path is  $R0 \rightarrow R1$  with  $SEL=1$*

$$\begin{aligned} T_{clk} &\geq t_{clk-q} + t_{L,max} + t_{su} \\ &\geq 50\text{ps} + (200\text{ps} + 300\text{ps}) + 75\text{ps} \\ &\geq 625\text{ps} \end{aligned}$$

$T_{min} = \underline{\hspace{2cm} 625 \hspace{2cm}} \text{ps.}$

(ii) What is the hold slack for this circuit (a negative hold slack value indicates a violation)?

*min path is  $R2 \rightarrow R3$*

$$\begin{aligned} \text{hold slack} &= (t_{clk-q} + t_{L,min}) - t_h \\ &= (50\text{ps} + 20\text{ps}) - 60\text{ps} \\ &= 10\text{ps} \end{aligned}$$

$\text{Slack} = \underline{\hspace{2cm} 10 \hspace{2cm}} \text{ps.}$

SID : \_\_\_\_\_

(b) (6 pts) Now, with a physical clock network, it is found that  $t_{sk2} = 150\text{ps}$  and  $t_{sk3} = -100\text{ps}$ . Answer the following questions.

(i) What is the minimum clock period,  $T_{min}$ , that this circuit may operate at correctly?

min path with skew is  $R1 \rightarrow R2$

$$T_{clk} \geq t_{clk-q} + t_{l,max} + t_{su} - t_{sk}$$

$$\geq 50\text{ps} + 225\text{ps} + 75\text{ps} - (-100\text{ps} - 150\text{ps})$$

$$\geq 600\text{ps}$$

$$T_{min} = \underline{600} \text{ ps.}$$

(ii) What is the hold slack for this circuit (a negative hold slack value indicates a violation)?

min path with skew is  $R2 \rightarrow R1$

$$\text{hold slack} = (t_{clk-q} + t_{l,min}) - (t_h + t_{sk})$$

$$= (50\text{ps} + 90\text{ps}) - (60\text{ps} + (150\text{ps} - 100\text{ps}))$$

$$= 140\text{ps} - (310\text{ps})$$

$$= -170\text{ps}$$

$$\text{Slack} = \underline{-170} \text{ ps.}$$

(c) (6 pts) Now, we will consider the effect of variability in the clock network. The clock source, CLK1, has cycle-to-cycle jitter of 50ps. Furthermore, each of  $t_{sk2}$  and  $t_{sk3}$  have a variability of  $\pm 10\%$  in their delay, (the nominal delay is the same as in part (b)). Answer the following questions.

(i) What is the minimum clock period,  $T_{min}$ , that this circuit may operate at correctly?

• Same skew as part (b) → take max path and add jitter in worse direction

• Note jitter is on CLK1, so it will affect all clocks the same way

• Also, add skew variability in the worse direction on the max path

$$\text{skew variability for } R1 \rightarrow R2: t_{sk2} = 150 + 15\text{ps}$$

$$t_{sk3} = -100 - 10\text{ps}$$

$$T_{min} = T_{min,0} - t_{sk,var} + t_j$$

$$= 600\text{ps} + (10\text{ps} + 15\text{ps}) + 50\text{ps}$$

$$= 675\text{ps}$$

$$T_{min} = \underline{675} \text{ ps.}$$

(ii) What is the hold slack for this circuit (a negative hold slack value indicates a violation)?

• Same hold path as before

• no effect from cycle-to-cycle jitter because all clocks affected the same way

• add skew variation in worse direction

$R2 \rightarrow R1$

$$T_{min} = \underline{-195} \text{ ps.}$$

$$\text{hold slack} = \text{slack}_{1,0} - t_{sk,var}$$

$$= -170\text{ps} - (10\text{ps} + 15\text{ps})$$

$$= -195\text{ps}$$

SID : \_\_\_\_\_

(d) (EECS251A Only) (6 pts) You may now set the values of  $t_{sk2}$  and  $t_{sk3}$ . What would you set  $t_{sk2}$  and  $t_{sk3}$  to allow for the shortest clock period possible? What is that  $T_{min}$ ? You may ignore hold time violations (they can be fixed by adding buffers) and assume that  $t_{sk2}$  and  $t_{sk3}$  have no variability.

Want to equalize path delays as much as possible, so use skews to redistribute  
↳ write constraints and minimize  $T_{clk}$

check

Constraints:  $T_{clk} \geq$

$$0 \rightarrow 1: 625\text{ps} - t_{sk2}$$

$$437.5\text{ps}$$

$$1 \rightarrow 2: 350\text{ps} - t_{sk3} + t_{sk2}$$

$$437.5\text{ps}$$

$$2 \rightarrow 0: 225\text{ps} + t_{sk3}$$

$$325\text{ps}$$

$$2 \rightarrow 1: 525\text{ps} - t_{sk2} + t_{sk3}$$

$$437.5\text{ps} \quad \checkmark$$

$$T_{clk} \geq 437.5\text{ps}$$

$$t_{sk3} = 100\text{ps}$$

$$t_{sk2} = 187.5\text{ps}$$

$$t_{sk2} = \underline{\hspace{2cm} 187.5 \hspace{2cm}} \text{ps.}$$

$$t_{sk3} = \underline{\hspace{2cm} 100 \hspace{2cm}} \text{ps.}$$

$$T_{min} = \underline{\hspace{2cm} 437.5 \hspace{2cm}} \text{ps.}$$

SID : *solution*

## 6) Adders (20 points, 20 minutes)

a) Ripple Carry Adder: Consider the following 4-bit ripple carry adder. The gates are from standard cell library and have the following properties:



Figure 6.a.

Find the critical path of the circuit, and calculate the delay.

$$A_0 \rightarrow Cout : \underbrace{8 + 4 + 6}_{A/B \rightarrow Cout} + \underbrace{(4+6) \times 3}_{Cin \rightarrow Cout} = [48\text{ps}]$$

$$A_0 \rightarrow S_3 : \underbrace{8 + 4 + 6}_{A/B \rightarrow Cout} + \underbrace{(4+6) \times 2}_{Cin \rightarrow Cout} + \underbrace{8}_{Cin \rightarrow S} = 46\text{ps}$$

Critical Path: *A[0]/B[0]* *Cout*. (example:  $A[1] \rightarrow S[2]$ )

Delay: *48* ps.

SID : *solution*

b) Carry-Lookahead Adder: Consider the following 4-bit carry-lookahead adder. The gates have the same delays as in part a). Again, find the critical path and calculate the corresponding path delay.



Figure 6.b.

$$A_1/B_1 \rightarrow S_3 : 8 + \underbrace{4 + 6 + 6}_{FA} + \underbrace{4 + 6}_{PG+C} + \underbrace{8}_{FA} = 42 \text{ ps}$$

(or  $A_0/B_0 \rightarrow S_3$ )

if swapping  $p_0, p_1$ .  
Numbers are same)

$$\text{Any } A/B \text{ to } Cout : 8 + \underbrace{4 + 6}_{FA} + \underbrace{4 + 6 + 6}_{PG \text{ or } PG+C} + \underbrace{6}_{FA} = 34 \text{ ps.}$$

( $P, G$  don't rely on  $Cin$ )

$$A/B \rightarrow P \quad P_1 \rightarrow G \quad P_1 \rightarrow Cout .$$

Critical Path: *A[1]* → *S[3]*

Delay: *42* ps.

SID : \_\_\_\_\_

## 7) SRAM (18 points, 20 minutes)



Figure 7

- a) (3 pts) An SRAM cell in Figure 5 should be sized to be read stable and writable at nominal conditions. Size the SRAM cell by matching each transistor name with a choice of widths (W): A) 180nm, B) 135nm, C) 225nm. All transistors have equal length.

|                 |   |
|-----------------|---|
| W <sub>M1</sub> | C |
| W <sub>M2</sub> | B |
| W <sub>M5</sub> | A |

SID : \_\_\_\_\_

b) (8 pts) Please answer the following as true (T), false (F) or doesn't matter (X)

|                                                                                                                                      |                                                                                                                                |
|--------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| Increasing the supply voltage for the cross-coupled inverters (while keeping WL and BL at the nominal voltage) will hurt writability | T: Increasing the supply for the cross-coupled inverter increase the pull-up strength of PMOS<br>It makes M5/6 harder to write |
| Decreasing the width of M5 and M6 will help the read speed                                                                           | F: Decreasing the width of M5/6 reduce the strength and makes reading slower                                                   |
| Decreasing the width of M5 and M6 will help the read stability                                                                       | T: Decreasing the width makes M5 and M6 weak<br>So it's harder to override the existing value                                  |
| Increasing the WL voltage will help writability                                                                                      | T: Makes M5/6 stronger, so easier to write in                                                                                  |
| Less cells per bitline speeds up writing                                                                                             | X: It's not relevant, speed up precharging doesn't count                                                                       |

SID : \_\_\_\_\_

c) Now you are given a memory block that is 128x16 bits. Suppose you would like to use multiple instances of this block to design a memory that is 256x32. The diagram below is a possible implementation. Fill the address signal names in the blanks and use the little-endian convention for address/data signals.



|   |                 |
|---|-----------------|
| 1 | addr[7]         |
| 2 | addr[6:0]       |
| 3 | data_in[15:0]   |
| 4 | data_out[15:0]  |
| 5 | data_out[31:16] |
| 6 | 1'b1            |
| 7 | 1'b0            |

SID : \_\_\_\_\_

## 8) Pipelining, Energy (13 points, 15 minutes)

We would like to examine some properties of a **4-stage** RISC-V datapath. The datapath has the following properties:

- Total capacitance of all gates:  $10nF$
- Max Frequency @ 1.0V: 400 MHz
- Supply voltage: 1.0V
- Register (flip-flop-based) overhead (sum of setup and clock-to-output delay) is 500ps (at 1.0V)

The CPU runs a workload with the following properties:

- Activity factor  $\alpha$ : 0.1
- Number of instructions: 1000
- CPI (cycles per instruction) = 1

In this question, you may ignore leakage and short-circuit power.

a) (2 pts) What is the dynamic power consumption of this design?

$$P = \alpha C V_{DD}^2 f \\ = 0.1 \cdot 10nF \cdot 1^2 \cdot 400MHz$$

$$P = \underline{400} \text{ mW.}$$

b) (2 pts) How much energy is consumed by this workload?

$$E_{\text{per cycle}} = \alpha C V_{DD}^2 \cdot \\ = 0.1 \cdot 10nF \cdot 1^2$$

$$E_{\text{workload}} = E_{\text{per cycle}} \cdot \text{instructions} \cdot \text{CPI} \\ E = \underline{1} \muJ. \\ = E_{\text{per cycle}} \cdot 1000$$

SID : \_\_\_\_\_

c) (5 pts) If we increase the number of pipeline stages from 4 to 5, while perfectly rebalancing the logic, what would be the clock period of the design at the same 1-V supply?

$$\text{logic} = \left( \frac{1}{400\text{MHz}} - 500\text{ps} \right) \cdot 4 \quad \swarrow \text{4 stages}$$

$= 2\text{ns} \cdot 4$       ↑ logic delay without register  
 $= 8\text{ns}$                   for 1 stage

rebalanced = logic / 5 = 1.6 ns

new critical path = rebalanced + 500ps  
 $= 2.1\text{ ns}$

$$T_{\text{clk}} = \underline{2.1} \text{ ns.}$$

d) (4pts) We would like to reduce the power of this processor by lowering the supply voltage, so it operates at 400MHz. If the drain current is linearly proportional to supply voltage, what is the supply voltage that achieves this frequency?

$$\frac{1}{2.1\text{ ns}} = 476\text{ MHz} \quad \begin{matrix} \leftarrow \text{current} \\ \text{frequency} \end{matrix}$$

$$\frac{400}{476} = \frac{\text{new supply}}{\text{old supply}}$$

$$V_{\text{DD}} = \underline{0.84} \text{ V.} \quad = \quad 0.84$$