

# Discussion 13

Alisha Menon

EECS 151/251 Fall 2021

# Administrivia

- Homework 11 due December 3<sup>rd</sup>
- Final review session TBD
- Check-off date for projects December 7<sup>th</sup>
- Final exam December 13<sup>th</sup>

# Agenda

- Final exam review
- Focus on recent material:
  - Flip-flop
  - SRAM
  - Cache

# Fall 2020 final problem 8

In this problem, you are asked to perform setup and hold timing analyses. Consider the circuit given in the diagram. Each flip-flop has a clock-to-q delay of $t_{clk-q} = 80ps$, a setup time of $t_{su} = 40ps$, and a hold time of $t_h = 60ps$.

Note: you do **not** need to consider any specific instruction in this problem.



- a) Assume there is no skew and jitter between the clocks. What is the minimum clock period this circuit can operate with? Is there any hold time violation? Denote your hold time analysis in terms of hold slacks, where a negative slack would mean a violation.

$$\begin{aligned} T_{clk} &> t_{clk-q} + t_{max} + t_{su} \\ &= 80ps + 680ps + 40ps = 800ps \end{aligned}$$

$$\begin{aligned} \text{hold slack} &= t_{clk-q} + t_{min} - t_h \\ &= 80ps + 10ps - 60ps = 30ps \end{aligned}$$



- b) Now suppose the circuit operates at $T_{clk} = 820ps$ with $t_{skew1} = 20ps$, $t_{skew2} = -10ps$, and $t_{skew3} = 10ps$. Instead of being a fixed value, the cycle-to-cycle $t_{clk-q}$ of each flip-flop is randomly distributed between 70ps and 90ps. Assume there is no clock jitter. Denote your timing analysis in terms of setup and hold slacks, where a negative slack would mean a violation.

$$\begin{aligned}
 \text{Setup slack}_2 &= T_{clk} + t_{skew1,2} - (t_{clk-q,\max} + t_{\text{crit},\max} + t_{su}) \\
 &= 820ps - 30ps - (90ps + 650ps + 40ps) \\
 &= 10ps
 \end{aligned}$$

$$\begin{aligned}
 \text{Hold slack} &= t_{clk-q,\min} + t_{\text{crit},\min} - t_{skew2,3} - t_h \\
 &= 70ps + 10ps - 20ps - 60ps = 0ps
 \end{aligned}$$
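The slack arithmetic in parts a and b can be checked with a short script. This is a minimal sketch, not part of the original solution: the function names `setup_slack` and `hold_slack` are made up for illustration, and the skew convention (capture clock arrival minus launch clock arrival) is an assumption chosen to match the signs used in the equations above.

```python
def setup_slack(t_clk, t_clk_q_max, t_logic_max, t_su, skew=0):
    """Setup slack in ps. skew = capture clock arrival - launch clock arrival (assumed)."""
    return t_clk + skew - (t_clk_q_max + t_logic_max + t_su)


def hold_slack(t_clk_q_min, t_logic_min, t_h, skew=0):
    """Hold slack in ps. A late capture clock (positive skew) reduces the margin."""
    return t_clk_q_min + t_logic_min - skew - t_h


# Part a: no skew, t_clk-q fixed at 80 ps.
min_period_a = 80 + 680 + 40          # t_clk-q + t_max + t_su
hold_a = hold_slack(80, 10, 60)

# Part b: T_clk = 820 ps, t_clk-q anywhere between 70 ps and 90 ps.
skew_1_2 = -10 - 20                   # t_skew2 - t_skew1
skew_2_3 = 10 - (-10)                 # t_skew3 - t_skew2
setup_b = setup_slack(820, 90, 650, 40, skew_1_2)   # worst case uses the slowest clk-q
hold_b = hold_slack(70, 10, 60, skew_2_3)           # worst case uses the fastest clk-q

print(min_period_a, hold_a, setup_b, hold_b)
```

Setup uses the maximum clock-to-q delay and hold uses the minimum, because each check has to pass for the worst-case flip-flop.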



- c) If you are free to set the value of $t_{skew1}$ and $t_{skew2}$, what value will you use so that the circuit can operate at minimum clock period without any violation? What is the optimum hold time slack under this clock period? (i.e., you should achieve the minimum clock period first, then try to maximize the hold time slack without increasing the clock period.) Assume no clock jitter and use $t_{clk-q} = 80ps$ in this part.

*Assume $t_{skew3} \approx 0$.*

*change clk<sub>1</sub> to average paths 1 (680ps) and 2 (650ps),  $sh_{av} = 15ps$*

$$\begin{aligned} T_{clk} &= t_{clk-q} + t_{max,0 \rightarrow 1} + t_{su} - t_{skew_{1,2}} \\ &= 80ps + 680ps + 40ps - 15ps = 785ps \end{aligned}$$

*sh<sub>av</sub> can be set anywhere between 0 and 15ps without a setup violation. However, a 15ps skew adds to the hold time slack: new hold slack = hold slack<sub>old</sub> + 15ps = 20ps + 15ps = 35ps.*
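The 15ps figure comes from splitting the difference between the two path delays. The sketch below assumes that the 680ps path is captured by the flip-flop on clk<sub>1</sub> and the 650ps path is launched by it, which is how the note above reads; the circuit diagram is not reproduced here, so treat that topology and the helper name `balanced_period` as assumptions.

```python
def balanced_period(t_clk_q, t_su, t_long, t_short):
    """Delay the shared middle clock so both paths need the same period.

    t_long feeds the delayed flip-flop (it gains time), t_short leaves it
    (it loses time). Returns (skew, period) in ps.
    """
    skew = (t_long - t_short) / 2
    period = t_clk_q + t_long + t_su - skew
    return skew, period


skew, period = balanced_period(t_clk_q=80, t_su=40, t_long=680, t_short=650)

# Cross-check: the shorter path loses the same skew and must fit in the same period.
period_short_path = 80 + 650 + 40 + skew

print(skew, period, period_short_path)
```

Any skew larger than this would make the 650ps path the new bottleneck, so the period cannot drop below the balanced value.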

# Fall 2020 final problem 9

- a) Given the 6T SRAM shown below, evaluate the following statements as true (T) or false (F):



Figure 11: 6T SRAM

- i) T This SRAM array can only support 1 read and 1 write port.
- ii) F SRAM cells with more than 6 transistors will always support arrays with more than 1 read and/or write ports. *could just be for stability*
- iii) F The bitline that stays high is the one primarily involved in flipping the cell state during a write operation. *NMOS passes a bad 1*

- iv) T In a FinFET implementation of a 6T SRAM, the ratio of  $(W/L)_2 : (W/L)_5 : (W/L)_1$  can be 1:2:3 for good read stability and writability.
- v) T In a 6T SRAM, circuit techniques that improve read stability inevitably hurt writability, and vice versa.
- vi) T SRAM cell leakage degrades read access time. *yes, have to wait longer to see a difference*

# Fall 2020 final problem 9

b) Consider a 256-word SRAM array where each word is 256 bits wide. The row decoding logic is placed to the left of the array, as shown in lecture. The array has the following properties:

- The 6T SRAM cell area is $0.2\mu m \times 0.2\mu m$.
- Access transistors have $C_g = C_d = 20aF$.
- The decoding scheme consists of 4-bit predecoders and final row decoders. The circuit model for each predecoder is shown below (Fig. 12).
- $C_W$ models the wire capacitance between the predecoder and final decoders.
- $C_{WL}$ models the total load on each final decoder.
- The wordline has a capacitance per unit length of $0.1fF/\mu m$.
- In this technology, $R_p = R_n$ for a unit inverter and $\gamma = 1$.

LE: 5/2



Calculate:

- the total number of final decoders each predecoder drives (i.e. the factor M in Fig. 12)
- the total capacitance per wordline
- the stage effort (you may leave this expression in terms of a root)

(i) There are $2^4$ of these predecoders, and each one specifies $256/2^4 = 16$ final decoders, so $M = 16$.

(ii) access gates + wire = $256 \cdot 2 \cdot 20aF + 256 \cdot 0.2\mu m \cdot 0.1fF/\mu m = 15.36fF$

(iii) $F = 100, B = 8 + 16 = 24, G = 15/4$

$H = FBG = 9000$

$SE = \sqrt[4]{9000}$
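The three answers can be reproduced numerically. This is a rough sketch rather than a derivation: the values of $F$, $B$, and $G$ are copied from the solution above because they depend on Fig. 12, which is not reproduced here, and the four-stage path is inferred from the fourth root in the solution. All variable names are illustrative.

```python
N_COLS = 256              # bits per word, so 256 cells hang on each wordline
C_ACCESS_AF = 20          # gate capacitance of one access transistor, in aF
CELL_WIDTH_UM = 0.2       # cell width along the wordline, in um
C_WIRE_FF_PER_UM = 0.1    # wordline wire capacitance, in fF/um

# (i) Each 4-bit predecoder output selects 256 / 2^4 final decoders.
m = 256 // 2**4

# (ii) Two access-transistor gates per cell, plus the wire across all cells.
c_gate_ff = N_COLS * 2 * C_ACCESS_AF / 1000       # aF -> fF
c_wire_ff = N_COLS * CELL_WIDTH_UM * C_WIRE_FF_PER_UM
c_wl_ff = c_gate_ff + c_wire_ff

# (iii) Path effort and stage effort, using F, B, G from the solution.
F, B, G = 100, 8 + 16, 15 / 4
H = F * B * G
N_STAGES = 4              # assumed from the fourth root in the solution
stage_effort = H ** (1 / N_STAGES)

print(m, round(c_wl_ff, 2), H, round(stage_effort, 2))
```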

# Fall 2020 final problem 10

- a) A direct-mapped cache is 8KB in size, with 64B blocks. Memory addresses are 32 bits. In a memory access, how many address bits are used for:

i) The byte-select offset? 6

$64 \text{ bytes} = 2^6 \text{ bytes}$, so 6 offset bits

ii) The cache block index? 7

Cache size = $8\text{ KB} = 2^{13} \text{ bytes}$, $2^{13}\text{ B} / 2^6\text{ B/block} = 2^7 \text{ blocks}$, so 7 index bits

iii) The cache tag? 19

$32 - 6 - 7 = 19 \text{ tag bits}$
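The same field widths fall out of a few lines of Python. This is an illustrative sketch that assumes power-of-two sizes; the helper names `cache_address_bits` and `split_address`, and the sample address `0x12345678`, are made up for the example.

```python
def cache_address_bits(cache_bytes, block_bytes, addr_bits=32):
    """Return (offset_bits, index_bits, tag_bits) for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1        # log2(block size)
    num_blocks = cache_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1          # log2(number of blocks)
    tag_bits = addr_bits - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


offset_bits, index_bits, tag_bits = cache_address_bits(8 * 1024, 64)
tag, index, offset = split_address(0x12345678, offset_bits, index_bits)

print(offset_bits, index_bits, tag_bits)
print(hex(tag), index, offset)
```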

# Fall 2020 final problem 10

For parts b–d, consider the following program, written in pseudocode, that loops twice over an array of 1-byte numbers (for clarity, RISC-V assembly is also provided at the end of the problem). Assume N is very large and divisible by 32, and that `arr` starts at a memory address divisible by 32.

```
byte arr[N];  
  
for (int j = 0; j < 2; j++) {  
    for (int i = 0; i < N; i++) {  
        process(arr[i]);  
    }  
}
```

- b) Suppose we have an LRU (evict least recently used), 32-byte block, fully associative cache of size N bytes.

i) In terms of N, how many memory accesses are cache hits? \_\_\_\_\_

$$\frac{31}{32}N + N = \frac{63}{32}N$$

ii) Misses? \_\_\_\_\_

$$\frac{1}{32}N$$



# Fall 2020 final problem 10

- c) Suppose we have an LRU (evict least recently used), 32-byte block, fully associative cache of size  $N / 2$  bytes.

i) In terms of  $N$ , how many memory accesses are cache hits?

$$\frac{31}{32} N \cdot 2 = \frac{31}{16} N$$

ii) Misses?

$$\frac{1}{32} N \cdot 2 = \frac{1}{16} N$$

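- First $N/2$ bytes, iteration 1: $\frac{1}{32} \cdot \frac{N}{2}$ misses
- Second $N/2$ bytes, iteration 1: $\frac{1}{32} \cdot \frac{N}{2}$ misses

Iteration 1 is the same as before: $\frac{31}{32} N$ hits.

- First $N/2$ bytes, iteration 2: $\frac{1}{32} \cdot \frac{N}{2}$ misses, since the first $N/2$ was evicted
- Second $N/2$ bytes, iteration 2: $\frac{1}{32} \cdot \frac{N}{2}$ misses, since the second $N/2$ was evicted

Iteration 2 also gives $\frac{31}{32} N$ hits.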
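The hit and miss counts for parts b and c can be confirmed by simulating the access pattern. The sketch below is one possible model, not the course's reference code: it picks a concrete `N = 1024` for illustration (the problem only says N is large and divisible by 32) and models the fully associative LRU cache with an `OrderedDict` keyed by block number.

```python
from collections import OrderedDict


def simulate_lru(n, cache_bytes, block_bytes=32, passes=2):
    """Count (hits, misses) for sequential byte reads of arr[0..n-1], repeated."""
    capacity = cache_bytes // block_bytes
    cache = OrderedDict()            # keys ordered from least to most recently used
    hits = misses = 0
    for _ in range(passes):
        for addr in range(n):
            block = addr // block_bytes
            if block in cache:
                hits += 1
                cache.move_to_end(block)         # mark as most recently used
            else:
                misses += 1
                if len(cache) >= capacity:
                    cache.popitem(last=False)    # evict the least recently used block
                cache[block] = None
    return hits, misses


N = 1024                                         # hypothetical array size
hits_b, misses_b = simulate_lru(N, N)            # part b: cache holds the whole array
hits_c, misses_c = simulate_lru(N, N // 2)       # part c: cache holds half the array

print(hits_b, misses_b, hits_c, misses_c)
```

With the half-size cache, every block has been evicted by the time the loop returns to it, so the second pass misses exactly as often as the first.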

# Fall 2020 final problem 10

- d) Suppose we take our LRU cache of size  $N / 2$ , and change its replacement policy to MRU, meaning that when we need to evict a cache block, we evict the most recently accessed block. For the given program, would this cache perform the same, better, or worse than its LRU counterpart? Why?

Better. MRU only evicts the last-accessed block, so in iteration 2 the majority of the first half of the array will still be in the cache.
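A small simulation makes the LRU versus MRU comparison concrete. As a hedged sketch: `N = 1024` is an arbitrary illustrative size, `count_misses` is a hypothetical helper, and "most recently accessed" is taken to mean the block touched just before the miss.

```python
from collections import OrderedDict


def count_misses(n, cache_bytes, policy, block_bytes=32, passes=2):
    """Count misses for repeated sequential byte reads under 'LRU' or 'MRU'."""
    capacity = cache_bytes // block_bytes
    cache = OrderedDict()            # keys ordered from oldest to newest access
    misses = 0
    for _ in range(passes):
        for addr in range(n):
            block = addr // block_bytes
            if block in cache:
                cache.move_to_end(block)
                continue
            misses += 1
            if len(cache) >= capacity:
                # last=True pops the newest entry (MRU), last=False the oldest (LRU)
                cache.popitem(last=(policy == "MRU"))
            cache[block] = None
    return misses


N = 1024                             # hypothetical array size
lru_misses = count_misses(N, N // 2, "LRU")
mru_misses = count_misses(N, N // 2, "MRU")

print(lru_misses, mru_misses)
```

Under MRU, the blocks loaded early in the first pass are never evicted, so the second pass hits on most of the first half of the array instead of missing on every block.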