

**Homework 11: Timing, Clocks, and Power Delivery**

*Due: Friday Dec 5th, 5pm*

Please include your name and SID, and specify CS150, EE141, or EE241A, at the top of your homework hand-in. Homework must be submitted electronically as a single PDF file.

## Problem 1: Timing and Clock Distribution

Examine the pipeline shown below. The minimum and maximum delays through the logic are annotated on the figure, and the flip-flops have the following properties:  $t_{clk-q} = 50ps$ ,  $t_{setup} = 100ps$ ,  $t_{hold} = 100ps$ .



Figure 1: Pipeline for 1a

- a) Assuming there is no skew between clocks, what is the minimum clock cycle time for this pipeline? Are there any hold time violations?
- b) Now we include the clock distribution network for this pipeline. The delay of each inverter is nominally 50ps, but each inverter's delay varies randomly by $\pm 10\%$. Now what is the minimum clock cycle time?



Figure 2: Pipeline for 1b

- c) Under the same conditions as in part (b), this pipeline has a hold time failure. You can only add delay to the logic paths; you cannot modify the clock network. Where should you add delay to eliminate the hold time failure, and how much delay is necessary?
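
As a reminder of the constraints involved, the sketch below encodes the standard setup and hold checks for a single register-to-register stage. It is only an illustration: the function names and the logic delays in the example are hypothetical (the real delays are annotated on the figures), and skew is assumed to be defined as the capture clock arrival time minus the launch clock arrival time.

```python
def min_cycle_time(t_clk_q, t_logic_max, t_setup, skew=0.0):
    """Smallest clock period (ps) that meets setup for one stage.

    Assumption: skew = capture clock arrival - launch clock arrival,
    so positive skew relaxes the setup constraint.
    """
    return t_clk_q + t_logic_max + t_setup - skew


def hold_slack(t_clk_q, t_logic_min, t_hold, skew=0.0):
    """Hold slack (ps) for one stage; a negative value is a hold violation.

    With the same skew convention, positive skew tightens the hold constraint.
    """
    return t_clk_q + t_logic_min - t_hold - skew


# Flip-flop parameters from the problem statement, in ps.
T_CLK_Q, T_SETUP, T_HOLD = 50, 100, 100

# Hypothetical logic delays, NOT the values from the figure.
example_logic_min, example_logic_max = 30, 400

print(min_cycle_time(T_CLK_Q, example_logic_max, T_SETUP))  # 550
print(hold_slack(T_CLK_Q, example_logic_min, T_HOLD))       # -20
```

For a multi-stage pipeline, the cycle time is set by the stage with the largest setup requirement, and every stage must have non-negative hold slack.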

## Problem 2: Pipelining and Parallelism

- a) Draw a block diagram that implements the math expression $y_i = a * y_{i-1} + b * x_i + c * x_{i-1}$. You are allowed to use multipliers, adders, and flip-flops. You can assume that you get the signal $x_i$ on every cycle and that registers are initialized to the correct values.
- b) Suggest where to add pipeline register(s) to increase the operating frequency of your design from part a).
- c) Assume that you are given 2 input values per cycle and now want to compute two results per cycle, so $x_i$ and $x_{i+1}$ are both available in the same cycle. Minimize the clock period for an implementation that computes both $y_i$ and $y_{i+1}$ each cycle.
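
A software reference model can be handy for checking that a block diagram, pipelined or parallel, still produces the same output sequence. The sketch below is one such model of the recurrence; the function name and the zero initial values for $y_{-1}$ and $x_{-1}$ are assumptions made for illustration.

```python
def reference_filter(xs, a, b, c, y_prev=0.0, x_prev=0.0):
    """Evaluate y_i = a*y_{i-1} + b*x_i + c*x_{i-1} for each input sample.

    y_prev and x_prev model the initial register contents
    (assumed to be zero unless stated otherwise).
    """
    ys = []
    for x in xs:
        y = a * y_prev + b * x + c * x_prev
        ys.append(y)
        # These two assignments play the role of the flip-flops.
        y_prev, x_prev = y, x
    return ys


# Impulse input with hypothetical coefficients.
print(reference_filter([1, 0, 0], a=0.5, b=1, c=2))  # [1.0, 2.5, 1.25]
```

Note that a design with added pipeline registers may produce the same values with extra cycles of latency, so outputs should be compared after aligning for that delay.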

## Problem 3: Power Delivery

In a made-up technology, the metal layer that your power grid is routed in has a sheet resistance of $3m\Omega/sq$. The wires are current-limited to $10mA/\mu m$ by electromigration rules. In other words, a $1\mu m$ wide wire can carry at most $10mA$ of current. You have supply connections on one side of the processor and need to distribute power to the whole CPU, which measures $2mm$ by $2mm$. A simplified model of this is shown below:



Figure 3: Diagram for 3

- a) For a 12W CPU with $V_{dd} = 1.2V$, if we want at most a 10% IR drop, what is the maximum allowed resistance of the power straps?
- b) Given the size of the processor and the need to route power from one side to the other, what total wire width is necessary to ensure that the resistance of your power grid is less than the resistance calculated in part a)?
- c) Given the total width from part b, what is the maximum current that your power grid can support according to electromigration rules?
- d) For a power grid to comply with both the resistance requirement and the electromigration rules, what total width is necessary?
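
The two relationships used in this problem are the sheet-resistance formula $R = R_{sheet} \cdot L / W$ and the width-proportional electromigration limit. The sketch below shows both under the simplifying assumption that all power straps are lumped into one wire whose width is the sum of the strap widths. The function names and the example dimensions are hypothetical and are not the values asked for above.

```python
def strap_resistance_mohm(r_sheet_mohm_per_sq, length_um, total_width_um):
    """Resistance in milliohms: sheet resistance times the number of squares (L / W)."""
    squares = length_um / total_width_um
    return r_sheet_mohm_per_sq * squares


def width_for_resistance_um(r_sheet_mohm_per_sq, length_um, r_target_mohm):
    """Total width in um that gives exactly r_target_mohm over length_um."""
    return r_sheet_mohm_per_sq * length_um / r_target_mohm


def em_current_limit_ma(limit_ma_per_um, total_width_um):
    """Maximum current in mA allowed by a per-width electromigration limit."""
    return limit_ma_per_um * total_width_um


# Hypothetical strap: 100 um long, 10 um wide, in the 3 mOhm/sq metal.
print(strap_resistance_mohm(3, 100, 10))    # 30.0
print(width_for_resistance_um(3, 100, 30))  # 10.0
print(em_current_limit_ma(10, 10))          # 100
```

A grid that satisfies both rules needs at least the larger of the width set by the resistance target and the width set by the current limit.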

## Problem 4: Clock Variability

You have a grid of 1024 by 1024 flip-flops, all driven from the same clock. Each flip-flop is spaced $1\mu m$ from its neighbors in the four cardinal directions (north, south, east, and west). The grid is driven by one clock source at the bottom middle, which is routed straight to the middle of the grid and distributed from there by an H-tree. Each branch in the H-tree contains one buffer in each of the 4 directions in which it drives signals, and each buffer has a $\pm 5\%$ variance in its delay. The nominal delay of each buffer is 50ps.

- a) How many total stages of buffering do you have from the root to each leaf cell?
- b) What is the variance in delay between one cell in the upper right corner and a cell in the lower left corner?
- c) What is the variance in delay between two neighboring leaves in the tree? This means that they share everything back to the root node except for the last stage of buffering.
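
One way to reason about this problem is to model the H-tree as a tree with 4-way branching at every level, where only the buffer stages that two leaves do not share can contribute to their delay difference. The sketch below follows that model. It assumes the grid side is a power of two, that each leaf drives a single flip-flop, and that "worst case" means one path is entirely slow while the other is entirely fast. The function names and example numbers are hypothetical.

```python
def quadtree_levels(side):
    """Levels of 4-way branching needed to reach every cell of a side x side grid.

    Each level halves the region in both dimensions, so the count is log2(side).
    Assumes side is a power of two.
    """
    if side < 1 or side & (side - 1):
        raise ValueError("side must be a positive power of two")
    return side.bit_length() - 1


def worst_case_mismatch_ps(unshared_stages, t_nominal_ps, tolerance):
    """Largest arrival-time difference between two leaves.

    One path has every unshared buffer slow by `tolerance`, the other has
    every unshared buffer fast by `tolerance`; shared buffers cancel out.
    """
    return 2 * unshared_stages * t_nominal_ps * tolerance


# Hypothetical 16 x 16 grid and hypothetical buffer numbers.
print(quadtree_levels(16))                 # 4
print(worst_case_mismatch_ps(3, 40, 0.25))  # 60.0
```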

## Problem 5 (EE241A Only): Aggressive Clock Distribution

Download and read the paper “Clock and Synchronization Networks for a 3GHz 64bit ARMv8 8-core SoC” (Ravezzi ESSCIRC 2014) from bCourses, then answer the following questions.

- a) What is the maximum RMS jitter measured for this chip? Will this affect only setup times, only hold times, or both setup and hold times?
- b) Why is duty cycle important in this design?
- c) Why are the local clock gaters not placed before the local clock mesh?
- d) How many ps does it take for an edge generated by the PLL to arrive at a flip-flop?
- e) A new latch is proposed in Figure 7, which has a “swapped transistor stack.” Why is this swap not common practice in all latches?