


# **CSE477**

# **VLSI Digital Circuits**

# **Fall 2002**

## **Lecture 19: Timing Issues; Introduction to Datapath Design**

Mary Jane Irwin ([www.cse.psu.edu/~mji](http://www.cse.psu.edu/~mji))  
[www.cse.psu.edu/~cg477](http://www.cse.psu.edu/~cg477)

[Adapted from Rabaey's *Digital Integrated Circuits*, ©2002, J. Rabaey et al.]

# Review: Sequential Definitions

- ❑ Use two level-sensitive latches of opposite type to build one master-slave flip-flop that changes state on a clock edge (when the slave is transparent)
- ❑ Static storage
  - ❑ static uses a **bistable** element with feedback to store its state and thus preserves state as long as the power is on
    - Loading new data into the element: 1) cutting the feedback path (mux based); 2) overpowering the feedback path (SRAM based)
- ❑ Dynamic storage
  - ❑ dynamic stores state on parasitic capacitors, so the state is held for only a limited time (milliseconds) and requires periodic refresh
  - ❑ dynamic is usually simpler (fewer transistors), faster, and lower power, but because of noise immunity issues the circuit is always modified to make it pseudostatic

# Timing Classifications

---

## ❑ Synchronous systems

- ❑ All memory elements in the system are simultaneously updated using a globally distributed periodic synchronization signal (i.e., a global clock signal)
- ❑ Functionality is ensured by strict constraints on the clock signal generation and distribution to minimize
  - Clock skew (spatial variations in clock edges)
  - Clock jitter (temporal variations in clock edges)

## ❑ Asynchronous systems

- ❑ Self-timed (controlled) systems
- ❑ No need for a globally distributed clock, but have asynchronous circuit overheads (handshaking logic, etc.)

## ❑ Hybrid systems

- ❑ Synchronization between different clock domains
- ❑ Interfacing between asynchronous and synchronous domains

# Review: Synchronous Timing Basics



- ❑ Under ideal conditions (i.e., when  $t_{clk1} = t_{clk2}$ )

$$T \geq t_{c-q} + t_{plogic} + t_{su}$$

$$t_{hold} \leq t_{cdlogic} + t_{cdreg}$$

- ❑ Under real conditions, the clock signal can have both spatial (clock skew) and temporal (clock jitter) variations
  - ❑ skew is constant from cycle to cycle (by definition); skew can be positive (clock and data flowing in the same direction) or negative (clock and data flowing in opposite directions)
  - ❑ jitter causes T to change on a cycle-by-cycle basis
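
The two ideal-case constraints above can be checked mechanically for a single register-to-register path. The following is a minimal sketch; the function names and the delay values (in ps) are illustrative assumptions, not figures from the lecture.

```python
def min_clock_period(t_cq, t_plogic, t_su):
    """Smallest T satisfying T >= t_c-q + t_plogic + t_su (ideal clock)."""
    return t_cq + t_plogic + t_su


def hold_met(t_hold, t_cdlogic, t_cdreg):
    """True when t_hold <= t_cdlogic + t_cdreg (ideal clock)."""
    return t_hold <= t_cdlogic + t_cdreg


# Hypothetical delays in picoseconds, chosen only for illustration.
print(min_clock_period(t_cq=100, t_plogic=800, t_su=50))   # 950
print(hold_met(t_hold=40, t_cdlogic=30, t_cdreg=60))       # True
```

Note that the setup constraint uses the worst-case (propagation) delays, while the hold constraint uses the best-case (contamination) delays.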

# Sources of Clock Skew and Jitter in Clock Network



## □ Skew

- manufacturing device variations in clock drivers
- interconnect variations
- environmental variations (power supply and temperature)

## □ Jitter

- clock generation
- capacitive loading and coupling
- environmental variations (power supply and temperature)

# Positive Clock Skew

- Clock and data flow in the same direction



$$T : \quad T + \delta \geq t_{c-q} + t_{p\text{logic}} + t_{su} \quad \text{so} \quad T \geq t_{c-q} + t_{p\text{logic}} + t_{su} - \delta$$

$$t_{hold} : \quad t_{hold} + \delta \leq t_{cd\text{logic}} + t_{cd\text{reg}} \quad \text{so} \quad t_{hold} \leq t_{cd\text{logic}} + t_{cd\text{reg}} - \delta$$

- $\delta > 0$ : Improves performance, but makes  $t_{hold}$  harder to meet. If  $t_{hold}$  is not met (race conditions), the circuit malfunctions independent of the clock period!
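
One way to see where the two inequalities come from is to track edge arrival times. As a sketch, assume the launching register sees its clock edge at time $0$ and the receiving register sees the same edge at time $\delta$ (that is, $\delta = t_{clk2} - t_{clk1}$; this labeling is an assumption, since the slide's figure is not reproduced here). The slowest data must be set up before the *next* edge at the receiver, which arrives at $T + \delta$, and the fastest data must not arrive until $t_{hold}$ after the *same* edge at the receiver, which arrives at $\delta$:

$$\underbrace{t_{c-q} + t_{plogic}}_{\text{latest arrival}} + t_{su} \leq T + \delta$$

$$\underbrace{t_{cdreg} + t_{cdlogic}}_{\text{earliest arrival}} \geq \delta + t_{hold}$$

The clock period $T$ appears only in the first inequality, which is why a hold violation cannot be fixed by slowing the clock.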

# Negative Clock Skew

- Clock and data flow in opposite directions



$$T : T + \delta \geq t_{c-q} + t_{p\text{logic}} + t_{su} \text{ so } T \geq t_{c-q} + t_{p\text{logic}} + t_{su} - \delta$$

$$t_{\text{hold}} : t_{\text{hold}} + \delta \leq t_{cd\text{logic}} + t_{cd\text{reg}} \text{ so } t_{\text{hold}} \leq t_{cd\text{logic}} + t_{cd\text{reg}} - \delta$$

- $\delta < 0$ : Degrades performance, but  $t_{\text{hold}}$  is easier to meet (eliminating race conditions)
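
To compare the positive- and negative-skew cases side by side, the sketch below sweeps $\delta$ through both signs and reports the minimum period and the remaining hold margin. The function name `skew_constraints` and all delay values (in ps) are hypothetical and chosen only for illustration.

```python
def skew_constraints(t_cq, t_plogic, t_su, t_hold, t_cdlogic, t_cdreg, delta):
    """Return (T_min, hold_margin) for a clock skew of delta.

    T_min       = t_c-q + t_plogic + t_su - delta
    hold_margin = t_cdlogic + t_cdreg - delta - t_hold   (negative => race)
    """
    t_min = t_cq + t_plogic + t_su - delta
    hold_margin = t_cdlogic + t_cdreg - delta - t_hold
    return t_min, hold_margin


# Hypothetical path delays in picoseconds.
path = dict(t_cq=100, t_plogic=800, t_su=50, t_hold=40, t_cdlogic=30, t_cdreg=60)

for delta in (-50, 0, 50, 100):
    t_min, margin = skew_constraints(delta=delta, **path)
    status = "ok" if margin >= 0 else "RACE"
    print(f"delta={delta:+4d} ps  T_min={t_min} ps  hold margin={margin:+d} ps  {status}")
```

With these numbers, negative skew lengthens the minimum period but widens the hold margin, while a sufficiently large positive skew shortens the period until the hold margin goes negative.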

# Clock Jitter

- Jitter causes  $T$  to vary on a cycle-by-cycle basis



$$T : T - 2t_{\text{jitter}} \geq t_{\text{c-q}} + t_{\text{plogic}} + t_{\text{su}} \quad \text{so} \quad T \geq t_{\text{c-q}} + t_{\text{plogic}} + t_{\text{su}} + 2t_{\text{jitter}}$$

- Jitter directly reduces the performance of a sequential circuit
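
The factor of two reflects the worst case in which the launching edge arrives late by $t_{jitter}$ and the capturing edge arrives early by $t_{jitter}$. A minimal sketch of the resulting loss in maximum clock frequency follows; the function name and the delay values (in ps) are illustrative assumptions.

```python
def min_period_with_jitter(t_cq, t_plogic, t_su, t_jitter):
    """Smallest T satisfying T >= t_c-q + t_plogic + t_su + 2*t_jitter."""
    return t_cq + t_plogic + t_su + 2 * t_jitter


# Hypothetical delays in picoseconds.
for t_jitter in (0, 25, 50):
    period_ps = min_period_with_jitter(100, 800, 50, t_jitter)
    f_max_mhz = 1e6 / period_ps          # period in ps -> frequency in MHz
    print(f"t_jitter={t_jitter:2d} ps  T_min={period_ps} ps  f_max={f_max_mhz:.1f} MHz")
```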

# Combined Impact of Skew and Jitter

- ❑ Constraints on the minimum clock period ( $\delta > 0$ )



$$T \geq t_{c-q} + t_{plogic} + t_{su} - \delta + 2t_{jitter}$$

$$t_{hold} \leq t_{cdlogic} + t_{cdreg} - \delta - 2t_{jitter}$$

- ❑  $\delta > 0$  with jitter: Degrades performance, and makes  $t_{hold}$  even *harder* to meet. (The acceptable skew is reduced by jitter.)
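
Rearranging the hold constraint gives the largest skew a path can tolerate, $\delta \leq t_{cdlogic} + t_{cdreg} - t_{hold} - 2t_{jitter}$. The sketch below evaluates that bound together with the minimum period; the function names and the delay values (in ps) are hypothetical.

```python
def max_tolerable_skew(t_hold, t_cdlogic, t_cdreg, t_jitter):
    """Largest delta with t_hold <= t_cdlogic + t_cdreg - delta - 2*t_jitter."""
    return t_cdlogic + t_cdreg - t_hold - 2 * t_jitter


def min_period(t_cq, t_plogic, t_su, delta, t_jitter):
    """Smallest T with T >= t_c-q + t_plogic + t_su - delta + 2*t_jitter."""
    return t_cq + t_plogic + t_su - delta + 2 * t_jitter


# Hypothetical delays in picoseconds.
for t_jitter in (0, 20):
    delta_max = max_tolerable_skew(t_hold=40, t_cdlogic=30, t_cdreg=60, t_jitter=t_jitter)
    period = min_period(t_cq=100, t_plogic=800, t_su=50, delta=delta_max, t_jitter=t_jitter)
    print(f"t_jitter={t_jitter} ps  max skew={delta_max} ps  T_min at that skew={period} ps")
```

In this example, 20 ps of jitter cuts the acceptable skew from 50 ps to 10 ps and raises the minimum period at the same time.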

# Clock Distribution Networks

---

- ❑ Clock skew and jitter can ultimately limit the performance of a digital system, so designing a clock network that minimizes both is important
  - In many high-speed processors, a majority of the dynamic power is dissipated in the clock network.
  - To reduce dynamic power, the clock network must support clock gating (disabling the clock to shut down idle units)
- ❑ Clock distribution techniques
  - Balanced paths (H-tree network, matched RC trees)
    - In the ideal case, can eliminate skew
    - Could take multiple cycles for the clock signal to propagate to the leaves of the tree
  - Clock grids
    - Typically used in the final stage of the clock distribution network
    - Minimizes absolute delay (not relative delay)

# H-Tree Clock Network

- If the paths are perfectly balanced, clock skew is zero
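
The balance property can be illustrated geometrically. The sketch below builds an idealized H-tree recursively and confirms that every leaf is the same wire length from the root. It uses wire length as a stand-in for delay and ignores load mismatch and process variation, so it is an assumption-laden illustration rather than a skew analysis; the function name and dimensions are hypothetical.

```python
def h_tree_leaves(x, y, half, levels, path_len=0.0, horizontal=True):
    """Yield (x, y, path_len) for each leaf of an idealized H-tree.

    Each level branches from (x, y) to two points `half` away along one axis.
    The axis alternates every level and the branch length halves every two levels.
    """
    if levels == 0:
        yield (x, y, path_len)
        return
    dx, dy = (half, 0.0) if horizontal else (0.0, half)
    next_half = half if horizontal else half / 2
    for sign in (-1, 1):
        yield from h_tree_leaves(x + sign * dx, y + sign * dy, next_half,
                                 levels - 1, path_len + half, not horizontal)


leaves = list(h_tree_leaves(0.0, 0.0, half=4.0, levels=4))
lengths = {length for _, _, length in leaves}
print(len(leaves), "leaves, distinct root-to-leaf lengths:", lengths)
```

A single distinct length means the wiring is balanced. In a real design, equal length gives zero skew only if the buffers and leaf loads are also matched.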



Can insert clock gating at multiple levels in clock tree  
Can shut off entire subtree if all gating conditions are satisfied
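
As a rough sketch of multi-level gating, the following models the clock tree as nested nodes, each with its own enable. A subtree's clock is shut off when none of its leaves need it. The `ClockNode` class, the unit names, and the gating policy are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ClockNode:
    name: str
    enable: bool = True                      # gating condition at this node
    children: list = field(default_factory=list)

    def needs_clock(self):
        """A leaf needs the clock when enabled; an internal node when any child does."""
        if not self.children:
            return self.enable
        return self.enable and any(child.needs_clock() for child in self.children)


def clocked_nodes(node, parent_on=True):
    """Return the names of nodes whose clock still toggles."""
    on = parent_on and node.needs_clock()
    names = [node.name] if on else []
    for child in node.children:
        names += clocked_nodes(child, on)
    return names


# Hypothetical units: subtree A is fully idle, subtree B has one active unit.
tree = ClockNode("root", children=[
    ClockNode("A", children=[ClockNode("alu", enable=False),
                             ClockNode("shifter", enable=False)]),
    ClockNode("B", children=[ClockNode("regfile", enable=True),
                             ClockNode("mult", enable=False)]),
])
print(clocked_nodes(tree))   # ['root', 'B', 'regfile']
```

Gating subtree A at its root saves the switching power of both the idle units and the clock wiring between them.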



# **DEC Alpha 21164 (EV5)**

---

- ❑ 300 MHz clock (9.3 million transistors on a 16.5 × 18.1 mm die in 0.5 µm CMOS technology)
  - ❑ single phase clock
- ❑ 3.75 nF total clock load
  - ❑ Extensive use of dynamic logic
- ❑ 20 W (out of 50) in clock distribution network
- ❑ Two level clock distribution
  - ❑ Single 6 stage driver at the center of the chip
  - ❑ Secondary buffers drive the left and right sides of the clock grid in m3 and m4
- ❑ Total equivalent driver size of 58 cm!!



Clock Drivers

# Clock Skew in Alpha Processor

- ❑ Absolute skew smaller than 90 ps



- ❑ The critical instruction and execution units all see the clock within 65 ps
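
To put these figures in perspective, they can be compared with the clock period of the 300 MHz clock quoted for this processor:

$$T = \frac{1}{300\ \text{MHz}} \approx 3.33\ \text{ns}, \qquad \frac{90\ \text{ps}}{3.33\ \text{ns}} \approx 2.7\%, \qquad \frac{65\ \text{ps}}{3.33\ \text{ns}} \approx 2.0\%$$

So the absolute skew consumes under 3% of the cycle, and about 2% for the critical units.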

# Dealing with Clock Skew and Jitter

---

- ❑ To minimize skew, balance clock paths using *H-tree* or *matched-tree* clock distribution structures.
- ❑ If possible, route data and clock in opposite directions; this eliminates races at the cost of performance.
- ❑ The use of gated clocks to reduce dynamic power consumption makes jitter worse.
- ❑ Shield clock wires (route power lines –  $V_{DD}$  or GND – next to clock lines) to minimize/eliminate coupling with neighboring signal nets.
- ❑ Use dummy fills to reduce skew by reducing variations in interconnect capacitances due to interlayer dielectric thickness variations.
- ❑ Beware of temperature and supply rail variations and their effects on skew and jitter. *Power supply noise fundamentally limits the performance of clock networks.*

# Major Components of a Computer



- ❑ Modern processor architecture styles (CSE 431)
  - Pipelined, single issue (e.g., ARM)
  - Pipelined, hardware controlled multiple issue – superscalar
  - Pipelined, software controlled multiple issue – VLIW
  - Pipelined, multiple issue from multiple process threads – multithreaded

# Basic Building Blocks

- Datapath
  - Execution units
    - Adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
- Control
  - Finite state machines (PLA, ROM, random logic)
- Interconnect
  - Switches, arbiters, buses
- Memory
  - Caches, TLBs, DRAM, buffers

# MIPS 5-Stage Pipelined (Single Issue) Datapath



# Datapath Bit-Sliced Organization



Tile identical bit-slice elements
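
As a small illustration of tiling, the sketch below builds an n-bit adder by repeating one identical 1-bit slice and passing the carry from slice to slice. The choice of a ripple-carry adder and the function names are assumptions made for the example; the slide does not specify what the slices contain.

```python
def full_adder_slice(a, b, cin):
    """One bit slice: return (sum, carry_out) for 1-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def bit_sliced_add(x, y, width=8):
    """Add two unsigned integers by tiling `width` identical slices (ripple carry)."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder_slice((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(bit_sliced_add(100, 57))    # (157, 0)
print(bit_sliced_add(200, 100))   # (44, 1): the sum wraps modulo 2**8
```

Because every slice is identical, widening the datapath only means tiling more copies, which is what makes the bit-sliced layout regular.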

# Next Lecture and Reminders

---

## ❑ Next lecture

- ❑ Adder design
  - Reading assignment – Rabaey et al., 11.3

## ❑ Reminders

- ❑ Pick up second half of the new edition of the book from Sue in 202 Pond Lab
- ❑ Project final reports due December 5<sup>th</sup>
- ❑ HW4 due today
- ❑ HW5 due November 19<sup>th</sup>
- ❑ Final grading negotiations/corrections (except for the final exam) must be concluded by December 10<sup>th</sup>
- ❑ Final exam scheduled
  - Monday, December 16<sup>th</sup> from 10:10 to noon in 118 and 121 Thomas