



# CSCE337 – Digital Design II

Lectures 19-21: Timing & Clocking Issues in Digital Systems

# Agenda

2

- Some Background Info
  - Clocking
  - Setup & Hold Times
  - Clock Signal
- Timing & Logic Synthesis
- Static Timing Analysis
- Fixing Timing Violations

# CMOS Cell Model

3



# CMOS Delay Model

4



- $C_{\text{wire}} = C_{s1} + C_{s2} + C_{s3} + C_{s4}$
- $\text{tpd} = R \times C_{\text{load}}$
- $R$  is either  $R_{dl}$  or  $R_{dh}$
- $C_{\text{load}} = C_{\text{wire}} + C_{in2} + C_{in3} + C_{in4}$



# CMOS Delay Model

5



# Rise and Fall Times

6



# Slew (Transition Time)

7



20-80% and 30-70% are in use (10-90% range was used for early technologies)

# Wire Model

8

## □ Wire Resistance

- $\text{Resistance} = L \times \text{Resistivity} / (W \times H)$
- $R_s = (\text{Resistivity}/H)$  (sheet resistance)
- For any given process, every metal layer  $R_s$  is given as  $\Omega/\text{square}$ 
  - The height (thickness) is fixed for a given metal layer
- For TSMC180nm Process
  - M1:  $0.08 \Omega/\text{square}$ 
    - $0.5\mu\text{m} \times 1\text{mm}$  wire  $\rightarrow 160\Omega$
  - M4:  $0.03 \Omega/\text{square}$ 
    - $0.5\mu\text{m} \times 1\text{mm}$  wire  $\rightarrow 60\Omega$



| Metal    | Resistivity ( $\rho$ )              |
|----------|-------------------------------------|
| Aluminum | $2.8 \times 10^{-8} \Omega\text{m}$ |
| Copper   | $1.7 \times 10^{-8} \Omega\text{m}$ |
| Silver   | $1.6 \times 10^{-8} \Omega\text{m}$ |

Copper is used for processes less than 180nm

# VIA Resistance

9

- Via resistance significant
- TSMC 0.18 $\mu$ m 6-Al

|         |               |
|---------|---------------|
| Diff-M1 | 11.0 $\Omega$ |
| Poly-M1 | 10.4 $\Omega$ |
| M2-M1   | 4.5 $\Omega$  |
| M3-M1   | 9.5 $\Omega$  |
| M4-M1   | 15.0 $\Omega$ |
| M5-M1   | 19.6 $\Omega$ |
| M6-M1   | 21.8 $\Omega$ |

Vias made from Tungsten  
in Aluminum processes.  
Vias are Copper in Copper  
processes



# Wire Model

10

## □ Wire Capacitance



$$\square C_{\text{total}} = C_{\text{top}} + C_{\text{bot}} + 2C_{\text{adj}}$$

**Every process defines  $C/\mu$  ( $0.2\text{fF}/\mu\text{m}$  for M2 in 180nm)**



# Capacitance Trends

11

- Parallel plate equation:  $C = \epsilon_{ox}A/d$ 
  - Wires are not parallel plates, but obey trends
  - Increasing area ( $W, t$ ) increases capacitance
  - Increasing distance ( $s, h$ ) decreases capacitance
- Dielectric constant
  - $\epsilon_{ox} = k\epsilon_0$ 
    - $\epsilon_0 = 8.85 \times 10^{-14} \text{ F/cm}$
    - $k = 3.9$  for  $\text{SiO}_2$
- Processes are starting to use low-k dielectrics
  - $k \approx 3$  (or less) as dielectrics use air pockets

# 65nm Wire Attributes

12

| Layer | W<br>( $\mu m$ ) | T<br>( $\mu m$ ) | H<br>( $\mu m$ ) | Pitch<br>( $\mu m$ ) | r<br>( $\Omega$ ) | $c_{pp}$<br>( $fF/\mu m^2 \times 10^{-3}$ ) | $c_{fringe}$<br>( $fF/\mu m \times 10^{-3}$ ) |
|-------|------------------|------------------|------------------|----------------------|-------------------|---------------------------------------------|-----------------------------------------------|
| M1    | 0.09             | 0.18             | 0.59             | 0.20                 | 0.16              | 0.171                                       | 0.089                                         |
| M2    | 0.10             | 0.22             | 0.95             | 0.20                 | 0.14              | 0.232                                       | 0.078                                         |
| M3    | 0.10             | 0.22             | 1.34             | 0.20                 | 0.14              | 0.232                                       | 0.078                                         |
| M4    | 0.10             | 0.22             | 1.74             | 0.20                 | 0.14              | 0.232                                       | 0.078                                         |
| M5    | 0.10             | 0.22             | 2.13             | 0.20                 | 0.14              | 0.232                                       | 0.078                                         |
| M6    | 0.10             | 0.22             | 2.53             | 0.20                 | 0.14              | 0.232                                       | 0.078                                         |
| M7    | 0.10             | 0.22             | 2.92             | 0.20                 | 0.14              | 0.232                                       | 0.081                                         |
| M8    | 0.40             | 0.90             | 3.74             | 0.80                 | 0.02              | 0.063                                       | 0.092                                         |
| M9    | 0.40             | 0.90             | 5.23             | 0.80                 | 0.02              | 0.063                                       | 0.096                                         |

# 65nm Tech LEF

13

LAYER M4

```
TYPE ROUTING ;
DIRECTION VERTICAL ;
PITCH 0.200 ;
HEIGHT 1.7350 ;
THICKNESS 0.2200 ;
WIDTH 0.1 ;
MAXWIDTH 12.0 ;
...
...
RESISTANCE RPERSQ 0.1399000000 ;
CAPACITANCE CPERSQDIST 0.0002320000 ;
EDGECAPACITANCE 0.0000778000 ;
```

END M4

# 65nm Example (L=12mm)

14

$$R_{PERSQ} = 0.1399\Omega$$

$$CPERSQDIST = 0.000232 pF/\mu m^2$$

$$EDGE CAPACITANCE = 0.0000778 pF/\mu m$$

$$R = R_{PERSQ} \times \frac{L}{W} = 0.1399\Omega \times \frac{12mm}{0.1\mu m} = 17K\Omega$$

$$\begin{aligned} C &= CPERSQDIST \times L \times W + EDGE CAPACITANCE \times L \\ &= 0.000232 \frac{pF}{\mu m^2} \times 12mm \times 0.1\mu m + 0.0000778 \frac{pF}{\mu m} \times 12mm \\ &= 0.278pF + 0.934pF = 1.2pF \quad \textbf{80% edge capacitance} \end{aligned}$$

# Why Bother?

15

## □ Technology Trends



80's – early 90's



250nm



Now

# Wire Delay Model

16

- Wire has distributed R and C per unit length
  - wire delay increases quadratically with length
  - edge rate also degrades quadratically with length
- Simple lumped  $\pi$  (pi) model gives reasonable approximation
  - Actually, the 3-segment  $\pi$ -model accurate to 3% in simulation
  - $R_w$  is lumped resistance of wire
  - $C_w$  is lumped capacitance (put half at each end)



$$\text{Delay} = R_{\text{driver}} \times \frac{C_w}{2} + (R_{\text{driver}} + R_w) \times \left( \frac{C_w}{2} + C_{\text{load}} \right)$$

# Wire Delay Model

17

## □ Example:

- In  $0.18\mu\text{m}$  TSMC, 5x minimum inverter with effective resistance of  $3\text{ k}\Omega$ , driving FO4 load ( $25\text{fF}$ )

- $\text{Delay} = \text{R}_{\text{driver}} \times \text{C}_{\text{load}} = 75\text{ ps}$

- Now add 1mm M1 wire,  $0.25\mu\text{m}$  wide, and a via

- $R_w = 320\Omega \text{ wire} + 22\Omega \text{ via} = 344\Omega$

- $C_w = 160\text{fF}$

$$\begin{aligned}\text{Delay} &= \text{R}_{\text{driver}} \times \frac{C_w}{2} + (\text{R}_{\text{driver}} + R_w) \times \left( \frac{C_w}{2} + C_{\text{load}} \right) \\ &= 3\text{k}\Omega \times 80\text{fF} + (3\text{k}\Omega + 344\Omega) \times (80\text{fF} + 25\text{fF}) \\ &= 591\text{ps}\end{aligned}$$

# Elmore Delay

18

- A good approximation for the RC delay of complex RC circuits

- $t_{pd} = \sum R_{i-to-src} * C_i$

- Example



- $t_{pd} = R_1 C_1 + (R_1 + R_2) C_2 + (R_1 + R_2 + R_3) C_3 + \dots$

# RC Tree

19



# Elmore Delay for RC Tree

20



$$t_{d3} = R_1 C_1 + (R_1 + R_2) C_2 + (R_1 + R_2 + R_3) C_3 + R_1 C_4$$

$$t_{d4} = R_1 C_1 + R_1 C_2 + R_1 C_3 + (R_1 + R_4) C_4$$

# Wire Delay

21

- How to deal with long wires connecting gates?



# Wire Delay

22

- How to deal with long wires connecting gates?
  - Buffers insertion.



# Liberty (.lib) Format –Revisited

23

- The .lib file is an ASCII representation of the standard cells electrical models (timing, power, ...)
- The timing and power parameters are obtained by simulating the cells under a variety of conditions (characterization) and the data is represented in the .lib format

# Timing Models for Combinational Cells

24

## □ Pin Capacitance

- Input and output cells can specify capacitance at the pin.  
Generally capacitance is specified only for the cell inputs and not for the outputs.
- There are also specified separate values for **rise\_capacitance** and **fall\_capacitance**, which refer to the values used for rising and falling transitions at the pin INP1

```
pin (INP1) {  
    capacitance : 0.5;  
    rise_capacitance : 0.5;  
    rise_capacitance_range : (0.48, 0.52);  
    fall_capacitance : 0.45;  
    fall_capacitance_range : (0.435, 0.46);  
    . . .  
}
```



# Timing Models for Combinational Cells

25

## □ Unate

- The timing arc is negative unate when the corresponding output pin transition direction is opposite of the input pin transition direction
- The timing arc is positive unate when the output transition is in the same direction as the input transition
- Ex:
  - For 2 input AND cell, Both the timing arcs for AND are **positive\_unate**, therefore an input rise corresponds to an output rise.

# Liberty Timing Model (NLDM)

26

- The Non-Linear Delay (or NLDM) model is presented in a two-dimensional form, with the two independent variable:
  - The input transition time
  - The output load capacitance
- In the example, the delays of the output pin **OUT** are described. This portion of the cell description contains the rising and falling delay models for the timing arc from pin **INP1** to pin **OUT**

```
pin (OUT) {  
    max_transition : 1.0;  
    timing () {  
        related_pin : "INP1";  
        timing_sense : negative_unate;  
        cell_rise (delay_template_3x3) {  
            index_1 ("0.1, 0.3, 0.7"); /* Input transition */  
            index_2 ("0.16, 0.35, 1.43"); /* Output capacitance */  
            values ( /* 0.16, 0.35, 1.43 */ \/  
                /* 0.1 */ "0.0513, 0.1537, 0.5280", \  
                /* 0.3 */ "0.1018, 0.2327, 0.6476", \  
                /* 0.7 */ "0.1334, 0.2973, 0.7252", ) ;  
        }  
        cell_fall (delay_template_3x3) {  
            index_1 ("0.1, 0.3, 0.7"); /* Input transition */  
            index_2 ("0.16, 0.35, 1.43"); /* Output capacitance */  
            values ( /* 0.16, 0.35, 1.43 */ \/  
                /* 0.1 */ "0.0617, 0.1537, 0.5280", \  
                /* 0.3 */ "0.0918, 0.2027, 0.5676", \  
                /* 0.7 */ "0.1034, 0.2273, 0.6452" ) ;  
        }  
    }  
}
```

# Liberty Timing Model (NLDM)

27

- Cell delays are calculated from a Non Linear Delay Model (NLDM) table in the technology library
- Tables are indexed by input transition and total output load for each gate

Cell Delay =  $f$  (Input Transition Time, Output Load)



|                  |     | Output Load (pF) |     |     |     |
|------------------|-----|------------------|-----|-----|-----|
|                  |     | .005             | .05 | .10 | .15 |
| Input Trans (ns) | 0.0 | .1               | .15 | .2  | .25 |
|                  | 0.5 | .15              | .23 | .3  | .38 |
|                  | 1.0 | .25              | .4  | .55 | .75 |

Cell Delay (ns)



# Liberty Timing Model (NLDM)

28

- There is another NLDM table in the library to calculate output transition (Slew Rate)
- Output transition of a cell becomes the input transition of the next cell down the chain

**Output Transition = f (Input Transition Time, Output Load)**



| SPICE            |      | Output Load (pF) |      |      |      |
|------------------|------|------------------|------|------|------|
| Input Trans (ns) | .005 | .05              | .10  | .15  |      |
|                  | 0.00 | 0.10             | 0.20 | 0.37 | 0.60 |
|                  | 0.50 | 0.18             | 0.30 | 0.49 | 0.80 |
| 1.00             | 0.25 | 0.40             | 0.62 | 1.00 |      |

Output Transition (ns)

# Timing Models for Sequential Cells

29

- Consider the timing arcs of a sequential cell shown below
- For synchronous inputs (D), there are the following timing arcs:
  - Setup check arc (rising and falling)
  - Hold check arc (rising and falling)
- For synchronous outputs of flip-flop, such as pin Q or QN, there is the following timing arc:
  - CK-to-output propagation delay arcs (rising falling)



# Flip-Flop Timing Constraints

30



- $t_s$ : Setup Time
  - The minimum amount of time the data signal should be held steady before the clock edge
- $t_h$ : Hold Time
  - the minimum amount of time the data signal should be held steady after the clock edge

# Synchronous (Setup and Hold) Checks

31

- The setup and hold constraints for a synchronous pin of a sequential cell are normally described in terms of two-dimensional tables as illustrated below.

```
pin (D) {  
    direction : input;  
    . . .  
    timing () {  
        related_pin : "CK";  
        timing_type : "setup_rising":  
        rise_constraint (setup_template_3x3) {  
            index_1 ("0.4, 0.57, 0.84");/*Data transition*/  
            index_2 ("0.4, 0.57, 0.84");/*Clock transition*/  
            values ( /* 0.4, 0.57, 0.84 */ \  
                /* 0.4 */ "0.063, 0.093, 0.112", \  
                /* 0.57 */ "0.526, 0.644, 0.824", \  
                /* 0.84 */ "0.720, 0.839, 0.930" ) ;  
        }  
    }  
}
```

```
fall_constraint (setuphold_template_3x3) {  
    index_1 ("0.4, 0.57, 0.84");/*Data transition*/  
    timing () {  
        related_pin : "CK";  
        timing_type : "hold_rising";  
        rise_constraint (setuphold_template_3x3) {  
            index_1 ("0.4, 0.57, 0.84");/*Data transition*/  
            index_2 ("0.4, 0.57, 0.84");/*Clock transition*/  
            values ( /* 0.4, 0.57, 0.84 */ \  
                /* 0.4 */ "-0.220, -0.339, -0.584", \  
                /* 0.57 */ "-0.247, -0.381, -0.729", \  
                /* 0.84 */ "-0.398, -0.516, -0.864" ) ;  
        }  
    }  
}
```

# Path Delay

32

- What is the propagation delay?



- When all inputs change, the output starts to change or *glitch* after the minimum delay of 25ns, and settles to the final value after the propagation delay (65ns).
  - $t_{cd}$ : Contamination delay (important for hold constraint)
  - $t_{pd}$ : Propagation delay (important for setup constraint)

# Clocking

33

- Clocks provide the means to synchronize
- By allowing events to happen at known timing boundaries, we can sequence these events
- Greatly simplifies building of state machines
- No need to worry about variable delay through combinational logic (CL)
- All signals delayed until clock edge (clock imposes the worst case delay)



# Setup and Hold Constraints

34



- $t_{\text{cycle}} \geq t_{\text{dCQ}} + t_{\text{pd}} + t_s$
- $t_{\text{slack}} = t_{\text{cycle}} - (t_{\text{dCQ}} + t_{\text{pd}} + t_s)$

# Setup and Hold Constraints

35



$$\square t_{cCQ} + t_{cd} \geq t_{hold}$$

# Violating Setup & Hold Time Constraints

36



- Violation implies metastability

# Asynchronous Signals & Metastability

37

- Many synchronous systems need to interface to asynchronous input signals
  - ▣ Consider a computer system running at 1GHz with interrupts from I/O devices, keystrokes, etc.
  - ▣ Data transfers from devices with their own clocks. For example, Ethernet has its own 100MHz clock
- The probability that a flip-flop stays in the metastable state decreases exponentially with time.

# Asynchronous Signals & Metastability

38

- Therefore, any scheme that delays using the signal can be used to decrease the probability of failure.



# Clock Signal non-ideal Behavior

39



# Clock Signal non-ideal Behavior

40

- Skew
  - Differences in clock signal arrival times across the chip
  - Cause: Interconnect delay (unbalanced Clock Tree)
- Jitter
  - Temporal variations in clock edges at a given point on the chip.
    - Results in continuous clock period variation.
    - In general, each clock period has a different jitter.
    - $t_{jitter}$  is the maximum absolute jitter.
  - Cause: EM Interference, Cross-Talk, ....

# Clock Signal non-ideal Behavior

41



# Clock Tree and Skew

42



# Positive Skew & Negative Skew

43

## □ +ve Skew



## □ -ve Skew



# Static Timing Analysis

44

- **Problem:** Given a circuit, find the path(s) with the largest delay (critical paths)
- **Solution:** run SPICE and report the results of the simulation
  - Problem: SPICE is computationally expensive to run except for small-size circuits plus it needs test vector

# Timing Verification

45



# Static Timing Analysis

46

- **Problem:** Given a circuit, find the path(s) with the largest delay (critical paths)
  - Solution: run SPICE and report the results of the simulation
    - Problem: SPICE is computationally expensive to run except for small-size circuits plus it needs test vector
- **WANTED:** We need a fast method that produces relatively accurate timing results compared to SPICE
  - Solution: Static Timing Analysis (STA)
    - Typically takes a fraction of the time it takes to run

# STA

47

- **Static Timing Analysis** is a method for determining if a circuit meets timing constraints without having to simulate
  - Much faster than timing-driven, gate-level/Spice simulation (Dynamic Analysis)
  - Proper circuit functionality is not checked
  - Vector generation NOT required

# STA Steps

48

- Circuit is broken down into sets of timing paths
- Delay of each path is calculated
- Path delays are checked to see if timing constraints have been met



# Timing Path

49

- A Timing Path is a point-to-point path in a design which can propagate data from one flip-flop to another
- Each path has a start point and an endpoint
  - Start point: Input ports Clock pins of flip-flops
  - Endpoints: Output ports Data input pins of flip-flops



# Types of Timing Paths

50

- Depending on the logic through which data propagates a design can be considered to comprise of 4 paths
  - Input → reg : Data from Input port to the first flip-flop
  - Reg → reg : Data from one flip-flop to another ff
  - Reg → out : Data from last flip-flop to output port
  - Input → output : Data from Input – Output with no sequential components in between

# Types of Timing Paths

51



# Types of Timing Paths

52



# STA Terminologies

53

- **Critical Path:** Theoretically path which has maximum delay
- **Arrival Time:** Time taken by data to reach an end point from a specific start point.
- **Required Time:** Time at which data is required at a particular end point.
  - Depends on the requirements/specifications
- **Slack:** Difference in required time and arrival time.
  - For a design to work the slack value should always be positive

# Required Time

54

- **Required time** specifies the time point (interval) at which data is required to arrive at end point (data is required to be stable after arrival).
  - Time point after which data can become unstable (change) is called earliest required time
  - Time point after which data cannot become unstable (change) is called latest required time
- The requirement is set by timing constraints like setup/hold, etc.



# Arrival Time

55

- **Arrival time** defines the time interval during which a data signal will arrive at a path endpoint (after arrival time signal will be stable).
- Data is normally triggered by clock edge
- Data arrival depend on circuit delay, which vary (depend on temperature, supply voltage, etc.)
- Minimum delay, early arrival
- Maximum delay, late arrival



# Slack & Critical Path

56

- The slack is the difference between the required and arrival times
  - Negative Slack → Constraints Violation
  - Positive Slack → Constraints are met
- The path with the smallest slack is called the critical path.



# Early and Latest Analysis

57

- STA tool calculates the slack of each logic path, in order to find critical path.
- Early and Latest analysis approaches:
  - Assumes circuits have minimum delay, compares arrival time to earliest required time (hold check)
  - Assumes circuits have maximum delay, compares arrival time to latest required time (setup check)



# Solution Methodology

58

- 1) Find the total number of paths in the design
- 2) Identify the paths that are constrained by clock
- 3) Identify the arrival times of all the paths identified in step 2 above
- 4) Identify the required time equations of all paths identified in step 2
- 5) Equate the arrival times and required times of all relevant paths to obtain the time periods
- 6) Select the maximum applicable clock frequency from the clocks identified in step 5

# Problem 1

59

- Find the maximum frequency?



- Assume no input delays or output delays are specified

# Solution (1)

60

## □ Input – Reg Path

- Arrival Time = 3 (assuming no input delay constraint)
- Required Time = CP + 0.5 + 0.6 -  $t_{\text{setup}}$  = CP + 1.1 - 0.65
- Equating the TWO, CP = 3 - 1.1 + 0.65 = 2.55

## □ Reg – Reg Path

- Arrival Time = 0.5 + 0.6 + 1.4 + 4 = 6.5
- Required Time = 0.5 + 0.7 + 0.8 + CP - 0.75 = CP + 1.25
- Equating the two, CP = 6.5 - 1.25 = 5.25

## □ Reg – Output Path

- Path is not constrained (output delay constraint)

CP: Clock Period

# Solution (2)

61

- Two time periods available from the calculations 5.25, 2.55. Though first path can work if clock time period is 5.25 the second path can not work if clock time period is 2.55
- Hence the maximum clock frequency at which the circuit can work is  $1/5.25$  GHz

# More Problems

62



# Possible Paths

63



# False Paths

64

- Paths which physically exist in a design but are not logic paths.  
These paths never get sensitized under any input condition



# Multi-cycle Paths

65

- There are data paths that require more than one clock period for execution.



# How is STA Performed in S/W?

66

## □ Timing Graph

- Data paths with timing constraints
  - Starting from primary inputs/FF outputs
  - Ending at primary outputs/FF inputs
- Represented by a labeled directed graph (DAG)  $G = \langle V, E \rangle$ 
  - Timing node  $V \sim$  pin/primary input/output
  - Timing edge  $E \sim$  gate/wire delay



# How is STA Performed?

67

## □ Timing Analysis Terminology

- Actual Arrival Time (AAT): **Forward propagation**
- Required Arrival Time (RAT): **Backward propagation**
- Slack = RAT - AAT
  - A measure of how much timing margin exists at each node
  - Slack<0 → timing violation
  - Can optimize a particular branch
  - Can trade slack for power, area, robustness



$$A(v) = \max_{u \in FI(v)} (A(u) + d_{u \rightarrow v})$$



$$R(v) = \min_{u \in FO(v)} (R(u) - d_{v \rightarrow u})$$

# How STA is Performed?

68



Ignoring wire delays

# How STA is Performed?

69



RAT @ Sink is 12



Critical path nodes have the same slack

# Issues?

70

- How about Hold Constraints?
  - What we calculated was the late AAT and RAT which is good for setup constraints check
  - For Hold constraints check, we use the Early AAT and Early RAT
- Ordering?
  - Topological order for AAT
  - Reverse Topological order for RAT

# Late and Early AAT and RAT

71



$$AT_E(n) = \min \text{ delay to } n = \begin{cases} 0 & \text{if } n == \text{src} \\ \min_{p=pred(n)} \{ AT_E(p) + \delta(p,n) \} & \end{cases}$$

$$AT_L(n) = \max \text{ delay to } n = \begin{cases} 0 & \text{if } n == \text{src} \\ \max_{p=pred(n)} \{ AT_L(p) + \Delta(p,n) \} & \end{cases}$$



$$RAT_E(n) = \begin{cases} 0 & \text{if } n == \text{sink} \\ \max_{s=succ(n)} \{ RAT_E(s) - \delta(n,s) \} & \end{cases}$$

$$RAT_L(n) = \begin{cases} \text{Cycle time if } n == \text{sink} \\ \min_{s=succ(n)} \{ RAT_L(s) - \Delta(n,s) \} & \end{cases}$$

Note reversal of min and max for early and late modes; this is because we're subtracting delays instead of adding them

Source: R. Rutenbar

# Topological Sort

72

- A topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge  $uv$  from vertex  $u$  to vertex  $v$ ,  $u$  comes before  $v$  in the ordering.



5, 7, 3, 11, 8, 2, 9, 10  
3, 5, 7, 8, 11, 2, 9, 10  
5, 7, 3, 8, 11, 10, 9, 2

•  
•  
•

Src: Wikipedia

# Topological Sort (DFS)

73



```
topsort( node n ) {  
    for each s in succ(n) {  
        if s has not been visited {  
            topsort( s );  
            push n on stack ;  
            mark n as visited;  
        }  
    }  
}  
  
topsort(SRC);
```



Source: R. Rutenbar

# STA Tools

74



# Synopsys Prime Time (PT)

75

- Industry standard STA tool
- Sample Timing Report

| report_timing                    |                                                                  |        |
|----------------------------------|------------------------------------------------------------------|--------|
| Header                           |                                                                  |        |
|                                  | Startpoint: FF1 (rising edge-triggered flip-flop clocked by Clk) |        |
|                                  | Endpoint: FF2 (rising edge-triggered flip-flop clocked by Clk)   |        |
|                                  | Path Group: Clk                                                  |        |
|                                  | Path Type: max                                                   |        |
| Point                            | Incr                                                             | Path   |
| clock Clk (rise edge)            | 0.00                                                             | 0.00   |
| clock network delay (propagated) | 1.10 *                                                           | 1.10   |
| FF1/CLK (fdef1a15)               | 0.00                                                             | 1.10 r |
| FF1/Q (fdef1a15)                 | 0.50 *                                                           | 1.60 r |
| U2/Y (buf1a27)                   | 0.11 *                                                           | 1.71 r |
| U3/Y (buf1a27)                   | 0.11 *                                                           | 1.82 r |
| FF2/D (fdef1a15)                 | 0.05 *                                                           | 1.87 r |
| data arrival time                |                                                                  | 1.87   |
| clock Clk (rise edge)            | 4.00                                                             | 4.00   |
| clock network delay (propagated) | 1.00 *                                                           | 5.00   |
| FF2/CLK (fdef1a15)               |                                                                  | 5.00 r |
| library setup time               | -0.21 *                                                          | 4.79   |
| data required time               |                                                                  | 4.79   |
| data required time               |                                                                  | 4.79   |
| data arrival time                |                                                                  | -1.87  |
| slack (MET)                      |                                                                  | 2.92   |

Header  
Startpoint: FF1 (rising edge-triggered flip-flop clocked by Clk)  
Endpoint: FF2 (rising edge-triggered flip-flop clocked by Clk)  
Path Group: Clk  
Path Type: max

Capture clock  
Report is for setup



# Fixing Timing Violations

76

- Re-synthesis (Before Physical Synthesis)
  - Local synthesis transforms
- Timing-driven placement (During Physical Synthesis)
  - Critical net weighting
- Timing-driven routing (During Physical Synthesis)
  - Net ordering
  - Buffering
  - Topology optimization
- Post-route optimization (IPO)
  - Re-routing
  - Re-timing and useful clock skew
  - Sizing
  - Buffering

# Local Synthesis Transforms

77

.....  
Double Inverter  
Removal  
.....



# Local Synthesis Transforms

78

## □ Resizing



—♦— A —■— B —▲— C

# Local Synthesis Transforms

79

## □ Cloning



# Local Synthesis Transforms

80

## □ Buffering



# Local Synthesis Transforms

81

## □ Redesign Fan-in Tree



# Local Synthesis Transforms

82

## □ Redesign Fan-out Tree



Longest Path = 5

Longest Path = 4  
Slowdown of buffer due to load

# Local Synthesis Transforms

83

## Swap Pins



Simple sorting on arrival times and delay works



# Post Layout Optimizations

84

- Re-route to reduce cross-talk effect



# Post Layout Optimizations

85

## □ Useful Skew

