

15-10-2024

## HDLSS

Kedar Kulkarni

always @ .

```

begin
  if (input)
    begin
      a = 1'b1;
    end
  else
    begin
      a = 1'b0;
    end
end
  
```



| sel | Y (out) |
|-----|---------|
| 0   | a       |
| 1   | b       |

### HDL Design flow

→ Design description

i<sub>p</sub> - MRD

o<sub>p</sub> - PDD, HLA, AMA (μA)

→ Verification

→ Lint  $\xrightarrow{\text{latch}}$   
 $\xrightarrow{\text{overflow, wraparound}}$   
 $\xrightarrow{\text{CDC}}$

→ Synthesis  $\xrightarrow{\text{(reused gates)}}$   
 $\xrightarrow{\text{mapping (based on FPGABlock)}}$

```

always @ (posedge clk)
begin
  if (cread_a == 1'b1)
    begin
      already_reg <= 1'b0;
    end
  else
    begin
      if (cbus_clk_en == 1'b1)
        begin
          already_reg <= already;
        end
      end
    end
  end
end
  
```

```

always @ (posedge clk)
begin
  if (cread_a == 1'b1)
    begin
      already_reg <= 1'b0;
    end
  else
    begin
      if (cbus_clk_en == 1'b1)
        begin
          already_reg <= already;
        end
      end
    end
  end
end
  
```



- Gate level simulation
  - ↳  $t_{on}, t_{off}$ , means noise
  - ↳ very slow
  - ↳ to catch X propagation
- Implementation
  - ↳ Mapping
  - ↳ Placement
  - ↳ Routing

### Synthesis Flow (ASIC)

→ Registered o/p  $\rightarrow$  fixed delay at o/p, w/o charge

→ DFT  $\rightarrow$  scan flop  $\rightarrow$  (to check gates after fab)

$\rightarrow$  stuck at false model  $\rightarrow$

Scan chain to check flip-flop chain

ATPG  $\rightarrow$  Tool used to generate test logic



Pattern at i/p  
 $\rightarrow 00110$

16-10-2024



needed if-else

```

if (cadelr)
  if (burst)
    ptr = ptr2
  else
    ptr = ptr1
else
  ptr = ptr1
  
```

addr == 8FF



## Clock tree synthesis

→ to make clock available to all FF at same time.

Y, X, H, tree

GDSII for fabrication of ASIC

- less switching power.
- more leakage power
- no change in area considerable

## Technology library

→ Standard cell library → gates, mux, latches

### Primitive

- flop, mux, latches

### Macro

- RAM, FIFO, DSP

### IP core

- USB, HDMI, ethernet, PCIE

## Event based simulation

## Cycle based simulation

on change of event.

→ to detect glitches



→ time consuming

## Event scheduling in simulation

active → non-blocking queue → Prioritized Queue

## Time scale

time scale instions

<time-unit> / <time-precision>

## Modeling concepts

Behavioral modeling → equation, functionality.

Dataflow modeling → gates, operators

Structural modeling → predefined block

## ASIC synthesis

Memory Replacement →  $\text{reg } [n:0] \text{ mem}[0:x];$  // fake ram



→ Replace with real memory provided by vendor

→ timing, routing

→ single port, dual port

(multiported)

→ B-RAM hardware available in FPGAs

Block D-RAM gen by code.  
during.

Doc. management.

Organizing:-

Version control.

## Design Unit

{ } - replication operator

== - to compare x, z also

parameter, local param C cannot be over written from top level model)

J K → T toggle

|      | clk | $Q_0$ | $Q_1$ | $Q_2$ | $Q_3$ |
|------|-----|-------|-------|-------|-------|
| init |     | 0     | 0     | 0     | 0     |
| *    | 1   | 0     | 0     | 0     |       |
| *    | 0   | 1     | 0     | 0     |       |
| *    | 0   | 0     | 1     | 0     |       |
| *    | 0   | 0     | 1     | 1     |       |
| *    | 0   | 1     | 0     | 1     |       |
| *    | 0   | 0     | 0     | 0     |       |
| *    | 0   | 0     | 0     | 0     |       |
| *    | 1   | 0     | 0     | 0     |       |

|   | $Q_1$ | $Q_3$ | clr |   |
|---|-------|-------|-----|---|
| 0 | 0     | 0     | :   | : |
| 0 | 0     | 1     | :   | : |
| 1 | 0     | 0     | :   | : |
| 1 | 0     | 1     | :   | : |
| 0 | 1     | 0     | :   | : |
| 0 | 1     | 1     | :   | : |
| 1 | 1     | 0     | :   | : |
| 1 | 1     | 1     | :   | : |

|     | $Q_3$ | $Q_2$ | $Q_1$ | $Q_0$ |
|-----|-------|-------|-------|-------|
| 0-  | 0     | 0     | 0     | 0     |
| 1-  | 0     | 0     | 0     | 1     |
| 2-  | 0     | 0     | 1     | 0     |
| 3-  | 0     | 0     | 1     | 1     |
| 4-  | 0     | 1     | 0     | 0     |
| 5-  | 0     | 1     | 0     | 1     |
| 6-  | 0     | 1     | 1     | 0     |
| 7-  | 0     | 1     | 1     | 1     |
| 8-  | 1     | 0     | 0     | 0     |
| 9-  | 1     | 0     | 0     | 1     |
| 10- | 1     | 0     | 1     | 0     |



## Ripple counter

- any logic like design
- delay gets propagated
- prone to glitch, uncertainty of o/p, clk

= constrain for generated blocks

## Hazard

### Case 0



for change in c from 0 → 1



| AB | $f = \bar{a} \bar{b}$ |
|----|-----------------------|
| 00 | 1                     |
| 01 | 1                     |
| 10 | 0                     |
| 11 | 0                     |



$x_2, x_3, x_4 = 1$

$x_1 = 0 \rightarrow 1$



## Elk & reset

Trem, Tree  
↳ removal      ↳ recovery  
↓ time reset must be stable  
before elk edge

} reset assertion synchronous/async  
} de-assertion ~~out~~ synchronous.

use of synchronizer for  
reset



17-10-2024

always @ (posedge clk or negedge rst)

begin

if (!rst)

rst\_a <= 2'b00;

else

begin

rst\_a[0] <= 'b1;

rst\_a[1] <= rst\_a[0]

end

end

assign global\_rst and rst\_a;

→ change in gate

→ reset value = 0 of each flop.





### Clk & reset

- Single & multiple clk domain
  - multiple sync require
  - data line level sync not need FIFO (VLSIVERIFY.com)
  - or data not certain at a time
- Clk gating (off/on save power)
- Internal generated clk
  - P2L, bresel
  - (generate clk) (to know when to sample.)
- mixed clk → DDR
  - make sure initially remain IOLE, else disable state

X

avoid multiple contact nodes. (lint thrown warning)



| clk | A | B | C | O/P |
|-----|---|---|---|-----|
| 0   | 0 | 0 | 0 | 0   |
| 1   | 1 | 0 | 0 | 0   |
| 2   | 0 | 1 | 0 | 1   |
| 3   | 0 | 1 | 1 | 1   |
| 4   | 0 | 0 | 1 | 0   |
| 5   | 0 | 0 | 0 | 1   |
| 6   | 1 | 0 | 1 | 1   |
| 7   | 1 | 0 | 0 | 0   |

divide by 3

## Synthesis-Simulation mismatch

translate on, OFF  $\Rightarrow$  directives

if-else needed  $\rightarrow$  avoid redundant logic case  $\leftrightarrow$  if-else  
case  
case (1'b1)

Blocking & non-blocking assignment

Function, automatic  $\rightarrow$  saves memory space for repeated call

Arithmatic operation  $\rightarrow$  variables at the end  $y = x[?c:d] + x[?a:b]$   
 $\rightarrow A+B+C+D \text{ vs } (A+B) + (C+D)$

18-10-2024

## Constant propagation

assign a=1;b=1;

always @\*

begin

op = a+b

end

constant 2  
assigned to op

assign sel=1;

if (sel==0)

x=y

else

x=z;

No more inferred

always @\*

begin  
case (sel)

2'b00 = 0

2'b01 = b

2'b10 = c

2'b11 = d

sel is always 0

endcase

## Register duplication

IOW/RAZ  $\rightarrow$  Return always zero

$\hookrightarrow$  Ignore on right

Op won't drive other logic

$\rightarrow$  replication, optimization setting to auto.

Fan out constraint can be provided.

$\hookrightarrow$  on implementation need not be generated.

## Pipelining

→ done at design level not at synthesis  
architectural.



## MUTEX

1. Read is performed by an entity (A or B) to resource (mutex)
2. If a write is performed by some entity that has reserved the resource 'success' response is given by mutex module in the next cycle after receiving clk.
3. If a write is performed by an entity but the mutex is reserved for other entity the response to write will be fail.
4. Whichever entity performs the last read will have the mutex

Read-A → Mutex reserved for A

Read-B → Mutex reserved for B



## how power design

Activity, Frequency, capacitive load ...

- Dynamic power  $\rightarrow$  increases linearly
- Static power  $\rightarrow$  increases exponentially

} trend in shrinking (new tech library)

## Design technique

clock gating, enable, coding style

## FSM encoding

FSM state  $\rightarrow$  Binary, Gray, One-hot

Static power derivation

$\rightarrow$  sub threshold leakage ↑  
 $\rightarrow$  gate leakage ↑  
 $\rightarrow$  gate induced drain leakage. ↑

Voltage domain.  
Multi Vdd, power up power down sequence

Dynamic power  $\rightarrow$  square of voltage ↑

Multi Vdd, power up power down sequence

level shifter  $\rightarrow$  Low  $\rightarrow$  high (have significant delay)

$\rightarrow$  placed on receiver end.

Static voltage scaling -

Multi level voltage scaling -

Dynamic voltage frequency scaling -

Adaptive voltage frequency scaling -

Multi Vt  $\rightarrow$  Voltage technology.



Power gating - more low power than clk gating.

- power up / down sequence to be followed (to avoid uncertainties)
- isolation cell. (<sup>op fixed, x must rd propagate</sup> avoid crowbar current)

» Evening session.

- retention registers (preserve data during power down)  
FIFO, memory  $\Rightarrow$  not possible

Follow power gating sequence



Cooling guidelines

- module name & file name as same.
- inputs in separate line.
- dub global inputs // as comment.
- naming variables as required
- don't change name in protocol
- add functionality to refine
- use `define for fixed value
- use name for FSM
- All caps variable name local name,
- -clk at end, -state, -state-next, -state-r [not -reg]<sup>\*</sup>
- -a for async inputs (reseta)
- -n for active low

- eg. `clk_en, _sel (select line).`
- : always\_Proc (name available in VCS)
- `_reg_Proc (for sequential reg)`
- `_PULL, _READY, _HOLD, _STALL`
- `_lh (one-hot)`
- assign with multiple condition, can be written in multiple line.

## STA



↳ rare  
 - latched, combi  
 - low latency

19-10-2024



### Pass transistors



Master latch, Slave latch  $\Rightarrow$  D flip flop using pass transistors



S1A

$t_{\text{setup}}$ ,  $t_{\text{hold}}$ ,  $t_{\text{clock}}$  delay

$t_{\text{setup}}$  violation  $\rightarrow$  reduce logic delay

$t_{\text{hold}}$  violation  $\rightarrow$  increase logic delay

$$\Rightarrow t_{\text{setup}} = 9 \text{ ns} \quad t_{\text{clock}} = 12 \text{ ns} \quad t_{\text{transport}} = 26 \text{ ns} \quad t_{\text{logic}} = 54 \text{ ns}$$

$$\text{max delay} = t_{\text{setup}} + t_{\text{clock}} + t_{\text{transport}} + t_{\text{logic}}$$

$$\text{max-delay} = 100 \text{ ns}$$

$$f = \frac{1}{100} \times 10^9$$

$$= 10 \text{ MHz}$$

$$\Rightarrow t_{\text{setup}} = 5 \text{ ns} \quad t_{\text{clock}} = 10 \text{ ns} \quad t_{\text{transport}} = 10 \text{ ns} \quad \begin{matrix} t_{\text{logic}} = 13 & 0-1 \\ t_{\text{logic}} = 23 & 1-0 \end{matrix}$$

$$\text{max T} = 25 + 25$$

$$= 50 \text{ ns} \Rightarrow f = 20 \text{ MHz}$$

$$\Rightarrow t_{\text{setup}} = 10 \text{ ns} \quad t_{\text{transport}} = 0 \quad t_{\text{clock}_1} = 8 \text{ ns} \quad t_{\text{clock}_2} = 10 \quad t_{\text{clock}_3} = 12 \text{ ns}$$

$$t_{\text{logic}} (\text{CNAND gate}) 95 \text{ ns} - 0 \rightarrow 1$$

$$t_{\text{logic}} (\text{CNAND gate}) 18 \text{ ns} - 1 \rightarrow 0$$





$$3 + 2 + 4 + 4 + \sqrt{ } \Rightarrow 13 \text{ ns}$$

$\Rightarrow$  STA broken after a flop  $\rightarrow$  next delay calculated after

combo  $\rightarrow 9 \text{ ns}$   
 $\rightarrow 15 \text{ ns}$

Reg - Reg  $\rightarrow 10 \text{ ns}$

12 ns [t<sub>c2q</sub>  $\rightarrow$  reset t<sub>setup</sub>]  
 decidiing





$$D + \bar{D} = 1$$

$$D + 1 = 1$$

$$\text{Combo} \Rightarrow 3 + 4 + 3 + 6 \Rightarrow 16 \text{ ns}$$

$$\text{Reg - Reg}_{\text{FF}_1 - \text{FF}_2} \Rightarrow 4 + 4 + 2 \Rightarrow 10 \text{ ns}$$

$$\text{FF}_1 - \text{FF}_2 \Rightarrow 6 + 4 + 6 + 9 \Rightarrow 25 \text{ ns}$$

$$\text{FF}_1 - \text{FF}_2 = 4 + 3 + 6 + 2 \Rightarrow 15 \text{ ns}$$

Positive skew  $\checkmark$  Negative skew  $\times$   
 positive skew adjustment, propagate to next flop.



21-10-2024

Decade counter

$$t_{su} = 2 \text{ ns} \quad t_{cq} = 3 \text{ ns}$$

$$\text{gate delay} = 5 \text{ ns}$$

$$\text{transport delay} = 1 \text{ ns}$$



$$3+ \\ = 1 + 5 + 1 + 5 + 1 + 2$$

$$= 18 \text{ ns} + 7$$

$$= 25 \text{ ns}$$



22-10-2024



Implement without FSM



- identifying  $0 \rightarrow 1$  or  $1 \rightarrow 0$  transition  $\rightarrow$  excor to detect  
 $\rightarrow$  inequality detector

- a flop to get o/p



Case =

- 1) Full case
  - 2) Parallel case
- } burden enginew.  
     $b_{zz}, b_{zz1} \Rightarrow$  complicated logic

False path  $\rightarrow$

- $\rightarrow$  identification
- $\rightarrow$  L elimination

23-10-2024



$512 \times 32 \Rightarrow$  total  
reach can happen  
only when entries are  $\geq 16$

SISO  $\rightarrow 16$



$(wr\_ptr == rd\_ptr) \rightarrow$  FIFO empty



full & empty programmable

- wr\_ptr

- diff\_ptr → give no of entries present at FIFO  
at given time

- rd\_ptr

$$\text{diff\_ptr} \geq 16$$

$$'b\ 10000 \Rightarrow 16$$

$$\underbrace{\dots}_{\text{OR}} 10000 \Rightarrow \geq 16$$

### Multi-cycle path

↳ different  $t_{\text{combo}}$  / delay

$t_{\text{combo}}$  10ns, 8ns

2-clk to complete some

→ use of enable

→ use of clock gating.

### Constraints

Timing constraints

create\_clock

[source\_object]

[period period\_value]

[-waveform edge-list]

[-edge <sup>name</sup> clock\_name]

create-clock-period 5 [get-port clk1]



create-clock-period 5 - waveform {1,2} [get-port clk2]

posedge



→ latency, uncertainty, transition time



1. Source
2. Network



set-clock-latency [-rise] [-fall] [-source] [-early] [-late] [-clock clock-list]

set-clock-uncertainty 1 [get-clocker clk]

set-clock-uncertainty 2 [from set-clock

set-clock-transition transition-time [-rise] [-fall] [-min] [-max]

create-clock-period 4 <sup>name</sup> clk1 - waveform {0,1,2}

create-clock-period 4 clk2 - waveform {1,3}

create-clock-period 6 clk3 - waveform {2,3}



Synchronous clk

→ share a common source

→ have a fixed phase relationship

set\_clock\_gating\_check [-setup margin] [-hold margin] [-trise] [-fall]  
 [-high] [-low] [-object list]

create\_generated\_clock -

|             |             |
|-------------|-------------|
| $\div by 2$ | $\div by 3$ |
| 00          | 00          |
| 01          | 01          |
| 10          | 10          |
| 00          | 00          |
| 01          | 01          |
| 10          |             |



get\_node, get\_net, get\_cell, get\_nine

set\_false\_path - from register1 -to register2

set\_multicyclepath - setup 2 - from dff1 to dff2 -

24-10-2024

Vivado  $\rightarrow$  Project settings.

Synthesis  $\rightarrow$  settings

- gated-clock-conversion

Implementation  $\rightarrow$  settings.

False path types

1. different clk
2. architecture
3. ifp based static

$\rightarrow$  through  $\rightarrow$  can be used to bypass checked but include other.

set-false-path -to [get\_clocks clk]

-rise-from  $\Rightarrow 0 \rightarrow 1$

-fall-to  $\Rightarrow 1 \rightarrow 0$

$\rightarrow$  Setup Hold  $\Rightarrow$

-setup[2]  $\rightarrow$  setup<sup>check</sup> cycle

-hold  $\square$   $\rightarrow$  hold = setup -1 (count in reverse from  
setup set)

i/o latency  $\rightarrow$  set\_input\_delay delay\_value -clock clock.ref [-max] [-min] [-clock.fall]  
output [rise] [fall] input port.

Async-clock group

set\_clock\_group -asynchronous -group {clkA, clkA.divby2} -group {clkB,

clkB.div3}

Virtual clock

to check asyn. interrupts,

$\rightarrow$  Set\_input\_delay 1 -clock [get\_clock axi\_clk] [get\_port axi\_\*]  
2 [get\_port axi\_wdata]

$\hookrightarrow$  last will over ride previous

set\_max\_delay

set\_min\_delay.

UPF → Unified power format

→ limitation in Verilog

→ functional modeling of power paths  
isolation  
retention



### assertion & coverage

- supply net - power domain - supply port - power switch -  
level shifter - isolation cell - retention cell



$$5\text{ns} - 2 \Rightarrow 3\text{ns}$$

clock to FlopA  $t_{min}=1\text{ns}$   
 $t_{max}=3\text{ns}$

clock to Buffin  $t_{min}=3\text{ns}$   
 $t_{max}=5\text{ns}$

buff delay  $t_{min}=2\text{ns}$   
 $t_{max}=4\text{ns}$

buff out to flopB =  $t_{min}=1\text{ns}$   
 $= 3\text{ns}$

26-1024 Fish tail → synopsys

- Assertion based ... .pdf
- HLA ... report document