

RTL Design

- Diagrams used
  - Schematics (Gates + FFs) - for small detailed design
  - Simple RTL Block Diagrams (one level) - for basic seq. functions.
  - Hierarchical RTL Block Diagrams (multi-level) - entire system designs.
  - Functional Block Diagram (one level) - top level view of system
- RTL = Register Transfer Level.
- Simple RTL Diagrams - can include gates/FF as needed, but usually no.
  - Consists of 2 parts
    - × Registers - has clk & maybe rst
    - × Comb Logic - no clk, no rst, no feedback
  - Helps planning verilog code in a module.
  - Use arrows
- Hierarchical RTL Block Diagrams - can use gates/FF as needed, but usually no.
  - Has 3 parts
    - × Registers
    - × Comb Logic
    - × Blocks for subsystems
  - Represents your entire sys.
  - Clk & ~~rst~~ are external inputs
  - Need to add lower level diagrams for blocks
- Func. Block Diagram - Top level & omit clk & ~~rst~~
- Diagram Process
  - Identify needed functions
  - Identify I/O
  - Describe seq. of op. (Pseudocode, flowcharts etc.)
  - Identify Data + State to be saved
  - Adapt from similar diagram if possible.

Verilog Review• Key Concepts

- Blocking vs Non-blocking assignment
  - ✗ Blocking - ~~executed at end of block~~ (comb-block) ( $=$ )
  - ✗ Non-blocking - Executed at end of block (for ffs) ( $<=$ )
- Reg vs Wire
  - ✗ Wires - relate to wires, can only be given values from a port or ~~as assign~~ (usages)
  - ✗ Reg - can only be modified in 1 block and 1 procedural block  
reps. storage elements.
- Data Types
  - ✗ Logic: 0, 1, X, Z
    - $\uparrow$   $\uparrow$   
unk. high imp.
  - ✗ Wire: defaults to logic, driven by mult sources,  $X > 0/1 > Z$  (precedence)
  - ✗ Logic (X) - can replace wire reg & prevents multiple drivers
  - ✗ X = can set it mult. drivers, not reset or no store yet, latch metastable, gate level: timing hazards, transition 0@1, minmax prop delay
  - ✗ Z = can set it no driver, explicit assign, gate level: output of tristate buff/open drain buff.

## - Latches (Clocked)

- ✗ Bad Design, has times where vars are used in code  $\rightarrow$  latches onto values.
- ✗ Values should only update by clock or always comb?

## • Sequential Logic (Uses ffs + latches (Avoid latches))

- Don't make comb feedback loops
- Using always-ff@() and always-comb (maybe always-latch)
- Storage needs a reset (check reset val and mod in Verilog)
- Initial values for vars, wires, in always/initial doesn't exist in hardware

## • Test benches

- How to decide what to test
  - ✗ Study Design Specification (ifos, features, func., timing req.)
  - tb should test all reqs.
  - ✗ Study Code + State Diagram (branch cond., ifo, regg, etc.)
  - tb should take every branch and toggle a wide range of reg vals, including edge cases.
  - ✗ Cover impl. regg., unknown cases etc.
  - ✗ Try to break it even over simple things like a hacker.
  - ✗ Manage timing for testing & use coverage reports.

Verilog Review contd• Test Benches contd

- We can draw seq. diagrams to see if we're testing right.

ex. Tb stim

DUT checker



## • Unpacked Arrays vs Packed

- Unpacked = elems are not next to each other
- Packed = elems are adj. in mem.

## • Coding Test Benches

- Skeleton Code

```

`timescale 1ns / 10ps
Module <name> ();
  <Parameters>
  <Var Declarations>
  
```

Usually tb (testbench)  
Decl w/ local param  
cared for params  
and fast  
like CLK\_Period

tb-  
vars

Creates  
tb-CLK →  
signals

Funs

to help →

test →

init test →

module →

give all →

tb vars →

avalue →

the →

tests →

can use →

\$stop →

but usually →

for.

err, and

if straight is for success

```

  <Clock Generation>
  always begin
    tb-CLK = 1'b0;
    #(CLK_Period/2.0);
    tb-CLK = 1'b1;
  end #(CLK_Period/2.0);
  
```

<Tasks>

<Module> #(Parameters) DUT (Signals);

initial begin

<Variable Init>

<Tests>

\$finish

endmodule

• May have waveform output (Used for some simulators)  
initial begin  
 \$dumpfile("waveform.vcd");  
 \$dumpvars;  
end

State Machines• Use Moore Machines  $\leftarrow$ 

- Mealy produce long delays + can cause comb. feedback loops.
- Moore is easier to test + debug
- Mealy also can prop. glitches from  $i \rightarrow 0$ , slows max clock speed,

• Can use typed enum logic [ $N:0$ ] {...} State to make it easier. Use these blocks:

- Two comb blocks + always  $ff$  is common.
- Cases can be useful.
- default state prevents latches.

Controllers

## • Usually use FSMs, Syncs, Clock Dividers, Timers, etc.

## • Syncs. Prevent metastable propagation by send values on the clk.

• Clock Dividers ~ slows down operations for processing
 

- Just count and toggle output for a divider (can just pulse for edge detect)
- All blocks should use SysClk, and use clkdiv as just an input.

## • Tips

- Pulse for timers + clkdiv to prev edge detect needs.
- Chain clkdivs for more + better division

SOC Design

- Concept: SOC - System on Chip



- Minimal Architecture:



- CPU - all socs have a processor.
  - ✗ All I/O to CPU goes through bus except clk, reset, + interrupts
  - ✗ CPU manages mem + I/O through bus
  - ✗ IRL might have multi-clks so they stay fast everywhere.
- SOC Bus - basic is just a bunch of wires for signals + data
  - ✗ Sometimes includes address decoding logic + multiplexing logic to avoid tri-state wires
- Bus Manager (sometimes master) - module used/built into components to
  - ✗ The simplest Soc only have one manager
- Bus Subordinate (sometimes slave) - module used/built into comps. to respond
  - ✗ Most modules fall here. In a Soc. to commands from the Bus.
- Your Module - will usually at least need sub-interface.
  - ✗ Can have manager too.
- Memory controller - used to connect bus to RAM
- I/O controller - used to connect I/O to bus. (ex. GPIO, SPI, I<sub>2</sub>C, etc.)
- SOC Bus - primary source of communication on a Soc
  - Many standards, we cover AHB-lite + APB.
    - ✗ AHB-lite (Advanced High performance Bus) - simplest high speed bus Standards for ARM processors.
    - ✗ APB (Advanced Peripheral Bus) - even more simple ARM Standard, but can only be used for low speed I/O.
      - ✗ Uses M/S interfacing, address space, mem. map, multicycle vs pipeline bus.

SOC Design contd

## • M/S interfaces

- Major Signals on SOC bus - Clk, address, wdata, rdata, wenable, ready, wait.
- Write Trans.
  - x M writes address, we, and wdata
  - x S recognizes address + accepts data / assert + wait till ready.
- Read Trans.
  - x M writes address, r.e.
  - x S recognizes address + writes data / assert wait till ready.

- SOC uses address to direct data - often part of it is the dest. and the other is loc in dest. (sometimes T and R respectively)

## • Multi-Cycle SOC bus vs Pipelined SOC bus

- Sample  
Slow to APB  
Performance Similar to ATB/ATB+  
Similar to ATB/ATB+
- Multicycle SOC bus - only one trans. can be in process at a time
    - x Takes 2t clk cycles, one for master + one for slave.
    - More if slave sets wait high.
    - x Depending on bus edges seas (Tord) or level seas (Rish or Low) will vary.
  - Pipelined SOC bus - supports multiple types of transactions.
    - x Has two phases like multicycle - address phase + data phase
    - x Data phase overlaps w/ next address phase
    - x Wait extended current phase (data tadd) until set low.

## • Example of bus usage (AMBA 2.0) (Advanced Microcontroller Bus Architecture)



## • APB

## - Signals (APB bridge)

- x PCLK - from clk source - is clk
- x PRESETn - reset - from sys. bus equivalent
- x PAADDR - the address for the transaction - from APB bridge (up to 32b)
- x PSELx - Select signal - generated by APB bridge (1 for each "S")
- x PENABLE - indicates second half of APB transfer - from APB bridge.
- x PWRITE - high for a write, low for a read - from APB bridge
- x PWDATA - write data - from APB bridge
- x PREADY - wait signal when low - from sub interface.
- x PRDATA - read data - from sub interface (up to 32b)
- x PSLVERR - transfer failure - from sub interface (not req., tie low if unused)



SOC Design Contd

## • AHB-lite

## - Signals

- × HCLK - from clk source - is clk
- × HRESETn - reset - from reset controller
- × HADDR = address for the transaction - from master (32b) (modifiable)
- × HBURST - burst type to indicate it's part of a burst - from master (3b) (least bte)
- × HSIZE - size of the transfer - from master (3b) (modifiable)
  - usually byte, halfword, + word.
- × HTRANS - transaction type (IDLE, Busy, NONSEQ, SEQ) - from master (2b)
- × HWDATA - write data - from master (32b) (modifiable)
- × HWRITE - high for write, low for read (stays the same for entire burst) - from master
- × HRDATA - read data - from 'S' (32b)
- × HREADYOUT - ↑ trans ready, ↓ hostall - from 'S'
- × HRESP - ↑ for trans cr. - from 'S'
- × HSELx = 'S' select sig (one for each 'S') from decoder
- × HPROT - indicates protection levels
  - from 'm'
  - (2b)

## - Model:



For multiple 'm's use  
a 'multi-layer interconnect'  
that implements left but more  
complicated for more devices

- Ex: Similar to APB but res between addy map + res and hrdtata out. also add multiplex w/ addy map (select) + ress.

ASIC Implementation Options

- Options: Standard Cell ASIC, Full Custom ASIC, Gate Array, Field Programmable Gate Array (FPGA), Microcontroller.
- Standard Cell ASIC
  - Library of pre-designed logic "cells"
  - Place+route software positions + connects cells
  - Pros: Cheap in T quantities, design is easier than Full Custom, Easier to reuse IP vs Full Custom
  - Cons: Expensive in T quantities, expensive to change chips, very inefficient for some functions.
- Full Custom ASIC
  - Draw transistor schematics, layout by hand.
  - Pros: Cheap in **HUGE** quantities, has best area/speed optimization
  - Cons: T time + T cost to design, harder to reuse for other tech.
- Mixed Custom + Standard Cell
  - Use Custom for analog blocks (one option) + blocks where Speed / size is critical.
  - Options common for SoCs.
- Gate Array - not used that much
  - Transistors/Logic gates on silicon
  - We design interconnect by adding metal
  - Compromise between ASIC + FPGA
- Reconfigurable Devices (FPGA / Complex Programmable Logic Device (CPLD))
  - Layout of logic + wires is fixed
  - Synthesis + mapping software programs how wires are connected + logic
  - Pros: Very low time to market, quick + easy to prototype / change
  - Cons: Slower + larger than ASIC, expensive in large quant.
- Microcontrollers
  - Used mostly by software for small complexity tasks
  - Used for embedded systems w/ arch. based on SoC.
  - Pros: More flexible, easier + quicker than FPGA, quick + easier to prototype / change, smaller than FPGA
  - Cons: Slower than everything, doesn't exploit parallelism + has fetch + decode. Some glue logic + interface hardware needed.
- Design Time
  - ↑
  - Full Custom
  - Mixed
  - Std Cell
  - FPGA
  - Micro
  - Optimize ability for area
  - T area for T quant.
  - Cheap for T quant
  - Exp. for T quant

Timing Analysis and Critical Paths

## • Important terms

- Timing / delay path - logic propagation path from i/o of comb block
- Critical Path - timing path w/ longest/shortest delay
- Static timing analysis - analyze delay by tracing timing paths
- Dynamic timing analysis - analyze delay by simulation + test vectors

## • We need analysis to decide clock speed &amp; timing constraints

- Slowest path constrained by clk speed, t<sub>pd</sub>, t<sub>setup</sub> & time before clk data valid
- Fastest path constrained by clk skew, t<sub>pd</sub>, t<sub>hold</sub> & time after clk data valid

$\Delta t_{pd}$  is prop. delay  $\propto \frac{1}{\text{dist in clk ducts}}$  len path

- FF out can be used for i and FF in can be used for o.
- Worry abt t<sub>hold</sub> when comb delay + t<sub>pd</sub> < t<sub>hold</sub> + clk skew

## • Timing reports can be generated, but not fixed till placement.

- Includes start, end, & crit path delay

- Try to analyze rtl and verify w/ report