

# DESD THEORY

## 1 - FPGA INTRODUCTION

FPGA are made of:

- CLB (CONFIGURABLE LOGIC BLOCKS)
- I/O BLOCKS
- Programmable interconnections



All set with CAD.  
configurations can change!

BEPROGRAMMABILITY

GTS: Global  
3-state  
IF Head: output are  
IO to IO interface  
global self or same

INTERCONNECTIONS:

PROGRAMMABLE INTERCONNECT POINTS (PIP):  
• break point  
• cross point  
• compound cross-point

THE MORE THE SIGNAL PATH  
SWITCHES THE HIGHER THE  
LOAD SEEN

thanks to LOOK-UP TABLES (LUT)

ADDRESSES THE TABLE WITH INPUTS  
OF THE FUNCTION, FOR WHICH CORRESPONDS  
A STORED VALUE.  
(IT'S A SORT OF MEMORY)

I/O BLOCKS: programmable input or  
output with

- \* TRI-STATE BUFFERS for bidirectional control
- \* FF/latches
- \* DELAYED OUTPUT CLOCKS
- \* PULL-UP/DOWN RESISTORS
- \* PROGRAMMABLE VOLTAGE AND CURRENT LEVELS
- \* SET BY CONFIGURATION BITS

to transfer data from outside to  
inside:

- buffers to drive dedicated I/O lines
- also delay to I/O
- CLOCK CAPABLE INPUT BUFFER

CLB:

Made of

- LUTs
- DISTRIBUTED MEMORY
- SHIFT REGISTERS
- WIDE MULTIPLEXERS

Each CLB connected to a switch matrix for access to general routing matrix. (16 columns)

• A CLB is made of a PAIR OF SLICES

- SUCEL logic slices
- SUCEH can use LUTs as 64-bit PATH

or 32-bit shift registers.

every LUT can use a FF to register the input  
SLICES contains also MULTIPLEXERS and ARITHMETIC CARRY  
SLICES not connected between each other.

in SLICE:

- DISTRIBUTED RAM
- SHIFTING DATA WITH 32-bit registers

can be used to receive  
serial decays.

with SUCEL or SUCEH Port Interfaces

1 port: SYNC WRITE (durable queue)  
1:3 port: ASYNC READ

CARRY LOOKAHEAD calculate in advance the carry

THE HIGHER THE INPUT BITS OF ADDITION, THE HIGHER THE CRITICAL DECAY PATH FOR CARRY

SOLUTION: COMPUTE IN PARALLEL ALL POSSIBLE CARRY PATH AND THEN SELECT THE RIGHT ONE



$$\begin{aligned} C_{\text{OUT}} &= AB + AC_{\text{IN}} + BC_{\text{IN}} \\ &= AB + (A + B)C_{\text{IN}} \\ &= G + C_{\text{IN}} \end{aligned}$$

generate propagate

| A | B | $C_i$ | S | $C_o$ | Carry status |
|---|---|-------|---|-------|--------------|
| 0 | 0 | 0     | 0 | 0     | delete       |
| 0 | 0 | 1     | 1 | 0     | delete       |
| 0 | 1 | 0     | 1 | 0     | propagate    |
| 1 | 0 | 0     | 1 | 0     | propagate    |
| 1 | 0 | 1     | 0 | 1     | propagate    |
| 1 | 1 | 0     | 0 | 1     | propagate    |
| 1 | 1 | 1     | 1 | 1     | generate     |

$$\begin{aligned} C_{o,1} &= P_0 C_{\text{IN}} + G_0 \\ C_{o,2} &= P_1 P_0 C_{\text{IN}} + P_1 G_0 + G_1 \\ C_{o,3} &= P_2 P_1 P_0 C_{\text{IN}} + P_2 P_1 G_0 + P_2 G_1 + G_2 \\ C_{o,4} &= P_3 P_2 P_1 P_0 C_{\text{IN}} + P_3 P_2 P_1 G_0 + P_3 P_2 G_1 + P_3 G_2 + G_3 \end{aligned}$$

can be referred to first inputs

SYSTEM ON CHIP (SoC)

PROCESSOR, SYSTEM  
+  
PROGRAMMABLE LOGIC



## DSP: DIGITAL SIGNAL PROCESSING

can implement custom parallel algorithms  
up to  $25 \times 18$  CPLZ, 48 bit-accumulator,  
power saving pre-adder, arithmetic unit,  
logical unit, pattern detector



## FPGA vs ASIC (Application Specific IC)

FPGA are re-programmable (also while running)  
ASIC are permanent, cannot be changed

- ASICs suitable for high volume production
- FPGA for prototyping and validating design (also of ASIC designs)

- FPGA limited in frequency compared to ASIC (due to routing and blocks)

ASIC with same FPGA are much faster since they are optimized for specific function.

- DESIGN OPTIMIZATION AFFECTS CODE CAN SPEED-UP A LITTLE BUT the FPGA

- ASICs are more power efficient

- ANALOG DESIGN ARE NOT POSSIBLE WITH FPGA (but today they can incorporate some analog stages like ADC and PLLs, however not so much flexible to create complex circuits like a RF transceiver)

↳ ASICs can have complete analog design (or mixed)

- FIELD PROGRAMMABLE ANALOG ARRAY (FPPA) failed due to too narrow band units due to switch matrix architecture.

USED FOR PROTOTYPING  
NO NRE, FREE SOFTWARE  
ONLY IC COST OF BOARD



TOTAL COST

ASIC

FPGA

ASIC costs start higher even for low volume due to very high NRE costs, but slope is flatter

ALMOST ZERO NRE COST FOR FPGA, COST IS FOR ACTUAL IC

Crossover volume where ASICs start becoming cheaper than FPGAs for volume production

HIGH NON RECURRING COST (NRE)

(IC MASK IN FOUNDRY)  
DIFFICULT FOR PROTOTYPING  
(COSTLY IN SMALL QUANTITIES)



ASIC cost components:

- ASIC EDA tools and training
- Cost of designing
- Mask Cost
- Cost of simulating
- ASIC Masks Cost
- Wafer Cost
- Wafer Processing
- Die Fabrication
- Yield & Manufacturing Loss
- Packaging

BOTH DESIGNED with VHDL  
ASIC REQUIRE MUCH MORE ATTENTION IN IMPLEMENTATION,  
which is done almost automatically in FPGA CAD



FPGA vs ASIC

|                   | FPGA | ASIC |
|-------------------|------|------|
| NRE               | ✓    | ✓    |
| Performance       | ✓    | ✓    |
| Time to market    | ✓    | ✓    |
| Design Flow       | ✓    | ✓    |
| Cost per Unit     | ✗    | ✓    |
| Barrier to Entry  | ✓    | ✓    |
| Energy Efficiency | ✓    | ✓    |
| Analog Blocks     | ✓    | ✓    |

© NUMATO LABS

## DESIGNING in FPGA : exploit PARALLELISM!

### Temporal Computation

The traditional paradigm  
Typical of Programmers  
Things done over time steps

```

A = input("A= ?");
B = input("B= ?");
C = input("B multiplier ?");
X = A + B * C;
Y = A - B * C;

```

Which do you think is easier to make sense of?

### Spatial Computation

Suited to hardware  
Possibly more intuitive?  
Things related in a space



### Software/Program

Processor running a program written using a pre-defined fixed set of instructions



### Temporal Computing

### Hardware

Defined by fixed functionality and connectivity of hardware elements



### Spatial Computing

### TEMPORAL PARALLELISM :

require PARTITIONING of a PROCESSING TASK

→ PIPELINE STRUCTURES

LATENCY ↑  $\frac{1}{n}$   
THROUGHPUT ↑  $n$

depends on DIVISIBILITY of task to be parallelized

### SPATIAL PARALLELISM :

replicate the processing task to be computed by its own dedicated component  
→ UNITS OF INPUT INFORMATION must be PARTITIONABLE.

depends on NUMBER of INDEPENDENT TASKS



## PROGRAMMABLE LOGIC DEVICE (PLDs)



CPLD

2 ARRAY OF PLD

### Concept of Programmable Macrocell

- FF for sequential logic or bypass for combinational logic
- Feedback of FF present state for FSM implementation



### Sequential PLA = PAL



## 2 - CLOCK AND TIMING

→ THERE ARE SOME NON-IDEALITIES IN REAL CLOCKS

- JITTER** (unavoidable, always present)
- SKew** (due to delays of the lines) → **GOOD CLOCK ARCHITECTURE NEEDED**

→ FPGAs have DEDICATED GLOBAL and REGIONAL I/O and CLOCKING RESOURCES

↳ **CLOCK MANAGEMENT TILES (CMT)** provide:
 

- \* clock tree synthesis
- \* skew
- \* jitter filtering

 ⇒ LOCAL routing not recommended



⇒ MULTIPLE CLOCK REGIONS supplied by CLOCK TREES:



- each CLB is connected to the **GLOBAL Routing Network** through **SWITCH MATRIX**.
- DEDICATED LINES** for clock propagation, to avoid **SKew**.
  - driven by **BUFG (GLOBAL CLOCK BUFFER)** → drive every region through HROW
- each **I/O BANK** contains **CLOCK-CAPABLE INPUT PINS**.
- HORIZONTAL CLOCK BUFFER (BUFH)** allows access to clock lines of a single clock region.
- MULTI-CLOCK REGION BUFFERS (BUFRB)** to span up to 3 vertically adjacent clock regions.
- BUFI0** for **I/O CLOCK TREE**
- BUFB** (REGIONAL CLOCK BUFFER)



### HIGH PERFORMANCE CLOCK ROUTING with

- low jitter
- minimize duty cycle distorted direct path

→ **STILL THERE ARE DELAYS (deterministic or stochastic) ⇒ TIMING CONSTRAINTS**

- $t_{PPD}$ : maximum propagation delay CLK → Q
- $t_{CD}$ : minimum contamination delay CLK → Q
- $t_{SETUP}$ : minimum time D stable before clock edge.
- $t_{HOLD}$ : minimum time D stable after clock edge.



### 3 - I/O RESOURCES

FPGAs offers both:

- o HIGH-PERFORMANCE (HP)  $\Rightarrow$  for HIGH SPEED MEMORIES
- o HIGH-RANGE (HR)  $\Rightarrow$  to support WIDE RANGE of I/O standards (with different voltages)

$\hookrightarrow$  every I/O pin contains INPUT, OUTPUT, 3-STATE DRIVERS



DIFFICULT MANUFACTURING DESIGN  
WORSE SIGNAL INTEGRITY



due to REFLECTIONS and BIASING which require LINE TERMINATION

proper

• PROGRAMMABLE OUTPUT STRENGTH and SLEW-RATE, ON-CHIP TERMINATION with digital-controlled impedance

- internally generated REFERENCE VOLTAGE
- o Configurable SelectIO Pins, both SINGLE-ENDED or DIFFERENTIAL (2 pins)

- NO NEED OF TERMINATION RESISTOR ON-BORD
- o REDUCE BOARD ROUTING DIFFICULTIES
- o IMPROVE SIGNAL INTEGRITY BY ELIMINATING STUB REFLECTION

DCI used because a real resistor can be impossible to place near pins, or create stub effects.

ADAPT TO PROCESS THERM. VARIATIONS

CAN control: - OUTTER Z OF A DRIVER  
- ADD PARALLEL TERMINATION TO MATCH Z OF TRANSMISSION LINE

due to - CROSSTALK BETWEEN CONNECTIONS  
- REFLECTIONS

$$\rho = \frac{Z_{\text{terru}} - Z_0}{Z_{\text{terru}} + Z_0}$$

REFL. COEFF.  
difficult to achieve  
 $Z_{\text{terru}} = Z_0$   
because

HIGH INPUT IMPEDANCE  $\neq$  IMP. OF LINE  
of MOS



Ringing

- > Source impedance too low
- > Load impedance too high
- > Initial step too big (overshoot)



Stepwise

- > Source impedance too high
- > Load impedance too high
- > Initial step too small (doesn't meet VOH)

HOW TO SOLVE:

- > End termination
- > Series termination
- > Both ends termination
- > Any of these approaches can prevent problems with long-line reflections or short-line ringing

Most popular styles for digital logic

$\hookrightarrow$  LIBRARY support total freedom on I/O output/input/bidirectional buffers, both single or differential.



DIFFERENTIAL is better: RETURN CURRENTS BALANCED AND CANCEL EACH OTHER OUT (ideally zero current flowing through the ground connection)

Also reduced effect for external EMI and CROSSTALK and also eliminated ones. (contributions are equally added to inverted and not-inverted signals)

- LESS POWER FOR SAME SNR

$\rightarrow$  CROSS-OVER POINT, SHIFTS IF TWO WIRES HAVE DIFFERENT LENGTH

- BETTER NOISE MARGINS (High or Low state determined by comparing non-inverted and inverted signal)



- There is a **TRADE-OFF** between **POWER/PERFORMANCE** (delay of input buffers). We can choose that **DELAY**.
- We can also choose a **SLEW-RATE** but **HIGHER** means also **MORE REFLECTIONS (and NOISE)**.

**STRENNETTI** ( $i_{in} \text{ mA}$ )

**PULL-UP/DOWN OR KEEPER CIRCUIT**

### COMMUNICATION STANDARDS

- SERIAL-to-PARALLEL CONVERTERS present
- PARALLEL-to-SERIAL " present
- IO FIFOs

} due to **PACKAGE INDUCTANCE**, each package supports a limited number of **SIMULTANEOUS SWITCHING OUTPUTS (SSOs)** (particularly when HIGH-DRIVE outputs ARE USED HIGH)



NOT ALL PINS ARE EQUAL, BECAUSE INDUCTANCE IS NOT A PROPERTY OF ANY INDIVIDUAL WIRES BUT IT'S A PROPERTY OF THE SPACE BETWEEN CONDUCTORS

Mechanism of coupling among pins generating interferences that is called

### SIMULTANEOUS SWITCHING NOISE (SSN)

can be reduced:



#### REDUCING PIN INDUCTANCE

- use better packaging with reduced lead inductance
- increase the number N of power and ground pins
- drive fewer loads or smaller loads

#### DISTRIBUTE UNIFORMLY GND PINS TO REDUCE RETURN CURRENT LOOPS

- there are software to predict SSN and NOISE MARGINS selecting a **VICTIM PIN** (considering all **AGGRESSORS PINS**) (over 8mA)
- PLACE STRONG OUTPUTS AND/OR SSOs SEPARATED FROM SENSITIVE INPUTS/OUTPUTS (HIGH-SPEED)
- ADD VIRTUAL GROUNDS to reduce SSO



Obvious: Faraday's law states that the induced voltage is proportional to the rate change of flux in the loop of the circuit: the greater the area of circuit loop the greater the inductance and the greater the induced voltage.

$$V_{GND} = L_{GND} \cdot \frac{di}{dt}$$

but we measure  $V_F$  between F and GND (since we do not have access to the conductive plane inside the package)

$$|L_{GND}|_{V_F} > |L_{GND}|_{V_D}$$



$|V_F| > |V_D|$  DISTURBANCE (CROSSTALK) ON F IS GREATER.



## 4 - PIPELINE



FPGA is made of a sea of CLBs that can operate at the same time

⇒ PARALLEL PROCESSING

TEMPORAL PARALLELISM

II PARTITIONING OF TASK

PIPELINE

↓ LATENCY OF SINGLE RESULT ↑

but also THROUGHTPUT ↑

SPATIAL PARALLELISM

replicating processing task if original information can be partitioned



→ in FPGAs PIPELINE is achieved by interposing REGISTERS between blocks



→ We call LATENCY the delay associated with the number of clock cycles lost before the first valid output appears.

→ the greater the number of pipeline stages the greater the latency

→ THE LONGEST PATH (in DELAY) BETWEEN REGISTERS sets the f<sub>CLOCK</sub><sup>HARD</sup>

→ the THROUGHPUT is much higher (1 output per clock cycle)

⇒ we have to wait initially then great advantage with throughput.



HOW TO SPEED-UP PERFORMANCE EVEN MORE? HOW TO ELIMINATE THE BOTTLENECK OF THE SLOWEST PATH?

1 PIPELINE INSIDE THE LIMITING LOGICAL BLOCK



- DIFFICULT TO MODIFY AN IP

2 INTERLEAVING: AREA ↑ MAX UPPER LIMIT SET BY FF OF PIPELINE



use the same logical block two times in sync, alternating them with a MUX



INTERLEAVING with N REPLICAS = N-stage PIPELINE

## 5 - BUSES

a **BUS** is a COMMON ELECTRICAL PATHWAY BETWEEN MULTIPLE DEVICES, both:

- INTERNAL: ex. data from/to the AW
- EXTERNAL: ex. for CPU-memory or I/O device connection.

- MUST obey some COMMON RULES that BUS have to follow (in designing)

### BUS PROTOCOL

- there are also MECHANICAL/ELECTRICAL Specs

- devices that can initiate bus transfers are called **MASTERS**. → most connected to a **BUS DRIVER** (digital amplifier)
- devices that are passive and wait for request are called **SLAVES**.

→ most connected to a **BUS RECEIVER**.

#### SYNCHRONOUS BUS:

- has a fixed clock driven by a PREFERENCE OSCILLATOR square-wave like.
- every operation takes an integer number of cycles (BUS CYCLES)

(?) WE COULD WASTE SOME TIME  
3.5 cycles → 6 cycles!

(?) DIFFICULT TO ENHANCE  
ONCE  
BUS CYCLE IS FIXED

(?) IMMUNITY TO CROSSTALK  
happens at switches we can wait to sample while cross-talk noise settle

There are

#### ASYNCHRONOUS BUS: DOES NOT HAVE A CLOCK.

- bus cycles can be of any length and not the same.
- handled by **MIXED TECHNOLOGY**

(?)

NO CLOCK SKW  
LOW POWER  
NO NEED TO RESPECT A CERTAIN LIMIT OF SLOW PATH  
NO NEED OF OPTIMIZATION OF CLOCK (VS TEMP/FABRICATION)  
AUTOMATIC ADAPTIVE SPEED

(?) DIFFICULT TO DESIGN  
HARDWARE AND GLITCHES  
SENSITIVE TO CROSSTALK ALL THE TIME

TO GET MORE DATA OVER THE BUS:

- MORE DATA LINES (difficult to obtain identical electrical characteristics)
- FASTER BUS SPEEDS (limited by skew)

→ DATA TRAVERS THE BUS AT DIFFERENT SPEEDS

HOW HARD IS TO DESIGN A BUS ?

$$\Rightarrow \frac{T_{\text{delay}}}{T_{\text{clock}}} \quad \text{figure of merit of difficulty of bus design}$$

NUMBER OF LINES OF BUS

!!  
BUS WIDTH

⇒ BANDWIDTH → larger buses are more expensive

to INCREASE: - decrease BUS CYCLE TIME  
- increase width



A BUS can be

#### SERIAL:

nowadays it's the preferred choice for high-speed buses  
NO SKW PROBLEMS (line-to-line signal).  
MORE FLEXIBLE APPROACH TO DATA PLATES (allows longer cables)  
REDUCTION OF LINES (SAVE BOARD SPACE)  
FASTER WITH HIGH-CLOCKS  
FULL-DUPLEX (specific wires for TX, RX)

#### PARALLEL:

suffers of CROSS-TALK, SKW, PIN COUNT  
NO DELAY for DECODING  
SAME STRUCTURE BOTH FOR PX AND TX (HALFDUPLEX)  
IMPOSSIBLE TO CREATE IDENTICAL WIRES  
RIGHT ORDER OF PARALLEL OF DATA IS LOST

speeding-up results  
in problem of BUS SKW  
of individual lines travels at slightly different speeds

another solution is to use multiplexed lines for addressing and data transfer

(?) BUS NOT COMPATIBLE  
with PRE-EXISTING  
DEVICES

COMMUNICATION IN  
MULTIPLE STEPS  
IS SLOW



## DIFFERENTIAL SIGNALLING :



The receiver extracts information by detecting the potential difference between the inverted and non-inverted signals. The two voltage signals are "balanced," meaning that they have equal amplitude and opposite polarity relative to a common-mode voltage.

→ same parasitics, opposite current, NO OVERALL PARASITIC EFFECT  
HOPE WAVES (and AREA, POWER CONSUMPTION)

preferred for >MHz communications

ideally zero current flowing towards GND

TOLERANCE to exterior EM Interference

LOWER VOLTAGE OPERATION

HOPE IMMUNITY TO NOISE, BIGGEST STATIC NOISE MARGIN

# BUS TECHNOLOGIES

## USB:



Type A socket  
(From Front)



Type B socket  
(From Front)

| Pin No. | Signal                | Color of the cable |
|---------|-----------------------|--------------------|
| 1       | Power ground          | Black              |
| 2       | Data - White / Yellow |                    |
| 3       | Data + Green / Blue   |                    |
| 4       | Ground Black/Brown    |                    |

USB SOCKETS & PINS

| Version | First available | Released by     | Speed                             | Bits/sec            | Max. distance |
|---------|-----------------|-----------------|-----------------------------------|---------------------|---------------|
| USB 1.1 | 1998            | -               | Low speed (LS)<br>Full speed (FS) | 1.5 Mbps<br>12 Mbps | >9.25 m       |
| USB 2.0 | 2001            | USB 1.1         | High Speed (HS)                   | 480 Mbps            | <4 m          |
| USB 3.0 | 2009            | USB 1.1/2.0     | SuperSpeed (SS)                   | 5 Gbps              | >70 sec       |
| USB 3.1 | 2014            | USB 1.1/2.0/3.0 | SuperSpeed+ (SSP)                 | 10 Gbps             | >35 sec       |

- PC host controller HW and SW
- Robust connectors and cable assemblies
- Peripheral friendly master-slave protocols
- Expandable through multi-port hubs
- Low cost
- Auto-configuration
- Hot plugging
- Outstanding performance
- Powered bus

## I<sup>2</sup>C:

### Synchronous Serial



- <Clock on one wire, and synchronized data on other
- Usually used to connect microcontrollers/processors to peripherals
- SPI (Serial Peripheral Interface) – 1 wire for each data + 1 clock wire
- <=2 (Inter IC) – 2 wires for bidirectional data direction is handled by protocol + 1 clock wire

### I<sup>2</sup>C Bus Characteristics

- Includes electrical and timing specifications, and an addressing mechanism
- Two-wire serial data & control bus implemented with the serial data (SDA) and clock (SCL) lines
- For reliable operation, a third line is required:  
Common ground
- Unique start and stop condition
- Slave selection protocol uses a 7-Bit slave address
- The bus specification allows an extension to 10 bits
- Bi-directional data transfer
- Acknowledgment after each transferred byte
- No fixed length of transfer
- True multi-master capability
- Clock synchronization
- Arbitration procedure
- Transmission speeds up to 100KHz (classic I<sup>2</sup>C)
- Max. line capacitance of 400pF, approximately 4 meters (12 feet)
- Compatible with different IC technologies

## Master:

- Initiates a transfer by generating start and stop conditions
- Generates the clock
- Transmits the slave address
- Determines data transfer direction

## Slave:

- Responds only when addressed
- Timing is controlled by the clock line

## SPI (Serial Peripheral Interface):

- As compared with its counterpart I<sup>2</sup>C, SPI is more suited for data stream applications.

Communication between DSPs,  
ADC

- SPI can also achieve significantly higher data rates than I<sup>2</sup>C.

### Full Duplex capability.

<SPI defines four types of signals

| SPI Signal | Name                |
|------------|---------------------|
| MISO       | Master In Slave Out |
| MOSI       | Master Out Slave In |
| SCLK       | Serial Clock        |
| SS         | Slave Select        |



## AXI:

- read address,
- read data,
- write address,
- write data,
- write response.

### Read transaction

- Read data channel transfers data from the slave to the master.

### Write transaction

- Write data channel transfers data from the master to the slave.
- Write response channel, the slave uses the write response channel to signal the completion of the transfer to the master.



## N:N interconnect.

An N:N interconnect handles the access of a single slave from N masters. This is managed by an arbiter block: when more than one master tries to issue an operation by raising xVALID in the same clock cycle, the interconnect keeps the corresponding xREADY signals low for every master except for one, whose request is passed unmodified to the slave (if the address is correct).

The priority of the masters is usually configurable, ranging from a round-robin arbitration to a fixed-priority one.

**N:M interconnect.** On the other hand, a M:N interconnect connects a single master to M slaves. Each slave has a (usually static) address mapped to it, and the master can access them all directly. When the master performs a read/write operation, the interconnect looks up if the specified address is mapped by one of the slaves: if so, it forwards the operation to the correct slave.



Once 1:N and N:M interconnects have been defined, it is possible to construct an N:M interconnect by combining them.

The arbiter and router blocks can be connected either in full crossbar mode (two masters can access to two different slaves during the same clock cycles), which leads to higher performance but uses more resources, or in shared access mode, where only one device can access the channel during each clock cycle.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

In SAMD mode, the N-to-M use case of the AXI Interconnect core provides for only one transaction at a time. For each connected master, read transaction requests always take priority over writes. The arbiter then selects from among the requesting masters. A write or read data transfer is enabled to the targeted slave device. After the data transfer (including the write response) completes, the next request is arbitrated. SAMD mode minimizes the resources used to implement the module of interconnection.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

Usually, the address channel bandwidth requirements are much lower than the data ones (each read/write address can lead to up to 256 data transfers) so, in general, a SAMD (Shared Access Multiple Data) is preferred.

There also exists a simplified subset of AXI4, called AXI4-Lite, whose key limitations are:

- all transactions are of burst length 1,
- all data accesses use the full width of the data channel,
- all accesses are Non-modifiable, Non-bufferable,
- Exclusive accesses are not supported.

Basically, AXI4-Lite retains the structure of AXI4 but removes most of the auxiliary signals and supports only 1-word transactions.

AXI4-Stream is another subset of AXI4 described in the AMBA specification. Unlike AXI4 and AXI4-Lite, in AXI4-Stream the data flows only in one direction, from the master to the slave, and no address is transmitted.

The signals employed are just:

- ACLK and ARESETn,
- TDATA, TVALID and TREADY for the basic handshake,
- TLAST to mark the end of the stream,
- TSSTR and TKEEP to mark a byte as data, null or positional,
- TID to identify different streams,
- TDEST to provide routing informations,
- TUSER for a user-defined sideband signal.

The handshake mechanism in AXI4-Stream is the same used in AXI4 and AXI4-Lite.