

# Advanced eXtensible Interface(AXI)



In any chip we have some modules like this. Having modules is not sufficient; they must be able to communicate. To do this, we should connect a **bus**, which transfers data from one point to another on chip.

Advantages of connecting buses

- 1)makes standard(organized) communication so that communicate easily
- 2)makes easy to debug
- 3)makes easy to reuse in other designs by creating IP

There are many SOC buses like Core Connect,Wish Bone,AXI..

In these AXI is extensively used which is designed by ARM.Vivado design flow is based on AXI interfaces.

AXI is a bus or this is a kind of protocall or way of signalling through which modules can communicate with others

# Basic Components in AXI interfaces

- 1)AXI Master
- 2)AXI Slave
- 3)AXI Interconnection

**Transaction:** Transferring data from one point on chip to another.



**AXI Master:** Master triggers to read/write transactions

**AXI Slave:** Slave responds to masters command

# The 5 Channels of AXI Interface



Read Address Channel, master send read address to slave

Read Response Channel, slave send to master indicate that read operation has done

## Connecting Masters & Slaves



## AXI Interconnect



Communication between multiple masters and slaves cannot be done with direct connection.

Interconnect is slave to master and master to slave.

# AXI Interconnect Address Decoding



AXI Interconnect has decoding table which routes to specific slave based on address range

## Advantages of AXI Interconnection

1) Multiple masters and slaves communicate

2) Width Conversions

master send 64 bits but if slave capable of receiving only 32 bits then interconnect send two times to slave by sending half of 64 bits each time.

3) Clock domain transformation

master works at 500MHz but if slave works at 250MHz then interconnect send data by frequency division.

4) Interconnect has registers which improve performance



# Hierarchical AXI Interconnects



If we have less no. of ports to interconnect than no.of masters or slaves then another interconnect helps to communicate all masters and slaves with others.

This makes architecture to easily expandable.

## Some important rule for AXI Interconnection

- Meet required alignment in Address



4K:20'h00000 - 20'h00FFFF

64K:20'h01000 - **WRONG**

64K:20'h10000 - 20'h1FFFF

Address Range : 4K  
Address Offset : 0x40000000  
Address : 0x40000000 - 0x40000FFF

Address Range : 2G  
Address Offset : 0x40001000  
Address : 0x40001000 - .... (**Wrong!**)

Correct : Offset should be in 2G Boundary ! e.g. 0xC0000000



**Memory Mapped** : Read/write transactions can perform based on destination address

**Streaming:** Data flow is only in one direction and there is no requirement of address



## Naming Style of AXI Ports

- Memory Mapped



- Stream

- Signal names:

- e.g. S\_AXI\_awid, M\_AXI\_arready ....





## Full AXI vs AXI-Lite



For one call of transaction of AXI-Lite **only one chunk of memory** has transferred. This chunk of data can be more than one bit like 16-bit also.

For one call of transaction of Full AXI **multiple chunk of memory** can transfer.

# ZYNQ Architecture

**1951** UNIVAC I : 0.015 operations per 1 watt-second

*Half a century later!*

**2012** ST P2012 : 40 billion operations per 1 watt-second



**Solution : Specialized functional units (Accelerators)**

**What really Accelerators does???**



Traditional CPU

# Modern CPU



Accelerators typically have their own memory space. How do they interact with the host system? How does the CPU cache support multiple accelerators? Then how to access them?

# FPGA vs CPU

## FPGA

- 1)FPGA is extensively used for processing large size of data and return it like FFT .
- 2)High I/O applications so that interfaces many peripherals.
- 3)Massively parallel algorithms, performing a task through performing its subtasks at a time to complete the task.

## CPU

FPGAs are not capable efficiently to handle many **flow control operations** like if-else conditions,for loops. For image compression techniques like DCT FPGA is not flexible.In such cases CPU performs very well.



# MicroBlaze

Xilinx has designed a microprocessor core MicroBlaze which has CPU on FPGA.



But this have many disadvantages,

- 1)It occupies more area on FPGA
- 2)The CPU speed is not at all good as an individual CPU



## Viertex-5 FXT FPGA

This FPGA has CPU core IP block on FPGA this designed by ARM and deployed in ZYNQ architecture



# Xilinx ZYNQ Architecture



This block represents through which PS control modules in PL are MGP0, MGP1, ACP, and ACP. Note that the port is of size 64-bits. In this part, the AXI Master connects to the Inter Connect, which then connects to the PS. An Interconnect to communicate between these modules and PL masters and slaves in PL.

- 1) Programmable Logic(PL): DSP SHCS, LUTS, 6LBS, Clock Managers, ACP, Transvers, ADC or DAC this part resembles FPGA.
- 2) Programmable System(PS): Two ARM CPU IPs, Caches, Interconnect, DRAM, AXI DMA controller, Peripherals, OCM.



Figure 3-2: APU System View Diagram

Table 4-1: System-Level Address Map

| Address Range                         | CPUs and ACP | AXI_HP | Other Bus Masters <sup>(1)</sup> | Notes                                                   |
|---------------------------------------|--------------|--------|----------------------------------|---------------------------------------------------------|
| 0000_0000 to 0003_FFFF <sup>(2)</sup> | OCM          | OCM    | OCM                              | Address not filtered by SCU and OCM is mapped low       |
|                                       | DDR          | OCM    | OCM                              | Address filtered by SCU and OCM is mapped low           |
|                                       | DDR          |        |                                  | Address filtered by SCU and OCM is not mapped low       |
|                                       |              |        |                                  | Address not filtered by SCU and OCM is not mapped low   |
| 0004_0000 to 0007_FFFF                | DDR          |        |                                  | Address filtered by SCU                                 |
|                                       |              |        |                                  | Address not filtered by SCU                             |
| 0008_0000 to 000F_FFFF                | DDR          | DDR    | DDR                              | Address filtered by SCU                                 |
|                                       |              | DDR    | DDR                              | Address not filtered by SCU <sup>(3)</sup>              |
| 0010_0000 to 3FFF_FFFF                | DDR          | DDR    | DDR                              | Accessible to all interconnect masters                  |
| 4000_0000 to 7FFF_FFFF                | PL           |        | PL                               | General Purpose Port #0 to the PL,<br>M_AXI_GP0         |
| 8000_0000 to BFFF_FFFF                | PL           |        | PL                               | General Purpose Port #1 to the PL,<br>M_AXI_GP1         |
| E000_0000 to E02F_FFFF                | IOP          |        | IOP                              | I/O Peripheral registers, see <a href="#">Table 4-6</a> |
| E100_0000 to E5FF_FFFF                | SMC          |        | SMC                              | SMC Memories, see <a href="#">Table 4-5</a>             |
| F800_0000 to F800_0BFF                | SLCR         |        | SLCR                             | SLCR registers, see <a href="#">Table 4-3</a>           |
| F800_1000 to F880_FFFF                | PS           |        | PS                               | PS System registers, see <a href="#">Table 4-7</a>      |
| F890_0000 to F8F0_2FFF                | CPU          |        |                                  | CPU Private registers, see <a href="#">Table 4-4</a>    |
| FC00_0000 to FDFF_FFFF <sup>(4)</sup> | Quad-SPI     |        | Quad-SPI                         | Quad-SPI linear address for linear mode                 |
| FFF0_0000 to FFFF_FFFF <sup>(2)</sup> | OCM          | OCM    | OCM                              | OCM is mapped high                                      |
|                                       |              |        |                                  | OCM is not mapped high                                  |

Address range is reserved for each module.

On the ZYNQ boards we can run set of **Operating Systems(OS)**

### 1)Bare Metal

This is used for running a specific application without any concurrency between several applications.This makes not to consume power for other processing tasks.This is the simplest option and its efficient.

### 2)Linux

On the ARM CPU we can run Linux.By this system can gain lot of possiblities which are very difficult to obtain through bare metal.

Petalinux tools unables to develop xilinx embedded systems.