



## **Versal Adaptive SoC Design Guide (UG1273)**

# System Architecture

AMD Versal™ devices are divided into series that are targeted to different applications and markets. Some components are consistent between series and some vary either in availability or features. A few of the major resources include:

- AI Engine
- Programmable Logic (PL)
- Network on chip (NoC)
- High-speed I/O (XPIO) or X5IO for Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2
- DDR Memory Controller for DDR4, LPDDR4, and LPDDR4X
- DDR Memory Controller for DDR5, LPDDR5, and LPDDR5X for Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2
- High bandwidth memory (HBM)
- Control, Interfaces and Processing System (CIPS)
- Processing System Wizard for Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2
- Transceivers (GT)
- High-speed debug port (HSDP)
- High-Speed Connectivity and Encryption Integrated IP
- Graphics processor unit (GPU) for Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2
- Video codec unit (VCU) for Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2
- Image signal processor (ISP) for Versal AI Edge Series Gen 2

## Series Comparison

The following table shows several features that vary between series. If a resource is present in a device listed in the column header, it is of the type listed in the table. Not all devices include all resources. For more detail and device-specific information, see the *Versal Architecture and Product Data Sheet: Overview* (DS950).

**Table: Versal Device by Series**

| Series                 | AI Edge Gen 2                                                                                                                                                                                                                        |              |              | AI Core      |             | Prime Gen 2               |              | Prime       |              | Premium      |             |             | HBM  |
|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|--------------|--------------|-------------|---------------------------|--------------|-------------|--------------|--------------|-------------|-------------|------|
| Devices                | 2VE3xxx                                                                                                                                                                                                                              | VE1xxx       | VE2xxx       | VC1xxx       | VC2xxx      | 2VM3xxx                   | VM1xxx       | VM2xxx      | VP10xx       | VP1xxx       | VP2xxx      | All         |      |
| AI Engine              | AIE-ML v2                                                                                                                                                                                                                            | AIE          | AIE-ML       | AIE          | AIE-ML      | -                         | -            | -           | -            | -            | AIE         | -           |      |
| Processing System      | There are several different types of processing systems (PS) in the Versal architecture. For detailed information on feature and generation comparison, see the <i>Versal Architecture and Product Data Sheet: Overview</i> (DS950). |              |              |              |             |                           |              |             |              |              |             |             |      |
| GT/ GTYP/GTR           | GTYP/<br>GTR <sup>1</sup>                                                                                                                                                                                                            | GTY          | GTYP         | GTY/<br>GTYP | GTYP        | GTYP/<br>GTR <sup>1</sup> | GTY/<br>GTYP | GTYP        | GTY/<br>GTYP | GTY/<br>GTYP | GTYP        | GTYP        | GTYP |
| GTM                    | -                                                                                                                                                                                                                                    | -            | -            | -            | -           | -                         | -            | 58G/112G    | 112G         | 112G         | 112G        | 112G        |      |
| CPM                    | -                                                                                                                                                                                                                                    | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 5.0 x8 | -                         | PCIe 4.0 x16 | PCIe 5.0 x8 | PCIe 4.0 x4  | PCIe 5.0 x8  | PCIe 5.0 x8 | PCIe 5.0 x8 |      |
| PCIe                   | PCIe 5.0 x4                                                                                                                                                                                                                          | PCIe 4.0 x8  | PCIe 4.0 x8  | PCIe 4.0 x8  | PCIe 5.0 x4 | PCIe 5.0 x4               | PCIe 4.0 x8  | PCIe 5.0 x4 | PCIe 4.0 x8  | PCIe 5.0 x4  | PCIe 5.0 x4 | PCIe 5.0 x4 |      |
| Multirate Ethernet MAC | 100G                                                                                                                                                                                                                                 | 40G          | 100G         | 100G         | 100G        | 100G                      | 100G         | 100G        | 100G         | 100G         | 100G        | 100G        |      |
| 600G Ethernet MAC      | -                                                                                                                                                                                                                                    | -            | -            | -            | -           | -                         | -            | 600G        | 600G         | 600G         | 600G        | 600G        |      |
| 600G Interlaken        | -                                                                                                                                                                                                                                    | -            | -            | -            | -           | -                         | -            | -           | 600G         | 600G         | 600G        | 600G        |      |
| 400G High-Speed Crypto | -                                                                                                                                                                                                                                    | -            | -            | -            | -           | 400G                      | -            | 400G        | 400G         | 400G         | 400G        | 400G        |      |
| HBM                    | -                                                                                                                                                                                                                                    | -            | -            | -            | -           | -                         | -            | -           | -            | -            | -           | -           | Yes  |
| VDU                    | -                                                                                                                                                                                                                                    | -            | Yes          | -            | Yes         | -                         | -            | -           | -            | -            | -           | -           |      |
| GPU                    | Yes                                                                                                                                                                                                                                  | -            | -            | -            | -           | Yes                       | -            | -           | -            | -            | -           | -           |      |
| VCU                    | Yes                                                                                                                                                                                                                                  | -            | -            | -            | -           | Yes                       | -            | -           | -            | -            | -           | -           |      |
| ISP                    | Yes                                                                                                                                                                                                                                  | -            | -            | -            | -           | -                         | -            | -           | -            | -            | -           | -           |      |

1. PS-dedicated transceiver connectivity: GTYP supports 10G Ethernet, PCI Express®, and HSDP; GTR supports USB and DisplayPort.

## Device Layout

Versal device applications can exploit the capabilities of each of these resources. To create or migrate a design to a Versal device, identify which resources best satisfy the different needs of the application and partition the application across those resources.  
The figure shows the layout of the Versal device, which includes the Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 blocks.

**Figure: Versal Device Layout**

The following sections provide a summary of the blocks that comprise the Versal architecture. For detailed information on these blocks, see the *Versal Architecture and Product Data Sheet: Overview* ([DS950](#)).

## AI Engine

The Versal AI Core Series, AI Edge Series, and AI Edge Series Gen 2 deliver AI inference acceleration with AI Engines. These series are designed for a breadth of applications, including cloud for dynamic workloads and network for massive bandwidth, while also delivering advanced safety and security features. AI and data scientists, as well as software and hardware developers, benefit from high compute density to accelerate application performance. Given the AI Engine's advanced signal processing compute capability, it is well-suited for highly optimized wireless applications such as radio, 5G, backhaul, and other high-performance DSP applications.

AI Engines are an array of very-long instruction word (VLIW) processors with single instruction multiple data (SIMD) vector units that are highly optimized for compute-intensive applications, specifically digital signal processing (DSP), 5G wireless applications, and artificial intelligence (AI) technology such as machine learning (ML).

AI Engines are hardened blocks that provide multiple levels of parallelism, including instruction-level and data-level parallelism. Instruction-level parallelism includes a scalar operation, up to two moves, two vector reads (loads), one vector write (store), and one vector instruction, for a total of seven operations issued as a single 7-way VLIW instruction per clock cycle. Data-level parallelism is achieved via vector-level operations in which multiple sets of data are operated on in each clock cycle. Each AI Engine contains both a vector and a scalar processor, dedicated program memory, and local data memory, and can access adjacent local memory in any of three neighboring directions. It also has access to DMA engines and AXI4 interconnect switches to communicate via streams to other AI Engines, to the programmable logic (PL), or to the DMA. Refer to the *Versal Adaptive SoC AI Engine Architecture Manual* ([AM009](#)) for specific details on the AI Engine array and interfaces.

The AI Engine-ML (AIE-ML) block is capable of delivering 2x compute throughput compared to the preceding AI Engine blocks. The AIE-ML block is primarily targeted at machine learning inference applications and is positioned among the industry's best in performance per watt for a wide range of inference applications. Refer to the *Versal Adaptive SoC AIE-ML Architecture Manual* ([AM020](#)) for specific details on the AIE-ML features and architecture.

The AI Engine-ML v2 (AIE-ML v2) block is capable of delivering 2x compute throughput compared to the preceding AI Engine-ML blocks. Like the AIE-ML block, the AIE-ML v2 block is primarily targeted at machine learning inference applications and is positioned among the industry's best in performance per watt for a wide range of inference applications. Refer to the *Versal Adaptive SoC AIE-ML v2 Architecture Manual* ([AM027](#)) for specific details on the AIE-ML v2 features and architecture.

As an application developer, you can use either a white box or a black box flow to run an ML inference application on the AIE-ML variants. With the white box flow, you integrate custom kernels and dataflow graphs in the AIE-ML variants programming environment. The black box flow uses performance-optimized Neural Processing Unit (NPU) IP from AMD to accelerate ML workloads in the AIE-ML variants.

AMD Vitis™ AI is used as a front-end tool that parses the network graph, optimizes and quantizes the graph, and generates compiled code that can be run on the AIE-ML variants hardware. The AIE-ML variants core tile architecture supports a variety of fixed-point and floating-point datatypes at different precisions. The architecture allows for pipelined vector processing and incorporates high-density, high-speed on-chip memory that can effectively store on-chip tensors. Additionally, it features versatile datamovers that are adept at handling multi-dimensional tensors in memory. With the proper selection of overlay processor architecture and of the spatial and temporal distribution of the input and output tensors in on-chip and off-chip memory, it is possible to achieve high computational efficiency on the AIE-ML variants processing cores.

## Programmable Logic

The Versal adaptive SoC programmable logic (PL) comprises configurable logic blocks (CLBs), internal memory, and DSP engines. Every CLB contains 64 flip-flops and 32 look-up tables (LUTs). Half of the CLB LUTs can be configured as any of the following:

- A 64-bit RAM
- A 32-bit shift register (SRL32)
- Two 16-bit shift registers (SRL16)

In addition to the LUTs and flip-flops, the CLB contains the following:

- Carry lookahead logic for implementing arithmetic functions or wide logic functions
- Dedicated, internal connections to create fast LUT cascades without external routing

This enables a flexible carry logic structure that allows a carry chain to start at any bit in the chain. In addition to the distributed RAM (64-bit each) capability in the CLB, there are dedicated blocks for optimally building memory arrays in the design:

- Accelerator RAM (4 MB): available in some Versal devices only
- Block RAM (36 Kb each): each port can be configured as 4Kx9, 2Kx18, 1Kx36, or 512x72 in simple dual-port mode
- UltraRAM (288 Kb each): each port can be configured as 32Kx9, 16Kx18, 8Kx36, or 4Kx72
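As a quick sanity check on the port configurations above, the following Python sketch confirms that each depth-by-width aspect ratio addresses the full block capacity, and gives a capacity-only estimate of how many blocks a memory array needs. The helper names and the example buffer size are illustrative assumptions, and the estimate ignores cascading, port-count, and placement constraints that the tools account for.

```python
# Port aspect ratios quoted in the text, as (depth, width_bits) pairs.
BLOCK_RAM_KBIT = 36
ULTRARAM_KBIT = 288

BLOCK_RAM_MODES = [(4096, 9), (2048, 18), (1024, 36), (512, 72)]
ULTRARAM_MODES = [(32768, 9), (16384, 18), (8192, 36), (4096, 72)]


def capacity_kbit(depth: int, width_bits: int) -> float:
    """Return the storage addressed by one port configuration, in Kb."""
    return depth * width_bits / 1024


def blocks_needed(depth: int, width_bits: int, block_kbit: int) -> int:
    """Capacity-only lower bound on the blocks needed for a depth x width array."""
    total_bits = depth * width_bits
    block_bits = block_kbit * 1024
    return -(-total_bits // block_bits)  # ceiling division


# Every listed aspect ratio covers exactly the full block.
for depth, width in BLOCK_RAM_MODES:
    assert capacity_kbit(depth, width) == BLOCK_RAM_KBIT
for depth, width in ULTRARAM_MODES:
    assert capacity_kbit(depth, width) == ULTRARAM_KBIT

# Hypothetical buffer: 64K words of 72 bits each.
print(blocks_needed(65536, 72, BLOCK_RAM_KBIT))  # 128
print(blocks_needed(65536, 72, ULTRARAM_KBIT))   # 16
```

For a deep, wide buffer like this hypothetical one, the estimate shows why UltraRAM is usually the more economical choice: one UltraRAM holds as much as eight block RAMs.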

Versal devices also include many low-power DSP Engines, combining high speed with small size while retaining system design flexibility. The DSP engines can be configured in various modes to better match the application needs:

- 27×24-bit two's complement multiplier and a 58-bit accumulator
- Three-element vector/INT8 dot product
- Complex 18×18-bit multiplier
- Single-precision floating point
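The multiplier and accumulator widths in the first mode determine how much headroom a multiply-accumulate loop has before it overflows. The Python sketch below works this out from the widths alone. It is a bit-width model under the assumption of plain signed accumulation, not a model of the DSP engine datapath, and the function names are illustrative.

```python
A_BITS = 27    # first multiplier operand width (two's complement)
B_BITS = 24    # second multiplier operand width (two's complement)
ACC_BITS = 58  # accumulator width


def signed_range(bits: int) -> tuple[int, int]:
    """Return (min, max) of a two's complement value with the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(value: int, bits: int) -> bool:
    lo, hi = signed_range(bits)
    return lo <= value <= hi


def mac(pairs) -> int:
    """Accumulate a*b products, raising if the 58-bit accumulator would overflow."""
    acc = 0
    for a, b in pairs:
        if not (fits(a, A_BITS) and fits(b, B_BITS)):
            raise ValueError("operand does not fit the multiplier input width")
        acc += a * b
        if not fits(acc, ACC_BITS):
            raise OverflowError("accumulator overflow")
    return acc


product_bits = A_BITS + B_BITS         # a full-precision product needs 51 bits
guard_bits = ACC_BITS - product_bits   # 7 bits of accumulation headroom

# 2**guard_bits worst-case products are guaranteed to fit in the accumulator.
a_min, _ = signed_range(A_BITS)
b_min, _ = signed_range(B_BITS)
total = mac([(a_min, b_min)] * (1 << guard_bits))
print(product_bits, guard_bits, total == 1 << 56)
```

With 7 guard bits, at least 128 worst-case products can be summed safely; typical data with smaller magnitudes allows far longer accumulation chains.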

For more information on PL resources, see the following documents:

- *Versal Adaptive SoC DSP Engine Architecture Manual (AM004)*
- *Versal Adaptive SoC Configurable Logic Block Architecture Manual (AM005)*
- *Versal Adaptive SoC Memory Resources Architecture Manual (AM007)*

## NoC

The network on chip (NoC) is a high-speed communication subsystem. It transfers data between intellectual property (IP) endpoints in the PL, PS, and other integrated blocks, providing unified intra-die connectivity. The NoC master and slave interfaces can be configured as AXI3, AXI4, or AXI4-Stream. The NoC converts these AXI interfaces to a 128-bit wide NoC packet protocol. The data moves horizontally and vertically across the device via the horizontal NoC (HNoC) and vertical NoC (VNoC), respectively.

The HNoC runs at the bottom and top of the Versal adaptive SoC, close to the I/O banks and integrated blocks (for example, processors, memory controllers, and PCIe). The number of VNoCs (up to 8) depends on the device and on the number of DDR memory controllers (up to 4, or up to 8 for Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2).

For Versal devices that use stacked silicon interconnect technology (SSIT), the NoC connects between the super logic regions (SLRs) using the NoC inter-die bridge (NIDB). For more information on the AXI protocol, see the *Vivado Design Suite: AXI Reference Guide (UG1037)*.

The NoC must be configured through the NoC programming interface (NPI) at early boot, before the NoC data paths are used. The NPI programs the NoC registers that define the routing table, rate modulation, and QoS configuration. This programming normally requires no user intervention: it is fully automated and executed by the NPI controller embedded in the platform management controller (PMC).

For more information about boot and configuration, see the *Versal Adaptive SoC Technical Reference Manual (AM011)* or the *Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026)*.

The Versal adaptive SoC NoC IP acts as the logical representation of the Versal adaptive SoC NoC. The main function of the NoC is to efficiently move data between the DDR controllers and the rest of the device. The Versal adaptive SoC NoC IP enables multiple masters to access a shared DDR memory controller with advanced quality of service (QoS) settings. The AXI NoC IP is required to connect the PS or the PL to the DDR memory controller. The AXI NoC IP can also be used to create additional connections between the PS and the PL or between design modules located in the PL.

For more information on the NoC IP and performance, see the *Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313)* or the *Programmable Network on Chip (NoC2) LogiCORE IP Product Guide (PG406)* and *Integrated DDR5/LPDDR5/5X Memory Controller LogiCORE IP Product Guide (PG456)*.

## XPIO

The XPIO in Versal adaptive SoCs are similar to the high-speed I/O (HPIO) in the AMD UltraScale™ architecture. However, the XPIO are located at the bottom and/or top periphery of the device, unlike the columnar I/O layout in previous devices. The XPIO provide XPHY logic that is similar to UltraScale device native mode. The XPHY logic encapsulates calibrated delays along with serialization and deserialization logic for a group of six single-ended I/O ports known as a nibble. Each XPIO bank contains nine XPHY logic sites and supports up to 54 single-ended I/O ports. The integrated DDR memory controller, soft memory controllers, and custom high-performance I/O interfaces use the XPHY logic. For more information on the XPIO, see the *Versal Adaptive SoC SelectIO Resources Architecture Manual (AM010)*.

## X5IO

The X5IO banks are similar to the high-speed I/O (HPIO) in the AMD UltraScale™ architecture. However, the X5IO are located at the bottom periphery of the device, unlike the columnar I/O layout in previous devices. The X5IO provide an X5IO PHY that is similar to UltraScale device native mode. The X5IO PHY encapsulates calibrated delays along with serialization and deserialization logic for a group of eight single-ended I/O ports known as an octad. Each X5IO bank contains eight octad logic sites and supports up to 64 single-ended I/O ports. The integrated DDR memory controller, soft memory controllers, and custom high-performance I/O interfaces use the X5IO PHY. For more information on the X5IO, see the *Versal Adaptive SoC SelectIO Resources Architecture Manual (AM010)*.

## DDR Memory Controller for DDR4, LPDDR4, and LPDDR4X

The DDR memory controller is a high-efficiency, low-latency integrated DDR memory controller for a variety of applications. This includes general purpose central processing units (CPUs) as well as other traditional field programmable gate array (FPGA) applications, such as video or network buffering.

The controller operates at half the DRAM clock frequency and supports DDR4, LPDDR4, and LPDDR4X standards up to 4266 Mb/s. The controller is configured as a single DDR memory interface with data widths of 16, 32, and 64 bits, plus an extra eight check bits when error-correction code (ECC) is enabled. The controller can also be configured as two independent or interleaved DDR interfaces of 16 or 32 data bits each. The controller supports x4, x8, and x16 DDR4 and x32 LPDDR4 components, small outline dual in-line memory modules (SODIMMs), unbuffered DIMMs (UDIMMs), registered DIMMs (RDIMMs), and load-reduced DIMMs (LRDIMMs).
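The data rate and interface width together set the theoretical peak bandwidth of an interface. The following Python sketch shows the arithmetic. The function names are illustrative, the example configuration is hypothetical and does not imply that a given rate is supported at a given width or memory type, and achievable bandwidth is lower once refresh, bus turnaround, and access patterns are considered.

```python
def peak_bandwidth_gbytes(data_rate_mbps: float, data_bits: int) -> float:
    """Theoretical peak payload bandwidth in GB/s (1 GB = 10**9 bytes).

    data_bits excludes ECC check bits, which carry no payload.
    """
    return data_rate_mbps * 1e6 * data_bits / 8 / 1e9


def controller_clock_mhz(data_rate_mbps: float) -> float:
    """Controller clock implied by the text: half the DRAM clock frequency."""
    dram_clock_mhz = data_rate_mbps / 2  # double data rate: two transfers per clock
    return dram_clock_mhz / 2


# Hypothetical configuration: a 32-bit interface at the 4266 Mb/s maximum rate.
print(peak_bandwidth_gbytes(4266, 32))  # about 17.06 GB/s
print(controller_clock_mhz(4266))       # 1066.5 MHz
```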

The DDR memory controller is accessed through the NoC. The optimal combination of memory interfaces with various width, type, and speed can be identified by using the *Versal Adaptive SoC Memory Interface Planning Tutorial* available from the AMD GitHub repository.

In Versal adaptive SoC, the DDR memory controller is a system-wide, shared resource. It is shared between the PS and PL via the device-wide, high-performance NoC interface. You can configure the NoC IP core to include one or more integrated DDR memory controllers. If two or four DDR memory controllers are selected, the DDR memory controllers are grouped to form a single interleaved memory.

In interleaved mode, the application views the participating DDR memory controllers as a single unified block of memory. The NoC supports interleaving across two or four DDR memory controllers. It automatically divides AXI requests into interleaved, block-sized subrequests and alternately sends the subrequests to each of the participating DDR memory controllers.
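The interleaving behavior described above can be illustrated with a small Python model. This is a conceptual sketch only: the 1 KB block size, the round-robin block-to-controller mapping, and the function name are assumptions made for illustration, and the NoC performs this division in hardware. The actual interleave granularity and address mapping are documented in PG313.

```python
INTERLEAVE_BLOCK_BYTES = 1024  # assumed granularity, for illustration only


def split_request(addr: int, length: int, num_controllers: int,
                  block: int = INTERLEAVE_BLOCK_BYTES):
    """Split one request into per-controller subrequests.

    Returns a list of (controller_index, controller_local_address, num_bytes).
    Consecutive blocks of the unified address space rotate across controllers.
    """
    if num_controllers not in (2, 4):
        raise ValueError("interleaving is across two or four controllers")
    subrequests = []
    end = addr + length
    while addr < end:
        block_index = addr // block
        offset = addr % block
        num_bytes = min(block - offset, end - addr)  # stop at the block boundary
        controller = block_index % num_controllers
        local_addr = (block_index // num_controllers) * block + offset
        subrequests.append((controller, local_addr, num_bytes))
        addr += num_bytes
    return subrequests


# A 2 KB request starting at 0x300, interleaved across two controllers.
for sub in split_request(0x300, 0x800, num_controllers=2):
    print(sub)
```

In this model, a request that crosses block boundaries is served by both controllers in alternation, which is what lets a single master draw bandwidth from more than one memory interface.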

**Important:** You must use the NoC to connect between the PL, PS, CPM, or AI Engine and the DDR memory controller.

For more information on the DDR memory controller, see the *Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide* ([PG313](#)).

 **Note:** Versal adaptive SoC also supports soft memory controllers in the PL fabric, similar to previous device families.

## DDR Memory Controller for DDR5, LPDDR5, and LPDDR5X

The DDR5/LPDDR5/5X memory controller is a high-efficiency, low-latency integrated memory controller for various applications.

The controller operates at half the DRAM clock frequency and supports DDR5, LPDDR5, and LPDDR5X standards up to 8,533 Mb/s. The controller can be configured as a single DDR memory interface with data widths up to 32 bits. Dual-channel configurations are also supported at data widths up to 16 bits. The controller supports both sideband and in-line error-correction code (ECC) configurations. The controller supports DDR5 and LPDDR5/5X components, and DDR5 dual in-line memory modules (DIMMs). The memory controller includes optional AES-GCM or AES-XTS encryption. When AES-GCM encryption is used, a built-in side channel leakage reduction feature is available to provide resistance to differential power analysis (DPA) and other side channel analysis (SCA).

**Important:** There are multiple variants of the memory controller. The maximum interface rates and the supported features and configurations vary between variants.

For more information on the memory controller, see the *Integrated DDR5/LPDDR5/5X Memory Controller LogiCORE IP Product Guide* ([PG456](#)).

## HBM

The Versal high-bandwidth memory (HBM) controller provides access to one or two stacks depending on the selected device:

- Up to 128 Gb (16 GB) for four-high stacked devices
- Up to 256 Gb (32 GB) for eight-high stacked devices

Eight independent HBM controllers connect to a single stack. Each HBM controller supports two mostly independent pseudo channels, each addressing a dedicated segment of HBM. Each pseudo channel has a 64-bit wide data bus with a shared command/address/control (CAC) bus. The total data-bit width of an HBM stack is 1,024 bits divided across 16 pseudo channels. The controller and PHY operate at up to 1,600 MHz for a data transfer rate of 3,200 MT/s. With 128 bits per HBM controller, eight controllers per stack, and two stacks in most devices, this yields a maximum throughput of 819 GB/s.
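The 819 GB/s figure follows directly from the widths and rates quoted above, as the following Python sketch shows. The function name and parameter names are illustrative. The result is a theoretical peak that does not account for refresh or access-pattern efficiency.

```python
def hbm_peak_gbytes(bits_per_controller: int = 128,
                    controllers_per_stack: int = 8,
                    stacks: int = 2,
                    transfer_rate_mts: float = 3200) -> float:
    """Theoretical peak HBM throughput in GB/s (1 GB = 10**9 bytes)."""
    total_bits = bits_per_controller * controllers_per_stack * stacks
    return total_bits * transfer_rate_mts * 1e6 / 8 / 1e9


# Stack width check: 8 controllers x 2 pseudo channels x 64 bits = 1,024 bits.
stack_width_bits = 8 * 2 * 64
assert stack_width_bits == 1024

print(hbm_peak_gbytes())          # two stacks: about 819.2 GB/s
print(hbm_peak_gbytes(stacks=1))  # one stack: about 409.6 GB/s
```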

The HBM controller interfaces to user logic in the PL via the NoC. A pair of NoC slave ports is dedicated to each pseudo channel. An 8x8 switch is added to each quad of HBM pseudo channels. The NoC combined with the switches allows global addressing of the entire HBM stack from any master connected to the NoC. The NoC IP core can be configured to include a subset or all of the integrated HBM controllers.

For more information on the HBM controller, see the *Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide* ([PG313](#)).

## CIPS

This section describes the Arm® Cortex®-A72-based processing system in the Versal AI Edge Series, Versal AI Core Series, Versal Prime Series, and Versal Premium Series.

 **Note:** The Arm Cortex-A78AE-based processing system in the Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 is described in the [Processing System Wizard](#).

**Important:** The CIPS IP is applicable to most devices with the Arm Cortex-A72-based processing system. The relevant families are Versal AI Edge Series, Versal AI Core Series, Versal Prime Series, Versal Premium Series, and Versal RF Series.

The VP1902, VM2152, and Versal RF devices have an Arm Cortex-A72-based processing system, but use the PS Wizard IP rather than the CIPS IP.

The PS, PMC, and CPM modules are grouped together and configured using the Control, Interfaces, and Processing System (CIPS) IP core as shown in the following figure.

**Note:** The Versal adaptive SoC includes multiple power domains. In the PS, the RPU is in the low-power domain (LPD), the APU is in the full-power domain (FPD), and the platform management controller (PMC) is in the PMC power domain.

CPM has two implementations depending on the target device capability:

- CPM4 that is compliant with the PCI Express Base Specification Revision 4.0
- CPM5 that is compliant with the PCI Express Base Specification Revision 5.0

CPM4 is fully powered by the PL domain while CPM5 is powered by its own dedicated supply (VCC\_CPM5) as well as the PS LPD. For more information on the power domains, see the *Versal Adaptive SoC Technical Reference Manual* ([AM011](#)).

**Figure: Device-Level Interconnect Architecture**



## Processing System

The processing system (PS) contains the application processing unit (APU), real-time processing unit (RPU), and peripherals. The PS and PL share the DDR memory controller via the device-wide NoC interface.

### APU

The application processing unit (APU) includes a dual-core Arm® Cortex®-A72 processor attached to a 1 MB unified L2 cache. The APU is designed for system control and compute-intensive applications that do not need real-time performance. The increased performance of Versal adaptive SoC requires higher performance from the memory subsystem. To help meet these requirements, the Versal adaptive SoC includes an increased L1 instruction cache size (32 KB to 48 KB) as well as multiple DDR memory controllers and the NoC, which improve the performance of the main memory.

The following table shows the difference between the Cortex-A53 in AMD Zynq™ UltraScale+™ MPSoCs and the Cortex-A72 processors in Versal adaptive SoCs.

**Table: Cortex-A53 and Cortex-A72 Comparison**

| Cortex-A53                                         | Cortex-A72         | Versal Adaptive SoC Benefits                        |
|----------------------------------------------------|--------------------|-----------------------------------------------------|
| Armv8A architecture (64-bit and 32-bit operations) |                    | No application code changes required                |
| EL0-EL3 exception levels                           |                    |                                                     |
| Arm TrustZone (Secure/non-secure operation)        |                    |                                                     |
| Advanced SIMD NEON floating-point unit             |                    |                                                     |
| Integrated memory manager                          |                    |                                                     |
| Power island control                               |                    |                                                     |
| Up to 1500 MHz                                     | Up to 1700 MHz     | Higher frequency                                    |
| 2.23 DMIPS per MHz                                 | 5.74 DMIPS per MHz | 2 times higher raw performance (per Arm benchmarks) |
| 3.65 SPEC2006int                                   | 6.84 SPEC2006int   |                                                     |
| 2-way super scalar                                 | 3-way super scalar | More efficient instruction cycle                    |
| In-order execution            | Out-of-order execution      | Higher performance and fewer memory stalls |
| Power efficient               | Improved power efficiency   | 20% lower power                            |
| 8-stage pipeline              | 15-stage pipeline           | More instructions queued and executed      |
| Conditional branch prediction | Two-level branch prediction | Higher cache hits and fewer memory fetches |
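As a rough illustration of how the frequency and DMIPS-per-MHz rows of the table combine, the following Python sketch computes a per-core peak DMIPS figure for each processor. The dictionary and function names are illustrative. The result is a simple product of two table values for a single core at its maximum frequency. It does not account for core count, memory behavior, or workload, so it is not a substitute for the benchmark-based comparison in the table.

```python
# Values taken from the comparison table; single-core figures only.
CORTEX_A53 = {"max_mhz": 1500, "dmips_per_mhz": 2.23}
CORTEX_A72 = {"max_mhz": 1700, "dmips_per_mhz": 5.74}


def peak_dmips(core: dict) -> float:
    """Per-core DMIPS at the maximum listed clock frequency."""
    return core["max_mhz"] * core["dmips_per_mhz"]


a53 = peak_dmips(CORTEX_A53)
a72 = peak_dmips(CORTEX_A72)
print(f"Cortex-A53: {a53:.0f} DMIPS per core")
print(f"Cortex-A72: {a72:.0f} DMIPS per core")
print(f"Per-core ratio: {a72 / a53:.2f}x")
```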

### RPU

The real-time processing unit (RPU) uses an Arm Cortex-R5F processor that runs at higher clock frequencies than the one in the Zynq UltraScale+ MPSoC. The Versal Arm Cortex-R5F processor supports Vector Floating-Point v3 (VFPv3), whereas the Zynq UltraScale+ MPSoC Arm Cortex-R5F processor supports VFPv2.

### Standard Peripherals

Versal adaptive SoC standard I/O peripherals are located in the low-power domain (LPD) and in the PMC. The NoC must be configured to provide access to the DDR memory controller so that the peripherals with direct memory access (DMA) can access the DDR memory interfaces.

The following table shows the difference between the standard peripherals in Zynq UltraScale+ MPSoCs and Versal adaptive SoCs.

**Table: Standard Peripherals Comparison**

| Peripheral                                  | Zynq UltraScale+ MPSoC           | Versal Adaptive SoC                                                             |
|---------------------------------------------|----------------------------------|---------------------------------------------------------------------------------|
| <b>CAN, CAN-FD</b>                          | 2 controllers with standard CAN  | 2 controllers with controller area network - flexible data rates (CAN-FD)       |
| <b>GEM</b>                                  | 4 controllers                    | 2 controllers with time-sensitive networking (TSN) feature                      |
| <b>GPIO</b>                                 | 1 controller                     | 2 controllers                                                                   |
| <b>I2C</b>                                  | 2 controllers                    | 2 controllers in LPD (general purpose)<br>1 controller in PMC (general purpose) |
| <b>NAND</b>                                 | 1 controller                     | N/A                                                                             |
| <b>PCIe 1.0 and 2.0</b>                     | 1 controller                     | N/A                                                                             |
| <b>PCIe 3.0 and 4.0</b>                     | 1 controller                     | Varies by device                                                                |
| <b>SPI</b>                                  | 2 controllers                    | 2 controllers                                                                   |
| <b>SATA</b>                                 | 1 controller                     | N/A                                                                             |
| <b>UART</b>                                 | 2 controllers with standard UART | 2 controllers with Server Base System Architecture (SBSA)                       |
| <b>USB (host, device, dual-role device)</b> | 2 USB 2.0/3.0 controllers        | 1 USB 2.0 controller                                                            |

## AMBA Specification Interfaces

The PS-PL Arm Advanced Microcontroller Bus Architecture (AMBA) specification interfaces in the Versal adaptive SoC have similar functionality to Zynq UltraScale+ MPSoCs, as shown in the following table.

**Note:** Enabling and disabling the different power domains in the LPD, FPD, and PL enables and disables the AXI connections to those domains.

**Important:** Because the DDR memory controller is shared between the PS and PL via the device-wide, high-performance NoC interface, there are fewer PS-PL AXI interconnects.

**Table: AMBA Interface Comparison**

| PS-PL AMBA Interface             | Master | Coherency | Zynq UltraScale+ MPSoC |       | Versal Adaptive SoC |       |
|----------------------------------|--------|-----------|------------------------|-------|---------------------|-------|
|                                  |        |           | Name                   | Count | Name                | Count |
| Accelerator Coherency Port (ACP) | PL     | I/O       | S_AXI_ACP_FPD          | 1     | S_ACP_FPD           | 1     |
| AXI Coherency Extensions (ACE)   | PL     | 2-way     | S_AXI_ACE_FPD          | 1     | S_ACE_FPD           | 1     |
| PL-to-FPD AXI                    | PL     | -         | S_AXI_HPx_FPD          | 4     | S_AXI_HP            | 1     |
| PL-to-FPD AXI                    | PL     | I/O       | S_AXI_HPCx_FPD         | 2     | S_AXI_HPC           | 1     |
| PL-to-LPD AXI                    | PL     | -         | S_AXI_LPD              | 1     | S_AXI_LPD           | 1     |
| FPD-to-PL AXI                    | FPD    | -         | M_AXI_HPMx_FPD         | 2     | M_AXI_FPD           | 1     |
| LPD-to-PL AXI                    | LPD    | -         | M_AXI_HPM0_LPD         | 1     | M_AXI_LPD           | 1     |
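When porting a Zynq UltraScale+ MPSoC design, it can help to keep the name mapping from the table above in a small lookup structure. The following Python sketch only transcribes the table; the dictionary and function names are hypothetical and not part of any AMD tool.

```python
# Transcribed from the AMBA Interface Comparison table.
# Key: Zynq UltraScale+ MPSoC interface name.
# Value: (Zynq count, Versal adaptive SoC name, Versal count).
AMBA_INTERFACE_MAP = {
    "S_AXI_ACP_FPD": (1, "S_ACP_FPD", 1),
    "S_AXI_ACE_FPD": (1, "S_ACE_FPD", 1),
    "S_AXI_HPx_FPD": (4, "S_AXI_HP", 1),
    "S_AXI_HPCx_FPD": (2, "S_AXI_HPC", 1),
    "S_AXI_LPD": (1, "S_AXI_LPD", 1),
    "M_AXI_HPMx_FPD": (2, "M_AXI_FPD", 1),
    "M_AXI_HPM0_LPD": (1, "M_AXI_LPD", 1),
}


def versal_name(zynq_name):
    """Return the Versal adaptive SoC interface name for a Zynq name."""
    return AMBA_INTERFACE_MAP[zynq_name][1]


def interfaces_with_fewer_instances():
    """List Zynq interfaces that have more instances than their Versal match."""
    return sorted(
        name
        for name, (zynq_count, _, versal_count) in AMBA_INTERFACE_MAP.items()
        if zynq_count > versal_count
    )


if __name__ == "__main__":
    for name in interfaces_with_fewer_instances():
        print(f"{name} -> {versal_name(name)}")
```

The interfaces this reports are the ones where a ported design has to share a single PS-PL port or move its DDR traffic onto the NoC.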

## PMC

The platform management controller (PMC) subsystem includes the following functions:

- Boot and configuration management
- Dynamic Function eXchange (DFX)
- Power management
- Reliability and safety functions
- Life-cycle management, including device integrity, debug, and system monitoring
- I/O peripherals

The PMC block executes the BootROM and platform loader and manager (PLM) to handle the boot and configuration for the processing system, CPM, PL, NoC register initialization and settings, and I/O and interrupt configuration settings. In addition to boot and configuration, the PLM provides life-cycle management services. The PMC bus architecture and centralized integration enable significantly faster configuration and readback performance when compared with previous devices. The following table shows the Zynq UltraScale+ MPSoC blocks that are comparable to the Versal adaptive SoC blocks.

**Table: Block Comparison**

| Zynq UltraScale+ MPSoC                                               | Versal Adaptive SoC            |
|----------------------------------------------------------------------|--------------------------------|
| Configuration security unit (CSU) and platform management unit (PMU) | PMC                            |
| CSU                                                                  | ROM code unit (RCU)            |
| PMU                                                                  | Platform processing unit (PPU) |
| First stage boot loader (FSBL) and PMU firmware                      | PLM                            |

For more information on the PMC, see the *Versal Adaptive SoC Technical Reference Manual* ([AM011](#)). For more information on the PLM, see the *Versal Adaptive SoC System Software Developers Guide* ([UG1304](#)).

#### Flash Memory Controllers

The PMC includes three types of flash memory controllers. Each type of memory controller supports device boot. Where there are multiple instances of a type of memory controller, only some instances can support boot. The following table shows the difference between the flash memory controllers in Zynq UltraScale+ MPSoCs and Versal adaptive SoCs.

**Table: Flash Memory Controllers Comparison**

| Peripheral              | Zynq UltraScale+ MPSoC           | Versal Adaptive SoC                                                          |
|-------------------------|----------------------------------|------------------------------------------------------------------------------|
| <b>Octal SPI (OSPI)</b> | N/A                              | 1 controller                                                                 |
| <b>Quad SPI (QSPI)</b>  | 1 controller                     | 1 controller that does not support linear address mode                       |
| <b>SD/eMMC</b>          | 2 controllers (SD 3.0/eMMC 4.51) | 2 controllers (SD 3.0/eMMC 4.51) with the same functionality and updated DLL |

**Note:** Versal adaptive SoCs can also support secondary boot modes (for example, Ethernet and USB). See the device technical reference manual for supported flash memory controllers and versions. For more information, see the *Versal Adaptive SoC System Software Developers Guide* ([UG1304](#)).

#### CPM

The Versal architecture includes several blocks for implementation of high-performance, standards-based interfaces built on PCI-SIG® technologies. In Versal adaptive SoCs that contain a CPM, the CPM provides the primary interfaces for designs following the server system methodology. As part of the Versal architecture integrated shell, the CPM has dedicated connections to the NoC where it can access DDR and other hardened IP. The CPM configures separately from the programmable logic, which enables the integrated shell to become operational quickly after boot without the need to configure the PL. This separate configuration addresses a common power-up and reset timing challenge imposed by the PCIe specification. Two implementations of the CPM exist: CPM4 and CPM5.

For CPM4, the block is compliant with the PCIe Base Specification Revision 4.0 and capable of supporting defined line rates up to the maximum of 16 GT/s. CPM4 contains two PCIe controllers with shared access to 16 GTY transceivers, and integrates a single direct memory access (DMA) controller (user selectable as either QDMA or XDMA) associated with CPM PCIe Controller #0. Cache Coherent Interconnect for Accelerators (CCIX) support in CPM4 complies with CCIX Base Specification Revision 1.0.

For CPM5, the block is compliant with the PCIe Base Specification Revision 5.0 and capable of supporting defined line rates up to the maximum of 32 GT/s. CPM5 contains two PCIe controllers with dedicated access to 16 GTYP transceivers. CPM5 integrates two DMA controllers (both QDMA) each associated with CPM PCIe Controller #0 and CPM PCIe Controller #1. CCIX support in CPM5 complies with CCIX Base Specification Revision 1.1.
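To put the two line rates in perspective, the following Python sketch estimates the raw per-direction link bandwidth. It assumes all 16 transceivers form a single x16 link and applies the 128b/130b line encoding that PCIe uses at 8 GT/s and above. Packet and protocol overhead is ignored, so achievable throughput is lower. The function name is illustrative only.

```python
def pcie_raw_bandwidth_gbytes_per_s(line_rate_gt_s, lanes):
    """Estimate raw per-direction bandwidth in GB/s for a PCIe link.

    Assumes 128b/130b encoding (PCIe 3.0 and later line rates) and
    ignores TLP/DLLP overhead and flow control.
    """
    encoding_efficiency = 128 / 130
    gbits_per_s = line_rate_gt_s * lanes * encoding_efficiency
    return gbits_per_s / 8  # bits to bytes


if __name__ == "__main__":
    # Assumption: one x16 link using all 16 transceivers.
    for name, rate in (("CPM4", 16), ("CPM5", 32)):
        bandwidth = pcie_raw_bandwidth_gbytes_per_s(rate, lanes=16)
        print(f"{name}: about {bandwidth:.1f} GB/s per direction")
```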

CPM4 and CPM5 include the following additional components:

- The coherent mesh network (CMN) forms the CCIX block, which is based on the Arm CoreLink CMN-600.
- There are two Coherent Hub Interface (CHI) PL interface (CPI) blocks. CPM4 has one L2 cache instance, and CPM5 has two L2 cache instances. CPI blocks interface with the accelerators in the PL and perform 512-to-256 bit data width conversion and clock domain crossing into the internal core clock.
- The non-coherent interconnect block, which interfaces with the PS for access to the NoC and DDR memory controller. The interconnect is connected to all of the other sub-blocks via an advanced peripheral bus (APB) or AXI slave interface for configuration.
- A clock/reset block, which includes a phase-locked loop (PLL) and clock dividers.
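The 512-to-256 bit width conversion performed by the CPI blocks can be pictured as splitting each wide word into two narrower beats. The Python model below is purely conceptual: the beat ordering (lower half first) is an assumption, and the function names are hypothetical rather than taken from AMD documentation.

```python
BEAT_BITS = 256
WORD_BITS = 512
BEAT_MASK = (1 << BEAT_BITS) - 1


def split_512_to_256(word):
    """Split one 512-bit word into two 256-bit beats.

    Assumption: the lower 256 bits are transferred first.
    """
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 512 bits")
    return word & BEAT_MASK, word >> BEAT_BITS


def merge_256_to_512(low_beat, high_beat):
    """Reassemble two 256-bit beats into one 512-bit word."""
    if not (0 <= low_beat <= BEAT_MASK and 0 <= high_beat <= BEAT_MASK):
        raise ValueError("each beat must fit in 256 bits")
    return (high_beat << BEAT_BITS) | low_beat


if __name__ == "__main__":
    original = (0xABCD << 300) | 0x1234
    low, high = split_512_to_256(original)
    print(merge_256_to_512(low, high) == original)
```

Halving the width doubles the number of beats per transfer, which is why the conversion is paired with a clock domain crossing.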

CPM availability is device specific. For information, see the following documents:

- *Versal Architecture and Product Data Sheet: Overview* ([DS950](#))
- *Versal Adaptive SoC CPM CCIX Architecture Manual* ([AM016](#))
- *Versal Adaptive SoC CPM Mode for PCI Express Product Guide* ([PG346](#))
- *Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide* ([PG347](#))

**Note:** Versal adaptive SoC also supports implementation of subsystems based on PCI-SIG technologies in the PL fabric, similar to previous device families.

## Processing System Wizard

This section describes the Arm Cortex-A78AE-based processing system in the Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2.

**Note:** The Arm Cortex-A72-based processing system in the Versal AI Edge Series, Versal AI Core Series, Versal Prime Series, and Versal Premium Series is described in the [CIPS](#) section.

For Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2, the processing system (PS), PMC, and ASU modules are grouped together and configured using the Processing System Wizard IP core as shown in the following figure. For more information on the PS Wizard, see the *Versal AI Edge Series Gen 2 and Prime Series Gen 2 Processing System Wizard IP Product Guide (PG450)*. For more information on the power domains, see the *Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026)*.

**Note:** The Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 have up to 8x A78AEs and 10x R52s. For the 2VM3654 device in the Versal Prime Series Gen 2, the cache, cluster configuration, and other details can vary from the content below. For more detail and device-specific information, see the *Versal Architecture and Product Data Sheet: Overview (DS950)*.

**Figure: Device-Level Interconnect Architecture for Versal Devices**



### Processing System

The processing system (PS) contains the application processing unit (APU), real-time processing unit (RPU), and peripherals. The DDR memory controller is shared between the PS and PL via the device-wide NoC interface.

#### APU

The application processing unit (APU) is a feature-rich eight-core, two-cluster unit based on Arm Cortex®-A78AE processors. The A78AE cores can operate split or in lock-step. Each core has a 64 KB level 1 instruction cache, a 64 KB level 1 data cache, and a 512 KB unified level 2 cache. Each cluster of four processors has 1 MB unified level 3 cache.

The APU communicates with the rest of the processing system via the coherent hub interface (CHI) based coherency interconnect (CMN-600 w/AE). The coherent interconnect enables the processing system to satisfy safety requirements, provides snoopable LLC, enables efficient L3 stashing, and provides sufficient bandwidth and QoS for incoming traffic and traffic to the NoC.

The following table shows the difference between the Cortex-A53 in AMD Zynq™ UltraScale+™ MPSoCs, Cortex-A72, and the Cortex-A78AE processors in the processing system of the Versal devices.

**Table: Cortex-A53, Cortex-A72, and Cortex-A78AE Comparison**

| Cortex-A53                                         | Cortex-A72                       | Cortex-A78AE                                     |
|----------------------------------------------------|----------------------------------|--------------------------------------------------|
| Armv8A architecture (64-bit and 32-bit operations) |                                  |                                                  |
| EL0-EL3 exception levels                           |                                  |                                                  |
| Advanced SIMD NEON floating-point unit             |                                  |                                                  |
| Integrated memory manager                          |                                  |                                                  |
| Power island control                               |                                  |                                                  |
| Up to 1500 MHz                                     | Up to 1700 MHz                   | Up to 2400 MHz                                   |
| 3.13 DMIPS per MHz per processor                   | 5.74 DMIPS per MHz per processor | 11.38 DMIPS per MHz per processor                |
| 1 Quad core cluster                                | Dual core cluster                | 4-Dual core clusters                             |
| 32 KB L1 instruction cache                         | 48 KB L1 instruction cache       | 64 KB L1 instruction cache                       |
| 32 KB L1 data cache                                | 32 KB L1 data cache              | 64 KB L1 data cache                              |
| 512 KB unified L2 cache                            | 512 KB L2 cache per processor    | 512 KB L2 cache per processor                    |
| N/A                                                | N/A                              | 1 MB L3 cache per cluster <sup>1</sup>           |
| N/A                                                | N/A                              | 4 MB unified last-level cache (LLC) <sup>2</sup> |
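As a rough reading of the table, multiplying the DMIPS-per-MHz figure by the maximum clock frequency gives a peak per-core DMIPS estimate. The following Python sketch does that arithmetic using only values from the table. The names are illustrative, and the result is a theoretical peak rather than a measured benchmark.

```python
# (DMIPS per MHz per processor, maximum frequency in MHz), from the table.
CORE_FIGURES = {
    "Cortex-A53": (3.13, 1500),
    "Cortex-A72": (5.74, 1700),
    "Cortex-A78AE": (11.38, 2400),
}


def peak_dmips_per_core(core_name):
    """Return DMIPS/MHz multiplied by the maximum clock in MHz."""
    dmips_per_mhz, max_mhz = CORE_FIGURES[core_name]
    return dmips_per_mhz * max_mhz


if __name__ == "__main__":
    for core in CORE_FIGURES:
        print(f"{core}: {peak_dmips_per_core(core):.0f} DMIPS peak per core")
```

By this estimate, a single Cortex-A78AE core at its maximum frequency delivers about 5.8 times the peak DMIPS of a Cortex-A53 core.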

#### RPU

The real-time processing unit (RPU) in the processing system contains up to ten Cortex®-R52 real-time processor cores. Each of the Cortex-R52 cores has 32 KB of level 1 instruction and data cache with ECC protection. Each of the Cortex-R52 cores also has a 128 KB tightly-coupled memory (TCM) interface for real-time single cycle access. To provide high-level safety, the Cortex-R52 cores are configurable as split-lock (split or lock-step). The cores are organized into independent, dual-core clusters.
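The effect of the split-lock choice on the number of processors visible to software can be sketched as follows. This Python model assumes that a cluster in lock-step mode presents one logical processor, that a cluster in split mode presents two, and that each of the up to five dual-core clusters is configured independently. The class and function names are hypothetical.

```python
from enum import Enum


class ClusterMode(Enum):
    SPLIT = "split"          # both cores run independent software
    LOCKSTEP = "lock-step"   # second core shadows the first for safety


MAX_CLUSTERS = 5  # up to 10 cores organized as dual-core clusters


def logical_rpu_cores(cluster_modes):
    """Count logical processors for a list of per-cluster modes."""
    if len(cluster_modes) > MAX_CLUSTERS:
        raise ValueError(f"at most {MAX_CLUSTERS} clusters are available")
    return sum(2 if mode is ClusterMode.SPLIT else 1 for mode in cluster_modes)


if __name__ == "__main__":
    mixed = [ClusterMode.LOCKSTEP, ClusterMode.LOCKSTEP, ClusterMode.SPLIT]
    print(logical_rpu_cores(mixed))
```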

The RPU communicates with the rest of the processing system via the low-power domain (LPD), non-coherent interconnect. The on-chip memory (OCM) is also connected to the LPD interconnect. The OCM is organized into two banks of 2 MB (except for the 2VM3654 device, which has 1 MB of OCM). Each bank can be accessed through a dedicated 128-bit AXI interface via the LPD interconnect.

**Table: Cortex-R5F and Cortex-R52 Comparison**

| Cortex-R5F                                      | Cortex-R52                                          |
|-------------------------------------------------|-----------------------------------------------------|
| Armv7-R architecture (32-bit operations)        | Armv8-R architecture (64-bit and 32-bit operations) |
| Armv7 exceptions                                | EL0-EL3 exception levels                            |
| Vector Floating Point                           | Vector Floating Point                               |
| Up to 600 MHz                                   | Up to 1050 MHz                                      |
| 1.91 DMIPS per MHz per processor                | 2.72 DMIPS per MHz per processor                    |
| 1 Dual-core Cluster (2 cores)                   | Up to 5 dual-core clusters (up to 10 cores)         |
| 32 KB L1 instruction cache per processor        |                                                     |
| 32 KB L1 data cache per processor               |                                                     |
| 128 KB tightly coupled memory (TCM) per processor |                                                   |
| Split Mode                                      |                                                     |
| Dual Lock Step Mode                             | Dual Lock Step per cluster                          |

#### Connectivity Peripherals

In the processing system, many peripherals connect to external devices over industry-standard protocols, including CAN-FD, SPI, USB, Ethernet, I2C, and UART. Many of the peripherals support clock gating and power gating modes to reduce dynamic and static power consumption. These peripherals use multiplexed I/O (MIO) to connect to the external components, or if required, they can also be routed into and through the PL using the extended multiplexed I/O (EMIO).

**Table: Connectivity Comparison**

|                         | Versal AI Edge Series, Versal AI Core Series, Versal Prime Series, and Versal Premium Series | Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2                                               |
|-------------------------|----------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|
| High-Speed Connectivity | Ethernet (x2)                                                                                | PCIe 5.0 x4 (x1); USB 3.2 (x1); DisplayPort 1.4 (x1); 10G Ethernet (x1); 1G Ethernet (x1); UFS 3.1 (x1) |
| General Connectivity    | UART (x2); CAN-FD (x2); USB 2.0 (x1); SPI (x2); I2C (x2)                                     | CAN/CAN-FD (x2); SPI (x2); UART (x2); USB 2.0 (x2); I2C/I3C (x2); GPIO                                  |

#### Standard Peripherals

Versal adaptive SoC standard I/O peripherals are located in the low-power domain (LPD) and in the PMC. The NoC must be configured to provide access to the DDR memory controller so that the peripherals with direct memory access (DMA) can access the DDR memory interfaces.

The following table shows the difference between the standard peripherals in Zynq UltraScale+ MPSoCs and Versal adaptive SoCs.

**Table: Standard Peripherals Comparison**

| Peripheral                                  | Zynq UltraScale+ MPSoC           | Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2                 |
|---------------------------------------------|----------------------------------|---------------------------------------------------------------------------|
| <b>CAN, CAN-FD</b>                          | 2 controllers with standard CAN  | 2 controllers with controller area network - flexible data rates (CAN-FD) |
| <b>GEM</b>                                  | 4 controllers                    | 2 controllers with time-sensitive networking (TSN) feature                |
| <b>GPIO</b>                                 | 1 controller                     | 1 controller                                                              |
| <b>I2C</b>                                  | 2 controllers                    | 2 controllers                                                             |
| <b>I3C</b>                                  | N/A                              | 2 MIPI SenseWire controllers                                              |
| <b>NAND</b>                                 | 1 controller                     | N/A                                                                       |
| <b>PCIe 2.0</b>                             | 1 controller                     | N/A                                                                       |
| <b>PCIe 5.0</b>                             | N/A                              | 1 or 2 controllers                                                        |
| <b>SPI</b>                                  | 2 controllers                    | 2 controllers                                                             |
| <b>SATA</b>                                 | 1 controller                     | N/A                                                                       |
| <b>UART</b>                                 | 2 controllers with standard UART | 2 controllers with Server Base System Architecture (SBSA)                 |
| <b>USB (host, device, dual-role device)</b> | 2 USB 2.0/3.0 controllers        | 2 USB 2.0 controllers                                                     |

#### AMBA Specification Interfaces

The PS-PL Arm Advanced Microcontroller Bus Architecture (AMBA) specification interfaces in the Versal adaptive SoC have similar functionality to Zynq UltraScale+ MPSoCs, as shown in the following table.

**Note:** Enabling and disabling the different power domains in the LPD, FPD, and PL enables and disables the AXI connections to those domains.

**Important:** Because the DDR memory controller is shared between the PS and PL via the device-wide, high-performance NoC interface, there are fewer PS-PL AXI interconnects.

**Table: AMBA Interface Comparison**

| PS-PL AMBA Interface             | Master | Coherency | Zynq UltraScale+ MPSoC |       | Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 |       |
|----------------------------------|--------|-----------|------------------------|-------|-----------------------------------------------------------|-------|
|                                  |        |           | Name                   | Count | Name                                                      | Count |
| Accelerator Coherency Port (ACP) | PL     | I/O       | S_AXI_ACP_FPD          | 1     | PL_ACP_APU                                                | 1     |
| AXI Coherency Extensions (ACE)   | PL     | 2-way     | S_AXI_ACE_FPD          | 1     | -                                                         | -     |
| Coherent Hub Interface (CHI)     | PL     | 2-way     | -                      | -     | PL_CHI_FPD                                                | 1     |
| PL-to-FPD AXI                    | PL     | -         | S_AXI_HPx_FPD          | 4     | FPD_AXI_PL                                                | 1     |
| PL-to-FPD AXI                    | PL     | I/O       | S_AXI_HPCx_FPD         | 2     | PL_ACCELITE_FPDx                                          | 4     |
| PL-to-LPD AXI                    | PL     | -         | S_AXI_LPD              | 1     | PL_AXI_LPD                                                | 1     |
| FPD-to-PL AXI                    | FPD    | -         | M_AXI_HPMx_FPD         | 2     | FPD_AXI_PL                                                | 1     |
| LPD-to-PL AXI                    | LPD    | -         | M_AXI_HPM0_LPD         | 1     | LPD_AXI_PL                                                | 1     |

#### PMC

The PMC includes a ROM code unit (RCU) processor, the platform processing unit (PPU) that runs the platform loader and manager (PLM) firmware, the boot interfaces, and the voltage/temperature system monitor (SYSMON).

For more information on the PMC, see the *Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026)*.

#### Flash Memory Controllers

The PMC includes up to four types of flash memory controllers. Each type of memory controller supports device boot. Where there are multiple instances of a type of memory controller, only some instances can support boot. The following table shows the difference between the flash memory controllers in Zynq UltraScale+ MPSoCs and Versal adaptive SoCs.

**Table: Flash Memory Controllers Comparison**

| Peripheral              | Zynq UltraScale+ MPSoC           | Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 |
|-------------------------|----------------------------------|-----------------------------------------------------------|
| <b>Octal SPI (OSPI)</b> | N/A                              | 1 controller                                              |
| <b>Quad SPI (QSPI)</b>  | 1 controller                     | 1 controller that does not support linear address mode    |
| <b>SD/eMMC</b>          | 2 controllers (SD 3.0/eMMC 4.51) | 2 controllers (SD 3.0/eMMC 4.51 or 5.1)                   |
| <b>UFS</b>              | N/A                              | 1 UFS 3.1 or 3.2 controller                               |

**Note:** Versal adaptive SoCs can also support secondary boot modes (for example, Ethernet and USB). See the device technical reference manual for supported flash memory controllers and versions.

## ASU

The application security unit (ASU) subsystem includes a RISC-V processor with AES, ECC, SHA2, SHA3, SHAKE, TRNG, and RSA crypto accelerators. It also includes support for a key vault and an interface to the PL for extending crypto functionality. AMD provides firmware drivers to support these features. The ASU accelerates crypto operations and acts as the key storage unit for runtime applications running in the APU, RPU, or processors instantiated within the PL. The ASU firmware is loaded during the boot process by the PLM firmware. This is done in a secure manner with authentication and/or decryption. The ASU is located in the LPD. For more information on the ASU, see the *Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual* ([AM026](#)).

## GT

GTs support several protocols for high-speed interfaces, such as Ethernet and Aurora IP. Versal adaptive SoC features the XPIPE mechanism to connect the PCIe block to the GT at high speed. XPIPE and GTs are shared between PL-based IP and PS-based IP (for example, CPM4, Ethernet, and the Aurora link for debug). For Versal adaptive SoC, GT components are updated from Common/Channel to a quad granularity.

For more information on GTs, see the following documents:

- *Versal Adaptive SoC Transceivers Wizard LogiCORE IP Product Guide* ([PG331](#))
- *Versal Adaptive SoC GTY and GTYP Transceivers Architecture Manual* ([AM002](#))
- *Versal Adaptive SoC GTM Transceivers Architecture Manual* ([AM017](#))

For guidance on GT selection and pin planning for CPM5, see this [link](#) in the *Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide* ([PG347](#)).

## HSDP

The heterogeneous nature and performance of the Versal adaptive SoC necessitate a system-level high-bandwidth debug and trace solution. The high-speed debug port (HSDP) is a new feature in Versal adaptive SoC that provides unified, at-speed debugging and tracing of the various integrated, fabric-based, and processor blocks in the device under test (DUT). HSDP provides the option of performing debug and trace capture through a dedicated Aurora interface and a high-speed debug cable like SmartLynq+. High-speed debug over PCIe is also available for remote systems that are connected to a host through PCIe. HSDP functions are accessed via high-speed GT-based interfaces, such as the integrated Aurora interface in the PS block or the PCIe interface in the CPM block.

For more information, see the following resources:

- This [link](#) in the *Versal Adaptive SoC Technical Reference Manual* ([AM011](#))
- *Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual* ([AM026](#))
- *SmartLynq+ Module User Guide* ([UG1514](#))
- *System Design Example for High-Speed Debug Port (HSDP) with SmartLynq+ Module*

## High-Speed Connectivity and Encryption Integrated IP

### MRMAC

The Versal adaptive SoC Multirate Ethernet MAC (MRMAC) provides high-performance, low-latency Ethernet ports supporting a wide range of customization and statistics gathering. The MRMAC supports the following forward error correction (FEC) modes defined and required by IEEE standards: Clause 91 RS(528, 514) KR4 FEC for 25/50/100GE NRZ support, Clause 91 RS(544, 514) KP4 FEC for 50/100GE PAM4 support, and Clause 74 FEC for 10/25/40/50GE low-latency support. The MRMAC has a rich set of bypass modes to enable access to FEC-only mode (for custom protocols) and FEC+PCS (for protocol testers). For more information, see the *Versal Devices Integrated 100G Multirate Ethernet MAC (MRMAC) LogiCORE IP Product Guide* ([PG314](#)).
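The RS(n, k) notation used for the Clause 91 codes encodes the trade-off between overhead and correction strength. The Python sketch below applies the standard Reed-Solomon relations: a code with n total symbols and k data symbols carries n - k parity symbols and can correct up to half that many symbol errors per codeword. The function name is illustrative and is not part of the MRMAC IP.

```python
def rs_fec_properties(n, k):
    """Derive basic properties of a Reed-Solomon RS(n, k) code."""
    if not 0 < k < n:
        raise ValueError("expected 0 < k < n")
    parity_symbols = n - k
    return {
        "parity_symbols": parity_symbols,
        "correctable_symbols": parity_symbols // 2,
        "code_rate": k / n,
    }


if __name__ == "__main__":
    for label, (n, k) in {"KR4": (528, 514), "KP4": (544, 514)}.items():
        props = rs_fec_properties(n, k)
        print(
            f"{label} RS({n}, {k}): corrects up to "
            f"{props['correctable_symbols']} symbols per codeword, "
            f"code rate {props['code_rate']:.4f}"
        )
```

The KP4 code spends more of each codeword on parity than KR4, which buys the stronger correction needed for PAM4 links.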

**Note:** MRMAC availability is device specific.

### DCMAC

The Versal adaptive SoC 600G Channelized Multirate Ethernet Subsystem (DCMAC subsystem) is a high-performance, adaptable, Ethernet-integrated hard IP, targeting numerous customer networking applications. The block can be configured with up to six ports. It has independent MAC and PHY functions at the IEEE Standard MAC Rates from 100GE to 400GE, and an overall maximum bandwidth of 600 Gb/s.

The IP supports various FECs and IEEE 1588 Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems (IEEE 1588) hardware timestamping. In addition, the IP can be configured to provide 600 Gb/s of MAC processing for up to 40 channels of user-defined bandwidth. For more information, see the *Versal Adaptive SoC 600G Channelized Multirate Ethernet Subsystem (DCMAC) Product Guide* ([PG369](#)).
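The port limits described above can be turned into a quick plausibility check for a proposed port plan. The Python sketch below encodes only three constraints from this section: at most six ports, per-port rates within the 100GE to 400GE range, and an aggregate of 600 Gb/s. The rate set of 100, 200, and 400 Gb/s is an assumption about which IEEE MAC rates are meant, and the function is hypothetical. The legal configurations are defined in PG369.

```python
ALLOWED_RATES_GBPS = (100, 200, 400)  # assumed IEEE MAC rates in range
MAX_PORTS = 6
MAX_TOTAL_GBPS = 600


def check_dcmac_ports(port_rates_gbps):
    """Return a list of problems; an empty list means all checks pass."""
    problems = []
    if len(port_rates_gbps) > MAX_PORTS:
        problems.append(
            f"{len(port_rates_gbps)} ports requested, limit is {MAX_PORTS}"
        )
    for index, rate in enumerate(port_rates_gbps):
        if rate not in ALLOWED_RATES_GBPS:
            problems.append(f"port {index}: {rate} Gb/s is not a supported rate")
    total = sum(port_rates_gbps)
    if total > MAX_TOTAL_GBPS:
        problems.append(f"total {total} Gb/s exceeds {MAX_TOTAL_GBPS} Gb/s")
    return problems


if __name__ == "__main__":
    print(check_dcmac_ports([400, 200]))   # fits within all three limits
    print(check_dcmac_ports([400, 400]))   # exceeds the aggregate limit
```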

### Interlaken

The Versal adaptive SoC Integrated 600G Interlaken with FEC (ILKNF) is a high-performance, adaptable Interlaken integrated hard IP, targeting numerous customer networking applications. The block can be configured as a single Interlaken port with bandwidth up to 600 Gb/s supporting a wide variety of lane count and lane rate configurations. The IP supports FEC for use with Interlaken over high-speed transceiver lanes. In addition, the FEC logic can be used without Interlaken, allowing the core to support any number of protocols, including Ethernet. For more information, see the *Versal Adaptive SoC 600G Interlaken LogiCORE IP Product Guide* ([PG371](#)).

### High-Speed Cryptography

The Versal adaptive SoC Integrated 400G High Speed Channelized Cryptography Engine Subsystem (HSC Subsystem) is a high-performance, adaptable encryption integrated hard IP, targeting numerous customer encryption applications. The block has up to four ports, with rates from 100 Gb/s to 400 Gb/s, and an overall maximum bandwidth of 400 Gb/s. The IP supports MACsec, IPsec, and a Bulk Method of encryption. In addition, the IP can be configured to provide 400 Gb/s of encryption processing for up to 40 channels of user-defined bandwidth. For more information, see the *Versal Adaptive SoC 400G High Speed Channelized Cryptography Engine Subsystem LogiCORE IP Product Guide* ([PG372](#)).

## GPU

Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 contain an Arm Mali™-G78AE graphics processor unit (GPU) that uses a unified shader core architecture. The single shader processor core type can execute all types of shader code including vertex shaders, fragment shaders, and compute kernels. All cores have access to a shared L2 cache to reduce wasted memory bandwidth due to repeated data fetches. You can configure the GPU as a single partition with four shader cores or two partitions with two shader cores each.

Supported pixel formats include RGB 8/10/16 bit in a variety of container formats, YUV In 8/10/16 bit, and YUV Out 8/10 bit. Adaptive scalable texture compression (ASTC) supports low dynamic range (LDR) and high dynamic range (HDR), enabling support for both 2D and 3D images. Arm frame buffer compression (AFBC) v1.3 supports 4x4 pixel block size.

## VCU

The video codec unit (VCU) in some Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 devices provides HEVC/AVC encoding and decoding. Each video encoder instance supports a single stream of up to 4K60 4:4:4 12-bit. Each video decoder engine supports up to 4K60. The VCU can simultaneously encode and decode up to 32 streams with a maximum aggregated bandwidth equivalent to 3840x2160 @ 60 fps.
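The aggregate limit can be read as a pixel-rate budget shared by all active streams. The following is a rough sketch under that assumption: it treats the 3840x2160 @ 60 fps figure as a simple sum of `width * height * fps` across streams and ignores chroma format, bit depth, and codec settings, all of which can affect real throughput. The function and constant names are hypothetical.

```python
from typing import Iterable, Tuple

# (width, height, frames per second) for one stream.
Stream = Tuple[int, int, int]

VCU_MAX_STREAMS = 32
# Aggregate budget stated above: 3840x2160 @ 60 fps, in pixels per second.
VCU_PIXEL_RATE_BUDGET = 3840 * 2160 * 60


def stream_pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second for a single stream."""
    return width * height * fps


def fits_vcu_budget(streams: Iterable[Stream]) -> bool:
    """Return True if the streams fit the stream count and pixel-rate budget.

    Assumption: the aggregate limit is a plain pixel-rate sum.
    """
    stream_list = list(streams)
    if len(stream_list) > VCU_MAX_STREAMS:
        return False
    total = sum(stream_pixel_rate(w, h, f) for w, h, f in stream_list)
    return total <= VCU_PIXEL_RATE_BUDGET
```

Under this model, four 1920x1080 @ 60 fps streams consume exactly the same budget as one 3840x2160 @ 60 fps stream, so a fifth 1080p60 stream would not fit.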

The VCU supports the H.264, H.265, and JPEG (decode) standards; 4:2:0, 4:2:2, 4:4:4, and monochrome formats; and 8-, 10-, and 12-bit depths. DCI 4K (4096x2160) @ 60 fps is supported. The encoder supports DCI 4K60 at 950 MHz, and the decoder supports DCI 4K60 at 918 MHz.

For more information on the VCU, see the *Versal AI Edge Gen 2 H.264/H.265/JPEG Video Codec Unit 2 Solutions LogiCORE IP Product Guide* ([PG447](#)).

## ISP

The image signal processor (ISP) contains one to three ISP tiles for preprocessing raw image sensor data. Each ISP tile supports a maximum pixel rate of 600 megapixels per second with a maximum horizontal or vertical resolution of 4096 pixels. Input pixel depths are supported in both linear and compressed formats.
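As a first-pass check, a sensor mode can be compared against the per-tile limits stated above. The following is a minimal sketch with hypothetical names. It assumes the pixel rate is the active area multiplied by the frame rate and ignores blanking intervals, which raise the real pixel rate on the sensor interface.

```python
# Per-tile limits from the paragraph above; names are illustrative only.
ISP_TILE_MAX_PIXEL_RATE = 600_000_000  # pixels per second
ISP_TILE_MAX_DIMENSION = 4096          # pixels, horizontal or vertical


def fits_isp_tile(width: int, height: int, fps: int) -> bool:
    """Return True if a sensor mode fits one ISP tile.

    Assumption: pixel rate = width * height * fps (active pixels only).
    """
    if width > ISP_TILE_MAX_DIMENSION or height > ISP_TILE_MAX_DIMENSION:
        return False
    return width * height * fps <= ISP_TILE_MAX_PIXEL_RATE
```

For example, a 3840x2160 sensor at 60 fps needs about 498 megapixels per second and fits within one tile under this model, while a 4096x4096 sensor at 60 fps needs about 1007 megapixels per second and does not.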

ISP tiles are compatible with standard Bayer input (RGGB, GRBG, BGGR), monochrome (CCCC), RYYC, RCCG, RCCC, and RGB-IR sensor types. An AXI4-Stream interface accepts streaming live data from a MIPI CSI-2 interface. ISP tiles also accept memory-input data from DMA read functionality and input test patterns from an in-built test pattern generator (TPG).

AXI4-Stream for live out and AXI4-memory mapped interfaces for memory out support the following video output formats:

- YUV 4:2:0
- YUV 4:2:2
- Y only
- 8 or 10 bits per component
- RGB888

Each ISP has dual output capability enabling primary output and secondary output with separate controls. One input stream can be processed by a single ISP tile for different primary and secondary output streams. RGB-IR image sensor data can be processed to provide RGB data on the primary output and IR data on the secondary output. In the memory out I/O type, both primary and secondary output DMA support raster half-DWORD aligned frame buffer format suitable for 10-bit max color depth.

For more information on the ISP, see the *Versal AI Edge Series Gen 2 Image Signal Processor (ISP) Product Guide* ([PG432](#)).