

# Architecture of reconfigurable systems : SoC

J. Lorandel

[jordane.lorandel@u-cergy.fr](mailto:jordane.lorandel@u-cergy.fr)

Slides available there: [https://perso-etis.ensea.fr//lorandel/M2\\_ESI\\_ASR.php](https://perso-etis.ensea.fr//lorandel/M2_ESI_ASR.php)

## □Outline

- Design issues
- SoC Design & Intellectual Properties
- Interconnect
  - Interconnect topologies
  - Bus
  - Crossbar
  - NoC
- Embedded Bus and Standards : ARM AMBA & AXI
- SoC FPGA : New tools for Co-design
- TP

# Design issues

## ❑ SoCs ?

- Integrated Circuit that integrates multiple components of a system onto a single chip -> **System on Chip / System on Programmable Chip**



## ❑ MPSoCs – Multi-Processors SoC

- Addresses performance requirements

|                      | Processor on a Chip | SOC           |
|----------------------|---------------------|---------------|
| Area used by storage | 80% cache           | 50% ROM/RAM   |
| Clock frequency      | 3.5 GHz             | 0.5 GHz       |
| Power                | ≥50 W               | ≤10 W         |
| Memory               | ≥1-GB DRAM          | Mostly on-die |

## ❑ Evolution : From boards to SoC

- Emerging new technologies
  - Greater complexity
  - Increased Performance
  - Higher density
  - Lower power dissipation



Intel 4004  
2300 transistors



Intel 80386  
275k transistors



Pentium (1993)  
3,1M transistors



Pentium IV  
[42-184]M transistors



Core i7  
[731M et + (>Billion)  
transistors

## □ Samsung S5PC100 SoC (iPhone 3GS)



- A 32-bit ARM Cortex A8 RISC processor (833MHz)
- Includes powerful hardware accelerators (video, display control, JPEG codec...)
- Includes many HW peripheral such as ADC, SPI, Timers, LCD controller,...
- Various functionalities : Wireless communication, Navigation, Camera, Portable gaming, ...
- Used in iPhone 3GS & iPod touch 3rd generation



## □ Inside a SoC

- Today, designing a SoC consists in the integration of several component (of a system) onto a *single chip*.
  - Embedded processor(s),
  - On-chip interconnection (busses, network, etc)
  - Hardware accelerators
  - Controller of peripherals
  - I/Os interface
  - Analog circuits,
  - ASICs logics
  - Software – OS, Application



### □ Design Challenges of SoCs

1. Ensure a high-level of **performance**
2. **Used Area** = Cost of the integrated circuit
3. **Power consumption**
4. **Reliability**
5. **Configurability** : *standardization* for the manufacture and *customisation* for every application

**SoC design is application-specific !**  
**=> Cost-performance tradeoff**

# Design issues

## ❑ Examples of SoCs

### Xbox One:

- Performance
- Area
- Power Consumption
- Reliability
- Configurability



Xbox One



Apple A 10 Soc



iPhone 7 plus :

- Performance
- Area
- Power consumption
- Reliability
- Configurability

# SoC Design & Intellectual Properties (IP)

## ☐ Software/Hardware

### Classical approach



### Co-design



⇒ Today, only a single HW/SW team for the entire design

### ❑ SoC Design and Interconnect

- The most important part of SoC design is **IP integration**
  - SW/HW IP Cores
  - Maximize design reuse to lower the cost
  - The good choice of interconnect
- Two main **interconnect architectures**
  - Busses based interconnection
  - Network on chip (NoC)

## ❑ Different types of IP for SoC Design

- **Soft IP**

- Hardware description model (VHDL/Verilog) that must be synthesized or purely functional
- Technology independent, flexible and reconfigurable
- Hard to protect

- **Hard IP**

- Placed and routed Netlist
- Depends on the technology, less flexibility
- provides timing information
- Easy to protect (=blackbox with I/Os)
- Simple and easy to integrate

- **Firm IP**

- Netlist after synthesis
  - Technology dependent (physical lib.)
  - Hard to protect
- = Tradeoff between performance of the IP hard and flexibility of the soft IPs

## □ IP Cores



# SoC Design & Intellectual Properties

## ❑ Some examples

- Processors (ARM7, ARM8,...)
- DSPs
- I/Os (PCI/PCIe, USB,...)
- Dedicated units (Crypto, digital communications,...)
- Network (Ethernet,...)
- Commercial IPs [www.design-reuse.com](http://www.design-reuse.com)
- Open IPs [opencores.org](http://opencores.org)

## ❑ Some IP providers

- R&D companies (fabless) : **IP Soft** for simulation and synthesis + licence :
  - ARM, ...
- Semiconductors companies: **IP Hard+Soft**
  - TI, Motorola, Intel/Altera, Xilinx,...
- Tool providers: **IP Soft**
  - Mentor Graphics, Cadence,...

# Interconnect

# Interconnect topologies

## ❑ Evolution of on-chip communication architectures



- The most widely used topology is the **bus**
- **Network-on-chip** (wired, wireless, hybrid) provide high-performance for future multicore SoC

# Interconnect topologies

## □ Busses to networks

- Evolution of the architectural paradigm :
  - Replace wires by **network**
  - Everything is packed in **packets**
  - Network on chip hides the physical interconnect from the design



## ❑ Terminology

- *Bus* : wires shared by multiple elements with logic to control its use
- *Bus Protocol* : set of rules for transmitting information between two or more elements over a bus (sequence of events, timing information etc.)
- *Master* : IP initiating a data transfer (read/write)
- *Slave* : IP responding only to the request of the master
- *Arbiter* : Control and manage the access on the bus (multi-masters)
- *Decoder* : Determine the receiver for the transfer
- *Bridge* : Connects two buses (which are not of the same type having different protocols), acts as master on one side and slave on the other

## ❑ Bus signals

- *Address*
  - *Carry address of destination for which the transfer is initiated*
  - *Can be shared or not for read, write*
- *Data*
  - *Carry information between source and destination*
  - *Can be shared or not for read/write data*
- *Control*
  - *Requests and acknowledgments*
  - *Specify additional information about the type of transfer (byte enable, burst size,...)*



## ❑ Shared Bus

- Simple wires used as **shared communication medium**, interconnecting several modules
  - Only one component can send a message on the bus at a given time
  - Need arbiters if several elements wish to communicate on the bus at the same time
  - The bus can become a bottleneck
  - Simplicity but difficult to scale
  - Example : **OPB (On-Chip Peripheral Bus –IBM/Xilinx)**
- Example :
  - R/W transactions on the bus
  - Selection of the peripheral regarding the address



# Interconnect topologies : bus

## □ Synchronous/asynchronous Bus transfers

| Synchronous                                                             | Asynchronous                    |
|-------------------------------------------------------------------------|---------------------------------|
| Common clock lign (wire)                                                | No clock lign                   |
| little logique, very fast                                               | Support by a lot of peripherals |
| Require synchronization (same clock rate for all the connected modules) | Require a handshacking protocol |



# Interconnect topologies : bus

## ❑ Hierarchical Bus : Three level approach



- **High performance bus**
  - Highest bandwidth (>10GB/s)
  - High end embedded processors and accelerators
  - PCIe, 10G Ethernet
  - DDR2/3/4 memory, embedded DRAM
- **Mid performance bus**
  - Lower bandwidth (<10GB/s)
  - Low and Mid embedded processors
  - PCI, USB2, Gigabit Ethernet
  - DDR memory, SRAM
- **Low performance bus**
  - Standard low bandwidth peripheral
  - I2C, USB, GPIO, Timers
  - UART, Flash memory

# Interconnect topologies : bus

## ❑ Hierarchical Bus

- Multiple buses connected to each other through bridges
- Concurrent transfer on each bus
- Organized according to bandwidth and performance
- Bridge complexity



### ❑ Embedded busses

- AMBA 2.0, 3.0, 4.0 (ARM)
- Core Connect (IBM)
- STBus (STMicroelectronics)
- Avalon (Altera)
- Wishbone (OpenCores)
- ...

## □ Crossbar Interconnects

- Technology compensating for the increasing number of communicating elements wishing to access a common resource inside a SoC.
  - Needs concurrent accesses
  - Needs increased performance
- Every Master is connected to every slave with a dedicated bus => high data transfer parallelism (better throughput than hierarchical buses)
- Large area => High power consumption, timing closure problems



Crossbar architecture

N horizontal busses  
= N inputs

N vertical busses = N outputs



Matrix of  $N \times N$  connexions

# Interconnect topologies : NoC

## NoC Interconnect

- NoCs are an attempt to scale down the concepts of large-scale networks, and apply them to the embedded system-on-chip (SoC) domain



Topological illustration of a 4-by-4 grid structured NoC, indicating the fundamental components.



Other NoC topologies

## Interconnect topologies : NoC

- Network-on-chip (NoC) is a packet switched on-chip communication network designed using a layered methodology
- NoCs use packets to route data from the source to the destination PE via a network fabric that consists of
  - switches (routers)
  - interconnection links (wires)



## ❑ NoC Properties

- Regular geometry that is **scalable**
- Flexible **QoS guarantees**
- Higher bandwidth
- **Reusable components**
  - Buffers, arbiters, routers, protocol stack
- No long global wires (or global clock tree)
  - No problematic global synchronization
  - **GALS: Globally asynchronous, locally synchronous design**
- Reliable and predictable electrical and physical properties



## Interconnect topologies : Summary

### □ To summarize,

- A bus is an excellent medium to interconnect several elements on a chip
- The bus is shared and leads to a « bottleneck » in the system
- Many standards exists e.g. AMBA, Avalon, enabling design reuse and plug-and-play system design...

### □ For future SoC,

- Unsustainable technology (Bus/crossbar) for complex SoC development with thousands of cores, heterogeneity
- Network-on-chip interconnect allows to increase the flexibility as well as the performance. SoC design is easier at system level without concerns about protocol, width conversion,...

# Interconnect topologies : Summary



Source : Daniel Chillet, 3D Optical ManyCore Communication Allocation in a ManyCore Architecture based on an Optical NoC, Journée thématique de Emerging Interconnect Technologies, Paris, 27 novembre 2017

# ETIS research project : WiNoCod // Wizarde

WiNoCod : Chip Multi-processors



Wizarde



# ARM AMBA & AXI

## ❑ Standards & specifications

- Simplify IP integration
- Avoid integration mismatches

## ❑ What is AMBA

- *Advanced Microcontroller Bus Architecture = set of specifications*
- Based on the ARM Processor
- 4 specifications
  - AMBA [1996] -> 2 Busses
    - ASB Advanced System Bus
    - APB Advanced Peripheral Bus (Simple)
  - AMBA 2.0 [1999] -> 3 busses
    - AHB Advanced High-Performance Bus (Complex)
  - AMBA 3.0 [2003] Introduction of AXI standard
  - AMBA 4.0 [2010] 2<sup>nd</sup> version of AXI, AXI4

## ❑ From AMBA 3.0 Specifications

- Based on the ARM processor
- Three busses :
  - AHB - Advanced High-performance Bus
    - Burst transfer, multiples masters and slaves, large data bus (64-128bits),for high bandwidth...
  - APB – Advanced Peripheral Bus
    - Low-power, low-complexity interface, low bandwidth...
  - ASB - Advanced system Bus (older version of AHB)
  - Introduction of AXI – Advanced eXtensible Interface (for FPGA)
    - Protocol and interface specifications for IP cores
- This standard aims at solving the following issues :
  - Modularity and design reuse
  - Interface Protocol and efficient clock/reset management,
  - Energy Efficiency thank to AHB/APB partitioning
  - Access for on-chip test



ARM AMBA based system

## □ What is AXI

- It's a part of ARM's AMBA
- The first version of AXI was first included in AMBA 3.0, released in 2003
- Xilinx has adopted this protocol for its FPGAs (Virtex-6, Spartan-6 et Série 7)



**AMBA: Advanced Microcontroller Bus Architecture**  
**AXI: Advanced Extensible Interface**

## ❑ AXI protocol

- Suitable for high-frequency and low-latency designs
- High frequency operation without using complex bridges
- Meets the interface requirements of a wide range of components
- Provides flexibility in the implementation of interconnect architectures
- Compatible with existing AHB & APB interfaces

- AMBA 4.0, released in 2010, includes the second major version of AXI: AXI4
- 3 types of interfaces
  - Full (Memory-mapped)
    - High-performance memory mapped
  - Stream
    - High-speed streaming data (unidirectional) with reduced signal routing
  - Lite (Memory-mapped)
    - Simple and low throughput memory-mapped communication



## ❑ AXI protocols

- Memory-mapped (**AXI3, AXI4, AXI4-Lite**)
  - All transactions use a target address within a system memory map space and data to be transferred
  - Example : a typical peripheral or memory bank
- Stream (**AXI4-stream**)
  - Data-only communication (ex: dataflow application where no address is required)
  - Example : data coming from an USB controller

## ❑ Key features

- Separate address/control data phases
- Burst-based transactions (1-16 data transfer per burst of ≠ sizes (8-1024 bits))
- ..

## ❑ Typical MM master-slave connections using channels

- 5 transactions channels :
  - Read address channel
  - Read data channel
  - Write address channel
  - Write data channel
  - Write response channel



## Read Transaction



## Write Transaction

I'm sending data. Please store it.



## Flow control

- **Basic valid/ready handshaking**
  - Information is exchanged only when source is valid and destination is ready
  - Each channel has its own valid/ready
  
- **Both Master and slave can limit the flow**
  - Flexible signaling functionality
    - Inserting wait states
    - Ready before start
    - Same cycle acknowledge



## AXI MM signals



## AXI Interconnect

- Interconnect Core
  - Crossbar routes the traffic between slave and master interfaces
  - Optional couplers can perform various conversion and buffering functions (Registers, data FIFO, clock converter, data width converter, etc.)
  - Configuration up to 16 slaves and 16 masters interfaces (if 1 slave, up to 64 Masters)



## AXI Interconnect : several configurations

- Pass through



Resynchronisation for reset signals

- 1-to-N Interconnect



- N-to-1 Interconnect



Figure 2-2: N-to-1 AXI Interconnect

- Conversion only



Clock, width conversion...

- N-to-M Interconnect (crossbar mode)



- M-to-N Interconnect (shared access mode)



X14017

## ❑ SmartConnect : an enhanced interconnect

- Dedicated to complex designs with multiple interfaces
- Achieves high throughput and low latency interconnect network
- Main features
  - SmartConnect core connects one or more AXI memory-mapped master devices to one or more memory-mapped slave devices.
  - AXI SmartConnect replaces AXI Interconnect v2 core and allows minimal user intervention
  - Integrated since VIVADO Design 16.1



## ❑ As Summary

- Standards are important to make easier IP integration and avoid integration mismatches
- Standard bus architectures define
  - Interface between IPs and bus architecture
  - Specifics (not always) of bus architecture that implements data transfer protocol
- Such standards such as AXI increase productivity and flexibility during SoC design
  - Many IP providers support AXI Protocol
  - A lot of tools is available, allowing efficient system design, test and performance characterization

# SoCs FPGA

## New tools for co-design

## ❑ SoC FPGAs

- why?
  1. **Time to market**  
⇒ Low time-to-market and design time,  
fast prototyping
  2. **Performance** : large amount of hardware  
resources, parallelism
  3. **Area** : regular vs ASIC area
  4. **Reliability** : related to the regularity of  
interconnections and resources,
  5. **Dynamic Reconfiguration** during  
execution



## ❑ Tools from FPGA providers for design re-use (IP)

Altera

MegaWizzard Plug-In Manager



Xilinx

Core Generator (ISE)



## □ Soft Core Nios II

Goals : Reduce costs, power consumption and complexity of embedded systems

⇒ Allow designers to easily add peripheral to the processor, on the same chip

⇒ instantiation of peripheral according to the application needs



## ❑ Main Features of NIOS-II Softcore from Altera

- Harvard architecture
  - Data Master port
  - Instruction Master port
- RISC instruction set
- Bank of 32 registers (32 bits)
- No pipeline, 5 or 6 stages configuration
- 32 interrupt sources
- Avalon embedded bus
  - ⇒ Interconnect the processor and its peripherals
    - Avalon switch Fabric (managed SoPC Builder tool)
    - Avalon Interface (managed by the designer)



## □ Design flow



## ❑ SoC Intel/Altera

- Arria X Soc
  - Dual-core ARM cortex A9-based processing system
    - 1.5GHz
  - 20nm TSMC Programmable Logic
    - 160K-660K Logic Elements
    - 3,365 multipliers, 39Mb memory ,
  - Other elements:
    - Several I/Os, DDR3 et DDR4 SDRAM mémories, high-performance transceivers...
    - NEON DSP Core (SIMD)
    - BSP support for OS : Linux, WxWorks, uC/OS-II et III



## ❑ SoC FPGA development Suite /Altera (1/2)

- Quartus Prime (Hardware)
  - Set of tools from design entry and synthesis to optimization, verification, simulation.
  - Support
    - a HLS Compiler (C++ to RTL code)
    - Block-based design flow (incremental)
    - Partial reconfiguration
    - Platform designer (Qsys) for IP design and integration
    - Signal Tap Logic Analyzer, Power Analyzer



## ❑ SoC FPGA development Suite /Altera (2/2)

- Intel SoC FPGA Embedded Development Suite (Software)

Intel® SoC FPGA Embedded Development Suite



ARM Development Studio 5  
Intel SoC FPGA Edition

Powerful Eclipse IDE based on ARM DS-5 is power packed with features. Code, build, debug, and optimize in one IDE!

Hardware Libraries

Easy access to low-level hardware for configuration and control.

Configuration Tools

Intel FPGA-specific SoC configuration tools to improve productivity.

Examples

Reference designs, U-boot, linux reference examples



FPGA debug



Multi-Core debug

## ❑ Xilinx design flow : some tools



### Xilinx Platform Studio (XPS)

- Graphical system design (I/Os selection, memory controllers, clocks, AXI interconnect...)
- Drag&Drop IPs from issues dedicated libraries (Audio video image processing, DSP, memory interfaces,...)

# SoC FPGA



## Xilinx Software Development Kit (SDK)

- Templates for C/C++ development projects, Boot loader, Board Support Package, Memory & peripheral tests
- Build/Deploy/Debug Petalinux distribution for Xilinx boards

# SoC FPGA



Xilinx PlanAhead Design tool

Design Analysis and Implement Custom Hardware IP

- VHDL/Verilog, RTL Sources, Timing constraints,
- Placement/Route Timing and Power Analysis,
- Design Implementation, bitstream generation

Co-Simulate &amp; Debug



Simulators for every use case :

- Functional simulation for SW
  - Cycle accurate simulation for HW
  - HW/SW co-simulation
- 
- Xilinx ISIM or Modelsim for HW debug, power & timing analysis
  - Xilinx ChipScope Pro for concurrent HW/SW debug

# VIVADO

- Integrated Design Environment, gathering all these tools (excepted SDK)

The Vivado logo, featuring the word "VIVADO" in a large, stylized font with a trademark symbol, and a small green graphic element to the right.

## ❑ Xilinx All Programmable SoC

- Xilinx Zynq-7000
  - Dual-core ARM cortex A9-based processing system (PS)
    - Including memory and peripheral controllers
    - Fully independent from PL
  - 28nm Xilinx Programmable Logic (PL)
    - Highly integrated circuit, high performance
    - Used to extend PS
  - Other elements:
    - ARM AXI high performance interfaces
    - Several I/Os, ADC inputs, high-performance transceivers...



Xilinx Zynq-7000 SoC

## □ Practical work using VIVADO and Zynq-7000 ARM FPGA SoC

- 1<sup>st</sup> part : Programing the PS on SDK
- 2<sup>nd</sup> part : System Design under Vivado
  - Control of hardware peripheral from PS
- 3<sup>rd</sup> part : extension with additional peripheral



Zedboard from Avnet/Digilent

# References

- ARM Specification, <https://www.arm.com/products/system-ip/amba-specifications>
- G. Khan, “SoC Interconnect bus structures”, Ryerson University <http://www.ee.ryerson.ca/~gnkhan>
  - Very good presentation of the main embedded bus
- Xilinx, Breakthrough UltraScale+ Device Performance with SmartConnect Technology , White Paper WP478, April 2016 [https://www.xilinx.com/support/documentation/white\\_papers/wp478-smartconnect.pdf](https://www.xilinx.com/support/documentation/white_papers/wp478-smartconnect.pdf)
- Universidad de Alcalà, “Design of System of Chip”, [http://www.depeca.uah.es/depeca/repositorio/asignaturas/201757/Unit\\_x\\_1\\_AXI\\_INFRAESTRUCTURE.pdf](http://www.depeca.uah.es/depeca/repositorio/asignaturas/201757/Unit_x_1_AXI_INFRAESTRUCTURE.pdf)
  - AXI infrastructure overview
- Xilinx, AXI Interconnect Product Guide v2.1, PG059, 20 december 2017, [https://www.xilinx.com/support/documentation/ip\\_documentation/axi\\_interconnect/v2\\_1/pg059-axi-interconnect.pdf](https://www.xilinx.com/support/documentation/ip_documentation/axi_interconnect/v2_1/pg059-axi-interconnect.pdf)
- Wishbone Specifications, <https://opencores.org/howto/wishbone>