

# FPGA based Embedded SoPCs: System on Programmable Chips

COE718: Embedded Systems Design  
<http://www.ee.ryerson.ca/~courses/COE718/>

**Dr. Gul N. Khan**  
<http://www.ee.ryerson.ca/~gnkhan>  
**Electrical and Computer Engineering**  
**Ryerson University**

---

## Overview

- Introduction to Altera SoPCs
- Embedded System on Programmable Chip
- Cyclone IV and Stratix FPGAs
- Nios-II CPU Core and SoPC IPs; Avalon Bus
- SOPC Builder

**Introductory articles on SoPC and the DE2-115 manual are available at the course webpage.**  
**Part of Sections 5.4, 6.3, 6.4 and part of Chapter 9 of text by Navabi.**

# Programmable Chips

There are two main sources of programmable chips that can accommodate a significant part of an embedded computer system, including the CPU and peripheral devices:

## Altera

- Stratix and Cyclone devices use the Nios soft processor core
- Cyclone-V and Arria-V SoC devices use an ARM Cortex CPU based core

## Xilinx

- Spartan II, Virtex-II, Virtex-5, Virtex-6 and Virtex-7 accommodate Microblaze, PowerPC cores, as well as ARM Cortex cores.

# A Nios CPU based SoPC FPGA



# DE2-115 NIOS Development Board



# DE2-115 Block Diagram



# DE2-115: SoPC Options



# DE2-115: TV Box Configuration




# DE2-115: SD Music Player




# Altera System on Programmable Chips

## Cyclone and Stratix Series FPGA

### Cyclone IV-E Family FPGAs

| Resources                    | EP4CE6 | EP4CE10 | EP4CE15 | EP4CE22 | EP4CE30 | EP4CE40 | EP4CE55 | EP4CE75 | EP4CE115 |
|------------------------------|--------|---------|---------|---------|---------|---------|---------|---------|----------|
| Logic elements (LEs)         | 6,272  | 10,320  | 15,408  | 22,320  | 28,848  | 39,600  | 55,856  | 75,408  | 114,480  |
| Embedded memory (Kbits)      | 270    | 414     | 504     | 594     | 594     | 1,134   | 2,340   | 2,745   | 3,888    |
| Embedded 18 × 18 multipliers | 15     | 23      | 56      | 66      | 66      | 116     | 154     | 200     | 266      |
| General-purpose PLLs         | 2      | 2       | 4       | 4       | 4       | 4       | 4       | 4       | 4        |
| Global Clock Networks        | 10     | 10      | 20      | 20      | 20      | 20      | 20      | 20      | 20       |
| User I/O Banks               | 8      | 8       | 8       | 8       | 8       | 8       | 8       | 8       | 8        |
| Maximum user I/O (#)         | 179    | 179     | 343     | 153     | 532     | 532     | 374     | 426     | 528      |

# Cyclone IV-E Features

Low-cost, low-power FPGA fabric (maximum figures are across the whole Cyclone IV family, including the Cyclone IV GX devices):

- 6K to 150K logic elements
- Up to 6.3 Mb of embedded memory
- Up to 360 $18 \times 18$ multipliers for DSP-intensive applications
- Protocol bridging applications for under 1.5 W total power

# Cyclone-IV LE (Logic Elements)

The LE is the smallest unit of logic in the device (**a small logic unit**). It is compact and provides advanced features **with efficient logic utilization.** Each LE has the following features:

- A four-input look-up table (LUT), which is a function generator that can implement any function of four variables.
- A programmable register.
- A carry chain connection.
- A register chain connection.
- The ability to drive all types of interconnects: local, row, column, register chain, and direct link interconnects.
- Support for register packing and feedback.

Each LE's programmable register can be configured for D, T or JK operation and has data, clock, clock enable, and clear inputs.
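The four-input LUT described above can be modeled as a 16-entry truth table indexed by the four input bits. The Python sketch below is only an illustration of that idea; the names `make_lut4` and `lut4_eval` are hypothetical and do not correspond to any Altera primitive or tool.

```python
def make_lut4(func):
    """Build a 16-bit LUT mask from any Boolean function of four inputs."""
    mask = 0
    for index in range(16):
        # Input a is bit 0 of the table index, input d is bit 3.
        a, b, c, d = ((index >> k) & 1 for k in range(4))
        if func(a, b, c, d):
            mask |= 1 << index
    return mask


def lut4_eval(mask, a, b, c, d):
    """Return the stored output bit selected by the four inputs."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (mask >> index) & 1


# Example: a 4-input XOR (parity) function stored in a single LUT.
parity_mask = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Because the table has one entry per input combination, the same two functions cover any function of four variables; only the 16-bit mask changes.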

# Cyclone Logic Element



# LAB: Logic Array Blocks

Each LAB consists of 16 LEs, **LAB control signals, LE carry chains, register chains, and local interconnect.**



# LE Normal and Arithmetic Modes

## Normal mode

General logic applications and combinatorial functions



# Logic Element Arithmetic Mode

Arithmetic mode is ideal for implementing adders, counters, accumulators, and comparators.

**In arithmetic mode, an LE implements a 2-bit full adder and basic carry chain.**



# Cyclone/Stratix M4K RAM



# Cyclone IV M9K Memory

M9K blocks support the following features:

- 8,192 memory bits per block (9,216 bits per block including parity).
- In packed mode, an M9K memory block is split into two 4.5 K single-port RAMs.
- Variable port configurations
- Single-port and simple dual-port modes support for all port widths
- Initialization file to pre-load memory content in RAM and ROM modes

# M9K Memory Feature Summary

| Feature                                     | M9K Blocks                                                                                                                                                              |
|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Configurations (depth × width)              | $8192 \times 1$<br>$4096 \times 2$<br>$2048 \times 4$<br>$1024 \times 8$<br>$1024 \times 9$<br>$512 \times 16$<br>$512 \times 18$<br>$256 \times 32$<br>$256 \times 36$ |
| Parity bits                                 | ✓                                                                                                                                                                       |
| Byte enable                                 | ✓                                                                                                                                                                       |
| Packed mode                                 | ✓                                                                                                                                                                       |
| Address clock enable                        | ✓                                                                                                                                                                       |
| Single-port mode                            | ✓                                                                                                                                                                       |
| Simple dual-port mode                       | ✓                                                                                                                                                                       |
| True dual-port mode                         | ✓                                                                                                                                                                       |
| Embedded shift register mode <sup>(1)</sup> | ✓                                                                                                                                                                       |
| ROM mode                                    | ✓                                                                                                                                                                       |
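Each depth × width configuration in the table uses either the 8,192 data bits of a block or all 9,216 bits including parity. The short Python check below illustrates this; the names `M9K_CONFIGS` and `m9k_bits` are made up for the example, and the assumption that the ×9, ×18, and ×36 widths are the ones carrying parity bits is inferred from the bit totals.

```python
# (depth, width) pairs copied from the feature summary table.
M9K_CONFIGS = [
    (8192, 1), (4096, 2), (2048, 4), (1024, 8), (1024, 9),
    (512, 16), (512, 18), (256, 32), (256, 36),
]


def m9k_bits(depth, width):
    """Total number of bits addressed by one configuration."""
    return depth * width


def uses_parity(width):
    """Assumption: widths that are multiples of 9 include the parity bits."""
    return width % 9 == 0


totals = {(d, w): m9k_bits(d, w) for d, w in M9K_CONFIGS}
```

Every entry of `totals` is 8,192 for the plain widths and 9,216 for the parity widths, which matches the per-block bit counts quoted for the M9K.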

# M9K - Dual Port Memory

The widest bit configuration of the M9K blocks in true dual-port mode is $512 \times 16$-bit ($512 \times 18$-bit with parity).



| Read Port | Write Port |          |          |          |          |          |          |
|-----------|------------|----------|----------|----------|----------|----------|----------|
|           | 8192 × 1   | 4096 × 2 | 2048 × 4 | 1024 × 8 | 512 × 16 | 1024 × 9 | 512 × 18 |
| 8192 × 1  | ✓          | ✓        | ✓        | ✓        | ✓        | —        | —        |
| 4096 × 2  | ✓          | ✓        | ✓        | ✓        | ✓        | —        | —        |
| 2048 × 4  | ✓          | ✓        | ✓        | ✓        | ✓        | —        | —        |
| 1024 × 8  | ✓          | ✓        | ✓        | ✓        | ✓        | —        | —        |
| 512 × 16  | ✓          | ✓        | ✓        | ✓        | ✓        | —        | —        |
| 1024 × 9  | —          | —        | —        | —        | —        | ✓        | ✓        |
| 512 × 18  | —          | —        | —        | —        | —        | ✓        | ✓        |
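The pattern in the table above is that the plain data widths (×1 to ×16) can be mixed freely with each other, and the parity widths (×9, ×18) can only be mixed with each other. A minimal Python sketch of that rule follows; `mixed_width_ok` is a hypothetical helper name, not part of any vendor tool.

```python
DATA_WIDTHS = {1, 2, 4, 8, 16}   # configurations without parity bits
PARITY_WIDTHS = {9, 18}          # configurations with parity bits


def mixed_width_ok(read_width, write_width):
    """True when the read/write width pair is marked as supported in the table."""
    both_data = read_width in DATA_WIDTHS and write_width in DATA_WIDTHS
    both_parity = read_width in PARITY_WIDTHS and write_width in PARITY_WIDTHS
    return both_data or both_parity
```

For example, `mixed_width_ok(1, 16)` holds, while `mixed_width_ok(8, 9)` does not.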

# Embedded Multiplier Block Architecture

The embedded multiplier consists of a multiplier block, input and output registers, and interfaces.



# IOE: I/O Element

IOEs are located in I/O blocks around the periphery of the device.

- An IOE contains one input register, one output register, and an output enable register.
- Together with a bi-directional I/O buffer, these three registers provide **complete embedded bi-directional single data rate transfer.**



# Cyclone-V SX SoC FPGA

The Cyclone-V FPGA family is based on a 1.1-V, 28-nm process. It targets high-performance designs and SoC prototyping.

|                                             | 5CSXC2 | 5CSXC4 | 5CSXC5  | 5CSXC6  |
|---------------------------------------------|--------|--------|---------|---------|
| ALMs                                        | 9,434  | 15,094 | 32,075  | 41,509  |
| LEs (K)                                     | 25     | 40     | 85      | 110     |
| Registers                                   | 37,736 | 60,376 | 128,300 | 166,036 |
| M10K memory blocks                          | 140    | 224    | 397     | 514     |
| M10K memory (Kb)                            | 1,400  | 2,240  | 3,972   | 5,140   |
| MLAB memory (Kb)                            | 138    | 220    | 480     | 621     |
| Variable-precision DSP blocks               | 36     | 58     | 87      | 112     |
| 18 x 18 multipliers                         | 72     | 116    | 174     | 224     |
| Processor cores (ARM Cortex-A9)             | Dual   | Dual   | Dual    | Dual    |
| Global clock networks                       | 16     | 16     | 16      | 16      |
| PLLs <sup>2</sup> (FPGA)                    | 4      | 5      | 6       | 6       |
| PLLs <sup>2</sup> (HPS)                     | 3      | 3      | 3       | 3       |
| Transceiver count (3.125 Gbps)              | 6      | 6      | 9       | 9       |
| PCIe hard IP blocks (Gen1 x4)               | 2      | 2      | 2       | 2       |
| GPIOs (FPGA)                                | 145    | 145    | 288     | 288     |
| GPIOs (HPS)                                 | 188    | 188    | 188     | 188     |
| Hard memory controllers <sup>4</sup> (FPGA) | 1      | 1      | 1       | 1       |
| Hard memory controllers <sup>4</sup> (HPS)  | 1      | 1      | 1       | 1       |

# Stratix-IV/Cyclone MLAB Memory



**The MLAB adds LUT-based SRAM capability to the LAB. It supports a maximum of 640 bits of dual-port SRAM. The MLAB is a superset of the LAB and includes all LAB features.**

# Stratix-V GX Family

The Stratix-V FPGA family is based on a 0.85-V, 28-nm process. It targets the highest-performance designs, the highest logic- and memory-density designs, and ASIC prototyping.

|                               | Maximum Resource Count for Stratix V GX FPGAs (0.85 V) <sup>1</sup> |                                                    |         |         |           |                                    |         |         |           |           |
|-------------------------------|---------------------------------------------------------------------|----------------------------------------------------|---------|---------|-----------|------------------------------------|---------|---------|-----------|-----------|
|                               | 5SGXA3                                                              | 5SGXA4                                             | 5SGXA5  | 5SGXA7  | 5SGXA9    | 5SGXAB                             | 5SGXB5  | 5SGXB6  | 5SGXB9    | 5SGXBB    |
| ALMs                          | 128,300                                                             | 158,500                                            | 185,000 | 234,720 | 317,000   | 359,200                            | 185,000 | 225,400 | 317,000   | 359,200   |
| LEs (K)                       | 340                                                                 | 420                                                | 490     | 622     | 840       | 952                                | 490     | 597     | 840       | 952       |
| Registers                     | 513,200                                                             | 634,000                                            | 740,000 | 938,880 | 1,268,000 | 1,436,800                          | 740,000 | 901,600 | 1,268,000 | 1,436,800 |
| M20K memory blocks            | 957                                                                 | 1,900                                              | 2,304   | 2,560   | 2,640     | 2,640                              | 2,100   | 2,660   | 2,640     | 2,640     |
| M20K memory (Mb)              | 19                                                                  | 37                                                 | 45      | 50      | 52        | 52                                 | 41      | 52      | 52        | 52        |
| MLAB memory (Mb)              | 3.92                                                                | 4.84                                               | 5.65    | 7.16    | 9.67      | 10.96                              | 5.65    | 6.88    | 9.67      | 10.96     |
| Variable-precision DSP blocks | 256                                                                 | 256                                                | 256     | 256     | 352       | 352                                | 399     | 399     | 352       | 352       |
| 18 x 18 multipliers           | 512                                                                 | 512                                                | 512     | 512     | 704       | 704                                | 798     | 798     | 704       | 704       |
| I/O Features                  | LVDS channels, 1.4 Gbps (receive/transmit)                          | 174                                                | 174     | 210     | 210       | 210                                | 150     | 150     | 150       | 150       |
|                               | Embedded DPA circuitry                                              |                                                    |         |         |           | ✓                                  |         |         |           |           |
|                               | OCT                                                                 |                                                    |         |         |           | Series, parallel, and differential |         |         |           |           |
|                               | Transceiver count (14.1 Gbps)                                       | 36                                                 | 36      | 48      | 48        | 48                                 | 66      | 66      | 66        | 66        |
|                               | PCIe hard IP blocks (Gen3)                                          | 2                                                  | 2       | 4       | 4         | 4                                  | 4       | 4       | 4         | 4         |
|                               | Memory devices supported                                            | DDR3, DDR2, QDR II, QDR II+, RLDRAM II, RLDRAM III |         |         |           |                                    |         |         |           |           |

# Arria-V SX SoC FPGA

The Arria-V FPGA family is based on a 1.1-V, 28-nm process. It can be used to prototype and implement ARM Cortex CPU based SoCs.

|                                 | 5ASXB3  | 5ASXB5  |
|---------------------------------|---------|---------|
| ALMs                            | 132,075 | 174,340 |
| LEs (K)                         | 350     | 462     |
| Registers                       | 528,300 | 697,360 |
| M10K memory blocks              | 1,729   | 2,282   |
| M10K memory (Kb)                | 17,288  | 22,820  |
| MLAB memory (Kb)                | 2,014   | 2,658   |
| Variable-precision DSP blocks   | 809     | 1,068   |
| 18 x 18 multipliers             | 1,618   | 2,136   |
| Processor cores (ARM Cortex-A9) | Dual    | Dual    |
| Global clock networks           | 16      | 16      |
| PLLs <sup>2</sup> (FPGA)        | 10      | 14      |
| PLLs <sup>2</sup> (HPS)         | 3       | 3       |

# Stratix Adaptive Logic Module Structure



# Stratix ALM

An ALM can operate in one of four modes: normal, extended LUT, arithmetic, and shared arithmetic.



# $4 \times 2$ Crossbar Switch on one ALM

A $4 \times 2$ crossbar switch (**two 4-to-1 multiplexers with common inputs and unique select lines**) can be implemented in one ALM.
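Functionally, the crossbar described above is two 4-to-1 multiplexers that share the same four data inputs and have independent select values. The Python sketch below models only that behavior, not the ALM internals; `mux4` and `crossbar_4x2` are hypothetical names.

```python
def mux4(inputs, select):
    """4-to-1 multiplexer: return the input chosen by select (0..3)."""
    if not 0 <= select <= 3:
        raise ValueError("select must be in the range 0..3")
    return inputs[select]


def crossbar_4x2(inputs, sel0, sel1):
    """Two 4-to-1 muxes with common inputs and unique select lines."""
    if len(inputs) != 4:
        raise ValueError("the crossbar takes exactly four inputs")
    return mux4(inputs, sel0), mux4(inputs, sel1)
```

With inputs `[0, 1, 1, 0]`, select values 1 and 3 route the second and fourth inputs to the two outputs.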



# ALM in Arithmetic Mode



# Example of a 3-bit Add Utilizing Shared Arithmetic Mode

3-Bit Add Example (the 1st stage add is implemented in LUTs)

$$\begin{array}{ccccc}
   &     & X_2 & X_1 & X_0 \\
   &     & Y_2 & Y_1 & Y_0 \\
 + &     & Z_2 & Z_1 & Z_0 \\
 \hline
   &     & S_2 & S_1 & S_0 \\
 + & C_2 & C_1 & C_0 &     \\
 \hline
   & R_3 & R_2 & R_1 & R_0
\end{array}$$

Binary Add

$$\begin{array}{ccccc}
   &   & 1 & 1 & 0 \\
   &   & 1 & 0 & 1 \\
 + &   & 0 & 1 & 0 \\
 \hline
   &   & 0 & 0 & 1 \\
 + & 1 & 1 & 0 &   \\
 \hline
   & 1 & 1 & 0 & 1
\end{array}$$

Decimal Equivalents

$$\begin{array}{r}
 6 \\
 5 \\
 + 2 \\
 \hline 1 \\
 + 2 \times 6 \\
 \hline 13
\end{array}$$

ALM Implementation (shared\_arith\_in = '0')
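The worked example above (6 + 5 + 2 = 13) can be checked in software. The Python sketch below assumes the usual three-input reduction: the sum bits are the bitwise XOR of the three operands, the carry bits are the bitwise majority, and the result adds the sum to the carries shifted left by one position. The function name `shared_arith_add3` is hypothetical.

```python
def shared_arith_add3(x, y, z, width=3):
    """Add three width-bit numbers in two stages, returning (S, C, R)."""
    mask = (1 << width) - 1
    x, y, z = x & mask, y & mask, z & mask
    s = x ^ y ^ z                        # first stage: bitwise sum bits S
    c = (x & y) | (x & z) | (y & z)      # first stage: bitwise carry bits C
    r = s + (c << 1)                     # second stage: S plus C shifted left
    return s, c, r


s, c, r = shared_arith_add3(0b110, 0b101, 0b010)
```

For the operands of the example this gives S = 001, C = 110 and R = 1101, that is 1 + 2 × 6 = 13.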



# Stratix IOE Structure



# Stratix DSP Block Architecture



# Nios-II Embedded Processor



# Nios-II CPU Core Features

- The Nios-II CPU is a pipelined, single-issue RISC processor.
- Most instructions run in a single clock cycle.
- The instruction set is targeted at compiled (**high-level language**) embedded applications.
- The Nios family of soft-core processors uses a 32-bit architecture.
- The register file holds 32 registers, each 32 bits wide. The CPU can also have one or more shadow register files that are transparent to application code.
- Shadow register files are manipulated by the kernel, since Nios-II supports user and kernel modes.

# Nios-II Programmer Model

| Register | Name | Function              | Register | Name | Function                      |
|----------|------|-----------------------|----------|------|-------------------------------|
| r0       | zero | 0x00000000            | r16      |      | Callee-saved register         |
| r1       | at   | Assembler temporary   | r17      |      | Callee-saved register         |
| r2       |      | Return value          | r18      |      | Callee-saved register         |
| r3       |      | Return value          | r19      |      | Callee-saved register         |
| r4       |      | Register arguments    | r20      |      | Callee-saved register         |
| r5       |      | Register arguments    | r21      |      | Callee-saved register         |
| r6       |      | Register arguments    | r22      |      | Callee-saved register         |
| r7       |      | Register arguments    | r23      |      | Callee-saved register         |
| r8       |      | Caller-saved register | r24      | et   | Exception temporary           |
| r9       |      | Caller-saved register | r25      | bt   | Breakpoint temporary (1)      |
| r10      |      | Caller-saved register | r26      | gp   | Global pointer                |
| r11      |      | Caller-saved register | r27      | sp   | Stack pointer                 |
| r12      |      | Caller-saved register | r28      | fp   | Frame pointer                 |
| r13      |      | Caller-saved register | r29      | ea   | Exception return address      |
| r14      |      | Caller-saved register | r30      | ba   | Breakpoint return address (2) |
| r15      |      | Caller-saved register | r31      | ra   | Return address                |

# Traditional Bus Architecture for an Embedded Computer System



# A Typical Nios CPU based System

## Ethernet Frame Data Transmission Path Using DMA and Simultaneous Multi-Mastering



# Altera SoPC IPs

- DMA: The Nios direct memory access (DMA) peripheral performs DMA data transfers between two memories, between a memory and a peripheral, or between two peripherals. It has two Avalon master ports (a master read port and a master write port) and one Avalon slave port for controlling the DMA.

- Avalon Bus: On-chip Multi-master active bus
- PIO: PIO module is a memory-mapped interface between software and user-defined logic.
- Timer: 32-bit interval timer
- SPI: Serial Peripheral Interface
- UART: RS-232 asynchronous transmit and receive logic.

# DMA Peripheral with Master & Slave Ports



# Nios DMA Transfer

1. Software configures the DMA peripheral for the transfer by writing to its control port.
2. Software enables the DMA peripheral. The peripheral then begins transferring data without additional intervention from the CPU.
3. The DMA's master read port reads data from the read address, which may be a memory or a peripheral, while the master write port writes the data to the destination address. **A shallow FIFO may buffer data between the read and write ports.**
4. The DMA transfer ends when a specified number of bytes has been transferred, or when an “end of packet” (EOP) symbol is transferred.  
**The DMA peripheral may issue an interrupt request at the end of the transfer.**
5. During or after the transfer, software may determine whether a transfer is in progress, or whether the transfer has ended (and how), by examining the DMA's status register.
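The sequence above can be sketched as a small software model. This is a toy simulation under stated assumptions, not the Altera DMA register map or driver API: the class `DmaModel`, its fields, and `copy_with_dma` are hypothetical, the model moves one byte per step, and it omits the FIFO and the end-of-packet condition.

```python
class DmaModel:
    """Toy model of a DMA peripheral (hypothetical fields, not real registers)."""

    def __init__(self, memory):
        self.memory = memory      # bytearray standing in for the address space
        self.read_addr = 0
        self.write_addr = 0
        self.length = 0
        self.irq_enable = False
        self.busy = False
        self.done = False
        self.irq = False

    def configure(self, read_addr, write_addr, length, irq_enable=False):
        # Step 1: software writes the transfer parameters.
        self.read_addr, self.write_addr = read_addr, write_addr
        self.length = length
        self.irq_enable = irq_enable
        self.done = False
        self.irq = False

    def go(self):
        # Step 2: software enables the peripheral.
        self.busy = True

    def step(self):
        """Step 3: move one byte from the read side to the write side."""
        if not self.busy:
            return
        if self.length > 0:
            self.memory[self.write_addr] = self.memory[self.read_addr]
            self.read_addr += 1
            self.write_addr += 1
            self.length -= 1
        if self.length == 0:
            # Step 4: byte count exhausted, optionally raise an interrupt.
            self.busy = False
            self.done = True
            self.irq = self.irq_enable


def copy_with_dma(memory, src, dst, nbytes):
    """Run one transfer to completion, polling the status as in step 5."""
    dma = DmaModel(memory)
    dma.configure(src, dst, nbytes, irq_enable=True)
    dma.go()
    while dma.busy:
        dma.step()    # in hardware this happens without CPU involvement
    return dma
```

In a real system the CPU would do other work, or wait for the interrupt, instead of calling `step()`; the loop here only stands in for the hardware advancing on its own.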

# Avalon Bus

- The Avalon bus is **an active**, on-chip bus architecture that accommodates the SOPC environment.
- The interface to peripherals is synchronous with the Avalon clock.  
**Therefore, no complex, asynchronous handshaking and acknowledge schemes are necessary.**
- Multiplexers (**not tri-state buffers**) inside the bus determine which signals drive which peripheral. Peripherals are never required to tri-state their outputs,  
**even when the peripheral is deselected.**
- The address, data and control signals use separate, dedicated ports.  
**This simplifies the design of peripherals: a peripheral does not need to decode address and data bus cycles, nor disable its outputs when it is not selected.**
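The multiplexer-based selection described above can be illustrated with a small model in which every slave keeps driving its own read-data port and the bus logic picks one value by address range. The `Slave` tuple, the `route_readdata` function, and the memory map are all made up for this sketch and are not taken from the Avalon specification.

```python
from collections import namedtuple

# Hypothetical slave descriptor: name, base address, address span, and the
# value the slave is currently driving on its own dedicated readdata port.
Slave = namedtuple("Slave", "name base span readdata")


def route_readdata(address, slaves):
    """Return (name, readdata) of the slave whose range contains address.

    Every slave drives its readdata all the time; this selection plays the
    role of the multiplexer inside the bus, so no slave has to tri-state.
    """
    for slave in slaves:
        if slave.base <= address < slave.base + slave.span:
            return slave.name, slave.readdata
    return None, 0  # no slave is mapped at this address


# Made-up memory map, for illustration only.
SLAVES = [
    Slave("onchip_ram", 0x0000, 0x1000, 0xCAFE),
    Slave("uart", 0x1000, 0x20, 0x41),
    Slave("timer", 0x1020, 0x20, 0x07),
]
```

Address decoding lives in the bus model rather than in the slaves, which mirrors the point that peripherals do not decode bus cycles themselves.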

# Avalon Bus based System Module

System Module integrated with user Logic into an Altera PLD/FPGA.



# Avalon Bus Module

The Avalon bus module (an Avalon bus) is a unit of active logic that takes the place of passive, metal bus lines on a physical PCB.



# Slave Arbitrator

The Avalon bus module contains one slave arbitrator for each shared slave port. The slave arbitrator performs the following:

- Defines control, address, and data paths from multiple master ports to the slave port and specifies the arbitration mechanism to use when multiple masters contend for a slave at the same time.
- At any given time, selects which master port has access to the slave port and forces all other contending masters (if any) to wait, based on the arbitration assignments.
- Controls the slave port, based on the address, data, and control signals presented by the currently selected master port.
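The selection step above can be sketched as a simple arbiter. The slides do not state which arbitration scheme is used, so the round-robin policy below is an assumption chosen for illustration, and the class name `SlaveArbiter` is hypothetical.

```python
class SlaveArbiter:
    """Round-robin arbiter for one shared slave port (illustrative policy)."""

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start so that master 0 is the first candidate.
        self.last_grant = num_masters - 1

    def grant(self, requests):
        """Pick one requesting master; all other contenders must wait.

        requests is a list of booleans, one per master port.
        Returns the index of the granted master, or None if nobody requests.
        """
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_grant + offset) % self.num_masters
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return None
```

With two masters that both request continuously, grants alternate between master 0 and master 1, so neither master is starved.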

# SOPC Builder Phase Sequence

SOPC Builder is a tool that takes library components as input & provides assembled embedded systems as output.

- SOPC Builder generates plain text HDL code (**either VHDL or Verilog**) for all of the bus-interconnect logic in the system.



# SOPC Builder Library Components

- NIOS-II Processors with optional cache and multiplier
- Microcontroller peripherals
- Digital signal processing (DSP) cores
- Intellectual property (IP) cores
- Communications peripherals, JTAG
- Interfaces
  - Memory (on/off-chip), buses, bridges, ASSPs and ASICs
- Software components
  - Header files, Generic C drivers, Operating system (OS) kernels, Middleware

# Nios-II based Embedded Computer System



# SoPC Builder: System Tab

## The System Contents Window



# Nios II Types Available



# SoPC Builder Tab

## System Contents with Nios II Processor



# SoPC Builder: Final Nios II System

- Final System Contents
- Auto Assign Base Addressing



# Nios System Generation



# Software for Nios System

## NIOS II IDE New Project Wizard

