

# *Hybrid, reconfigurable, high-performance edge computing architectures for 5G and beyond*

## *Challenges and Opportunities*

Igor Alvarado

Academic Research Manager

[igor.alvarado@ni.com](mailto:igor.alvarado@ni.com)



October 3-4, 2019

All Rights Reserved.

# Key Trends

- Convergence & Transdisciplinary
- Beyond “Systems of Systems”
- Multi-Domain Operations
- Complexity & Emergence
- Ubiquitous Connectivity, 5G & Beyond
- (Real-Time) AI & Autonomy
- Cyber (Physical) Security
- Beyond Moore’s Law
- Human-Machine Symbiosis



## Enablers



Source (Images): DARPA, U.S. Army, NSF and National Academies.




# DARPA's Colosseum (256-chan. RF Emulator) with NI's Software-Defined Radios (SDRs)



## COLOSSEUM: THE WORLD'S LARGEST RF EMULATOR... THE ENVIRONMENT FOR ENSEMBLE SPECTRUM AI

### It's Really Big

- 25.6 GHz total instantaneous bandwidth
- 100 MHz per channel x 256 x 256 channels
- 420 Tb/s of digital RF data
- 1.88 TB of scenario model data (30 min)



### Comprised of

- 128 USRP X310 (128 FPGAs)
- 16 ATCA-3671 hosting 64 FPGAs

### Specifications

- 128 2x2 MIMO Tx/Rx Ports
- Phase Coherent
- Bandwidth: 80 MHz
- Tunable: 10 MHz to 6 GHz
- 4-tap PDP emulation (10 ns resolution, 5 µs max delay, 1000 Hz updates)



# Platforms for Advanced Wireless Research (PAWR)

- **POWDER**: Salt Lake City
- **COSMOS**: New York City
- **AERPAW**
- **Colosseum**: DARPA's massive RF emulator has now transitioned to the PAWR program.
- **Testbeds**: PAWR will have four city-scale testbeds for advanced wireless research launched by the end of 2020.
- **Innovation Zones**: PAWR sites have been designated the first-ever Innovation Zones for spectrum research.

# Real-Time HPC System Architecture (Example)



Note: Newer VSTs and USRPs are available.


# RT-HPC AI-enabled Platform



# The 5G Infrastructure: FPGA-based SDRs, SDNs, Edge Computing



# From sub-6GHz (FR1) to mmWave (FR2) for 5G NR



# 5G NR as an Enabler

# A need for new computing architectures

Enable ultra-Reliable Low-Latency Communications (uRLLC) to support the mission-critical applications and services now possible with 5G NR.



# Network Slicing to the Rescue

# 5G and Network Slicing

- A network slice is a logical (virtual) network that provides specific network capabilities and network characteristics in order to serve a defined business purpose of a customer while sharing a common physical infrastructure.
- A network slice consists of different subnets:
  - Radio Access Network (RAN) subnet, Core Network (CN) subnet, Transport network (TN) subnet.
- Allows the operator to provide customized networks that differ in:
  - Functionality requirements (e.g., priority, charging, policy control, security, and mobility)
  - Performance requirements (e.g., latency, mobility, availability, reliability, and data rates)
  - The users they serve (e.g., only public safety or corporate customers)
- Can provide the functionality of a complete network, potentially from different vendors.
- One network can support one or several network slices.
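
As a rough illustration of the definition above, a slice can be modeled as a record that ties its subnets to its requirements. This is only a sketch: the class names, field names, and vendor labels (`NetworkSlice`, `Subnet`, `max_latency_ms`, `vendor-a`, and so on) are hypothetical and are not taken from any 3GPP data model or vendor API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Subnet:
    """One subnet of a slice. `domain` is 'RAN', 'CN', or 'TN'."""
    domain: str
    vendor: str  # hypothetical field: subnets may come from different vendors


@dataclass
class NetworkSlice:
    """Hypothetical model of a logical network on shared physical infrastructure."""
    name: str
    subnets: List[Subnet]
    max_latency_ms: float    # performance requirement
    min_reliability: float   # e.g. 0.99999 for "five nines"
    allowed_users: List[str] = field(default_factory=list)  # empty list = any user

    def is_complete(self) -> bool:
        """True when the slice has a RAN, a CN, and a TN subnet."""
        return {s.domain for s in self.subnets} >= {"RAN", "CN", "TN"}

    def admits(self, user_group: str) -> bool:
        """True when the given user group may use this slice."""
        return not self.allowed_users or user_group in self.allowed_users


# One physical network can carry several slices with different requirements.
urllc = NetworkSlice(
    name="public-safety-urllc",
    subnets=[Subnet("RAN", "vendor-a"), Subnet("CN", "vendor-b"), Subnet("TN", "vendor-a")],
    max_latency_ms=1.0,
    min_reliability=0.99999,
    allowed_users=["public safety"],
)
sensors = NetworkSlice(
    name="mass-sensors",
    subnets=[Subnet("RAN", "vendor-a"), Subnet("CN", "vendor-a")],  # TN subnet missing
    max_latency_ms=500.0,
    min_reliability=0.99,
)
```

Here `urllc.is_complete()` holds because all three subnet domains are present, while `sensors.is_complete()` does not because the transport subnet has not been assigned.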

# 5G Network Slicing for Vertical Industries



# Automated Network Slicing

# E2E Automated Network Slicing



# uRLLC Slices for Mission Critical Applications

# Mission Critical Services and Applications

- Need for ultra Reliable, Low-Latency Communications (uRLLC) slices
- High-Availability
- Precision PNT (Positioning, Navigation and Timing)
- Cyber (physical) security



## Reliability

- “In the context of network layer packet transmissions, percentage value of the amount of sent network layer packets successfully delivered to a given system entity within the time constraint required by the targeted service, divided by the total number of sent network layer packets.”
- For uRLLC, it is usually “five nines” (99.999%) or more

## Low-Latency

- 5G will address the full set of low-latency requirements
- $\leq 0.5\,\text{ms}$ for DL and UL user plane latency (without a high-reliability requirement)
- Reliability of $1 - 10^{-5}$ (a packet error rate of $10^{-5}$) for a 32-byte packet with a user plane latency of 1 ms
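
The reliability definition quoted above is a ratio, so it can be checked directly against packet counts. The sketch below assumes a hypothetical measurement run; the function names and the packet counts are illustrative and do not come from any standard or test tool.

```python
import math


def reliability(delivered_in_time: int, sent: int) -> float:
    """Fraction of sent packets delivered within the service's time constraint."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return delivered_in_time / sent


def nines(r: float) -> float:
    """Number of 'nines' in a reliability value, e.g. 0.999 gives about 3."""
    return -math.log10(1.0 - r)


def max_late_or_lost(sent: int, target: float) -> int:
    """Largest number of late or lost packets that still meets the target."""
    # The small epsilon guards against floating-point rounding just below an integer.
    return math.floor(sent * (1.0 - target) + 1e-9)


# Hypothetical run: 1,000,000 packets sent, 8 of them missed the 1 ms deadline.
r = reliability(1_000_000 - 8, 1_000_000)
meets_urllc = r >= 1 - 1e-5
```

With a "five nines" target, at most 10 packets out of every million may arrive late or be lost, so the hypothetical run with 8 misses meets the target.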

# Performance Challenges for Different Scenarios

# Scenarios

- Several scenarios require the support of **very low latency and very high communications service availability (and reliability)**.
  - Support the transmission over the radio interface of a 32-byte packet with a reliability of 99.999% and a user plane latency of 1 ms, as described in 3GPP TR 38.913
- The overall service latency depends on:
  - The delay on the radio interface,
  - Transmission within the 5G system,
  - Transmission to a server which may be outside the 5G system, and
  - Data processing.
- Some of these factors depend directly on the 5G system itself, whereas for others the impact can be reduced by suitable interconnections between the 5G system and services or servers outside of the 5G system, for example, to allow local hosting of the services.
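
The four contributions listed above add up to the overall service latency, which is why hosting a service locally can matter. The sketch below uses made-up numbers purely for illustration; the class name, field names, and every millisecond value are assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """The four latency contributions listed above, in milliseconds."""
    radio_ms: float        # delay on the radio interface
    core_ms: float         # transmission within the 5G system
    server_link_ms: float  # transmission to a server, possibly outside the 5G system
    processing_ms: float   # data processing

    def total_ms(self) -> float:
        """Overall service latency as the sum of all contributions."""
        return self.radio_ms + self.core_ms + self.server_link_ms + self.processing_ms


# Hypothetical values: the same service hosted on a remote server and at the edge.
remote = LatencyBudget(radio_ms=1.0, core_ms=2.0, server_link_ms=20.0, processing_ms=3.0)
edge = LatencyBudget(radio_ms=1.0, core_ms=2.0, server_link_ms=0.5, processing_ms=3.0)

target_ms = 10.0  # hypothetical end-to-end target
remote_ok = remote.total_ms() <= target_ms
edge_ok = edge.total_ms() <= target_ms
```

In this example only the server link changes between the two cases, yet it alone decides whether the target is met.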

# Diverse and demanding requirements call for network slicing



The first two columns describe the use case; the remaining columns are what the network slice delivers.

| Application category  | Examples                                                        | Throughput UL (bps) | Throughput DL (bps) | E2E application latency (RTT) | Network latency (RTT) | Reliability    | Cost sensitivity |
|-----------------------|-----------------------------------------------------------------|---------------------|---------------------|-------------------------------|-----------------------|----------------|------------------|
| Critical automation   | Collaborative robots/vehicles; electrical grid tele-protection  | 1-10M               | 1M                  | 5-50 ms                       | 1-5 ms                | High/Very High | Low              |
| Tele-operation        | Video-based remote control; video with haptic remote control    | 1-10M               | 1M                  | 50-150 ms                     | 1-25 ms               | High/Very High | Medium           |
| Highly interactive AR | Co-present Mixed Reality; 360° volumetric video AR/MR           | 1-100M              | 5-100M              | 50-100 ms                     | 1-10 ms               | Medium         | Medium           |
| Mass sensor arrays    | Agricultural field sensors; smart city sensors & meters         | 1k-1M               | 1k-1M               | 1-2 s                         | 200-500 ms            | Medium-Low     | Very High        |

Slice performance scale: Low to High

# Performance requirements for low-latency and high-reliability scenarios

| Scenario                                                | Max. allowed end-to-end latency (note 2) | Survival time | Communication service availability (note 3) | Reliability (note 3) | User experienced data rate | Payload size (note 4) | Traffic density (note 5) | Connection density (note 6)    | Service area dimension (note 7) |
|---------------------------------------------------------|------------------------------------------|---------------|---------------------------------------------|----------------------|----------------------------|-----------------------|--------------------------|--------------------------------|---------------------------------|
| Discrete automation                                     | 10 ms                                    | 0 ms          | 99.99%                                      | 99.99%               | 10 Mbps                    | Small to big          | 1 Tbps/km<sup>2</sup>    | 100 000/km<sup>2</sup>         | 1000 x 1000 x 30 m              |
| Process automation – remote control                     | 60 ms                                    | 100 ms        | 99.9999%                                    | 99.999%              | 1 Mbps up to 100 Mbps      | Small to big          | 100 Gbps/km<sup>2</sup>  | 1 000/km<sup>2</sup>           | 300 x 300 x 50 m                |
| Process automation – monitoring                         | 60 ms                                    | 100 ms        | 99.9%                                       | 99.9%                | 1 Mbps                     | Small                 | 10 Gbps/km<sup>2</sup>   | 10 000/km<sup>2</sup>          | 300 x 300 x 50 m                |
| Electricity distribution – medium voltage               | 40 ms                                    | 25 ms         | 99.9%                                       | 99.9%                | 10 Mbps                    | Small to big          | 10 Gbps/km<sup>2</sup>   | 1 000/km<sup>2</sup>           | 100 km along power line         |
| Electricity distribution – high voltage (note 2)        | 5 ms                                     | 10 ms         | 99.9999%                                    | 99.999%              | 10 Mbps                    | Small                 | 100 Gbps/km<sup>2</sup>  | 1 000/km<sup>2</sup> (note 8)  | 200 km along power line         |
| Intelligent transport systems – infrastructure backhaul | 30 ms                                    | 100 ms        | 99.9999%                                    | 99.999%              | 10 Mbps                    | Small to big          | 10 Gbps/km<sup>2</sup>   | 1 000/km<sup>2</sup>           | 2 km along a road               |

NOTE 1: Currently realised via wired communication lines.

NOTE 2: This is the maximum end-to-end latency allowed for the 5G system to deliver the service in the case the end-to-end latency is completely allocated to the 5G system from the UE to the Interface to Data Network.

NOTE 3: Communication service availability relates to the service interfaces, reliability relates to a given system entity. One or more retransmissions of network layer packets may take place in order to satisfy the reliability requirement.

NOTE 4: Small: payload typically ≤ 256 bytes

NOTE 5: Based on the assumption that all connected applications within the service volume require the user experienced data rate.

NOTE 6: Under the assumption of 100% 5G penetration.

NOTE 7: Estimates of maximum dimensions; the last figure is the vertical dimension.

NOTE 8: In dense urban areas.

NOTE 9: All the values in this table are targeted values and not strict requirements. Deployment configurations should be taken into account when considering service offerings that meet the targets.

# Other Challenges

# Ethernet-based Integration (non-deterministic) and Clock/PPS Synchronized Configuration

Using USRPs and an OctoClock-G



# LO-distribution using the USRP N321



# Convergence at the edge: Multi-core CPUs and FPGAs

# NI USRP-2974 – Stand-Alone NI USRP-RIO

High performance SDR for compute intensive applications

## General Specs

- 10 MHz – 6 GHz frequency range
- 160 MHz instantaneous bandwidth per channel
- Intel i7 processor with 8GB of RAM w/ NI Linux RT
- Kintex-7 410T FPGA
- GPS Disciplined Clock
- 2X2 MIMO (2 Rx, 2 Tx)
- Hardware control over 1G/10G Ethernet
- PCIe expansion port to connect to additional USRP RIO
- 2U Form Factor
- LabVIEW Communications System Design Suite and 802.11 and LTE Application Frameworks support for programming Real-Time and FPGA
- **Also programmable with Open Source tools (e.g. GNU Radio)**



## Applications

- Algorithm Engineering
- MAC/PHY Prototyping
- Stand-Alone SDR Applications
- LTE, 5G, Wi-Fi, and MIMO Research
- UE Emulation

# NI USRP-2974 Block Diagram



# Achieving Low-Latency, Deterministic Performance

# Ethernet-based Synchronization with White Rabbit (WR)

- Based on the White Rabbit (WR) open-source standard
- WR is a fully deterministic Ethernet-based network protocol for general purpose data transfer and synchronization
- WR is an extension of the IEEE 1588 Precision Time Protocol (PTP) standard, which distributes time references over Ethernet networks:
  - Uses Synchronous Ethernet (SyncE) to distribute a common clock reference over the network across the Ethernet physical layer to ensure frequency synchronization between all nodes.
  - This combination of SyncE and PTP, in addition to further measurements, provides sub-nanosecond synchronization over distances of up to 10 km.
- Ethernet-based synchronization enables precise baseband synchronization over large distances in GPS-denied environments:
  - Trade-off: this consumes one of the SFP+ ports on each USRP N310, which reduces the number of connectors available for IQ streaming, and it requires additional hardware.
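
Since WR extends IEEE 1588 PTP, the basic two-way timestamp exchange that PTP uses to estimate clock offset and link delay is a useful starting point. The sketch below shows only that basic calculation, with hypothetical timestamps; it assumes a symmetric link and does not model the SyncE frequency lock or the additional measurements that WR relies on to reach sub-nanosecond accuracy.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Two-way time transfer as used by IEEE 1588 PTP.

    t1: master sends Sync            (read on the master clock)
    t2: slave receives Sync          (read on the slave clock)
    t3: slave sends Delay_Req        (read on the slave clock)
    t4: master receives Delay_Req    (read on the master clock)
    Assumes the path delay is the same in both directions.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2.0
    offset = ((t2 - t1) - (t4 - t3)) / 2.0  # slave clock minus master clock
    return offset, delay


# Hypothetical timestamps in nanoseconds: the slave clock runs 150 ns ahead
# of the master and the one-way link delay is 500 ns.
t1 = 1_000.0
t2 = t1 + 500.0 + 150.0      # arrival time, read on the slave clock
t3 = t2 + 2_000.0            # slave replies 2 µs later, read on the slave clock
t4 = (t3 - 150.0) + 500.0    # arrival time, read on the master clock

offset_ns, delay_ns = ptp_offset_and_delay(t1, t2, t3, t4)
```

The slave would then subtract `offset_ns` from its clock. Any asymmetry between the two directions shows up directly as an offset error, which is one reason a scheme aiming at sub-nanosecond accuracy needs further measurements.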

# Example: Distributing PPS & Clock Signals over the Network

Sub-nanosecond synchronization over distances of up to 10 km



# Time Sensitive Networking (TSN)

- Time Sensitive Networking (TSN) is the evolution of standard Ethernet, specifically the IEEE 802.1 standard, to add capabilities such as time synchronization over the network and deterministic, low-latency communication to the open network of Ethernet.
- TSN synchronization is provided through the IEEE 802.1AS standard, which allows automatic synchronization between compliant Ethernet switches and end stations. This simplifies the way you synchronize your distributed measurement system, because you need only a single Ethernet cable between devices.
- The cable not only carries all your typical network packets, such as your measurement data, but also provides a common notion of time to which all devices in the IEEE 802.1AS subnet synchronize.

# TSN: FPGA synchronization to the 1588 master clock



# PXI/PXIe: In-chassis triggering and synchronization

- Two internal, shareable clocks (10 MHz, 100 MHz)
- Timing boards with high-precision shareable clocks can be added
- External clocks can be used and shared among all devices



# PXI/PXIe Triggering and Synchronization




# Multi-Gigabit Links, Serial Comms. Protocols

# Xilinx® Aurora

- Aurora 64B/66B is a lightweight, serial communications protocol for multi-gigabit links.
- It is used to transfer data between devices using one or many transceivers, chip-to-chip links, board-to-board and backplane links.
- Connections can be full-duplex (data in both directions) or simplex (data in one direction only).
- Throughput ranges from 0.48 Gb/s for a single-lane design to 254.06 Gb/s for a 16-lane design.
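
The quoted throughput range is consistent with the 64B/66B encoding overhead, where every 66 bits on the line carry 64 bits of data. The sketch below reproduces the two endpoints under assumed per-lane line rates of 0.5 Gb/s and 16.375 Gb/s; those line rates are an inference from the quoted figures and are not stated on the slide.

```python
def aurora_64b66b_throughput_gbps(line_rate_gbps: float, lanes: int) -> float:
    """Payload throughput after 64B/66B overhead (64 data bits per 66 line bits)."""
    return line_rate_gbps * lanes * 64.0 / 66.0


# Assumed per-lane line rates (not given in the source) that match the quoted range.
low = aurora_64b66b_throughput_gbps(0.5, lanes=1)
high = aurora_64b66b_throughput_gbps(16.375, lanes=16)
```

Rounded to two decimals, `low` and `high` come out at 0.48 Gb/s and 254.06 Gb/s.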



# Maximizing FPGA-based System Throughput

# Maximizing System Throughput

- Throughput is the result of three factors:
  - The rate of the clock you are using to drive your design (cycles/time)
  - The number of samples that the IP accepts per call (samples/call)
  - The number of cycles that must elapse before your algorithm can be called again (cycles/call); this is also referred to as the “initiation interval” (II) or “initiation period”
- These three factors combine to provide samples/time per the following definition:

$$\text{Throughput} = \frac{\text{Clock Rate} \times \text{Samples per Call}}{\text{Initiation Interval}}$$

- Thus, to increase throughput you can:
  - Increase the clock rate
  - Increase the number of samples processed per call
  - Decrease the initiation interval
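
The throughput formula above can be evaluated directly to compare the three options. The clock rates and sample counts below are hypothetical values chosen for illustration, and the function name is not part of any NI or Xilinx tool.

```python
def throughput_samples_per_s(clock_hz: float, samples_per_call: int,
                             initiation_interval: int) -> float:
    """Throughput = clock rate x samples per call / initiation interval."""
    if initiation_interval < 1:
        raise ValueError("initiation interval must be at least 1 cycle")
    return clock_hz * samples_per_call / initiation_interval


# Hypothetical baseline: 200 MHz clock, 1 sample per call, new call every 4 cycles.
baseline = throughput_samples_per_s(200e6, 1, 4)

faster_clock = throughput_samples_per_s(250e6, 1, 4)  # raise the clock rate
more_samples = throughput_samples_per_s(200e6, 4, 4)  # process 4 samples per call
lower_ii = throughput_samples_per_s(200e6, 1, 1)      # accept a new call every cycle
```

In this example the baseline delivers 50 MS/s. Raising the clock to 250 MHz gives 62.5 MS/s, while quadrupling the samples per call or reducing the initiation interval to 1 each give 200 MS/s.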

# Parallelization

- Since FPGAs are inherently parallel devices, processing multiple samples or data sets in parallel is an attractive option for increasing throughput.
- FPGAs are also constrained with respect to resources, and parallelization usually leads to a linear increase in resource use.
- Explicitly parallelizing your code by making copies of it can lead to code that is difficult to read and maintain.
- Techniques such as pipelining can also increase throughput. Pipelining inserts registers along the critical path of the single-cycle Timed Loop (SCTL) to break it into shorter, concurrent sections of code.
- The shorter sections of code take less time to execute, which allows you to increase the SCTL clock rate.
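
A simple timing model shows why shorter sections allow a faster clock: the clock period must cover the longest register-to-register path. The sketch below is a simplified model with hypothetical stage delays and register overhead; it is not how any FPGA toolchain reports timing.

```python
from typing import Sequence


def max_clock_mhz(stage_delays_ns: Sequence[float], pipelined: bool,
                  register_overhead_ns: float = 1.0) -> float:
    """Highest clock rate at which the longest path still fits in one period."""
    if pipelined:
        # Registers separate the stages, so the slowest single stage sets the period.
        longest_path_ns = max(stage_delays_ns)
    else:
        # Without intermediate registers, a signal crosses every stage in one period.
        longest_path_ns = sum(stage_delays_ns)
    period_ns = longest_path_ns + register_overhead_ns
    return 1000.0 / period_ns


# Hypothetical combinational delays of three consecutive sections of logic.
stages_ns = [3.0, 4.0, 2.0]

unpipelined_mhz = max_clock_mhz(stages_ns, pipelined=False)  # period = 9 + 1 ns
pipelined_mhz = max_clock_mhz(stages_ns, pipelined=True)     # period = 4 + 1 ns
latency_cycles = len(stages_ns)  # a result now takes one cycle per stage to emerge
```

In this model the clock rate doubles from 100 MHz to 200 MHz. The cost is extra registers and a latency of three cycles before the first result appears, even though a new input can still be accepted every cycle.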

# Pipelining vs. non-Pipelining Performance



# RF Network-on-a-Chip

# RFNoC: RF Network on Chip

- Make FPGA acceleration easier for USRPs
- Software API + FPGA infrastructure
- Handles FPGA – Host communication / Dataflow
- Provides the user with simple software and HDL interfaces
- Scalable design for massive distributed processing
- Fully supported in GNU Radio



# RFNoC Architecture



# Adding a (freq.) Window



# Software Tools

- LabVIEW Comms. and AFWs
- C/C++
- Python
- GNU Radio
- RFNoC
- Xilinx® Vivado
  - VHDL
  - Verilog
- MATLAB®
- IP Blocks
- 5G Software stacks (e.g. OpenAirInterface)




OpenLTE



# Questions?

- Igor Alvarado  
Email: [igor.alvarado@ni.com](mailto:igor.alvarado@ni.com)  
URL: [www.ni.com/research](http://www.ni.com/research)

