

**Strategies to Enable Reliable Next-Generation  
Applications on Embedded Space Platforms**

by

**Noah E. Perryman**

B.S. Mechanical Engineering, University of Pittsburgh 2018

B.S. Computer Engineering, University of Pittsburgh 2018

M.S. Electrical and Computer Engineering, University of Pittsburgh 2020

Submitted to the Graduate Faculty of

the Swanson School of Engineering in partial fulfillment

of the requirements for the degree of

**Doctor of Philosophy**

University of Pittsburgh

2024

UNIVERSITY OF PITTSBURGH  
SWANSON SCHOOL OF ENGINEERING

This dissertation was presented

by

Noah E. Perryman

It was defended on

July 23, 2024

and approved by

Samuel J. Dickerson, PhD, Associate Professor,  
Department of Electrical and Computer Engineering

Jingtong Hu, PhD, Associate Professor,  
Department of Electrical and Computer Engineering

Peipei Zhou, PhD, Assistant Professor,  
Department of Electrical and Computer Engineering

Matthew Barry, PhD, Associate Professor,  
Department of Mechanical and Materials Science Engineering

**Dissertation Director:** Alan D. George, PhD, Professor,  
Department Chair, Department of Electrical and Computer Engineering

Copyright © by Noah E. Perryman  
2024

# **Strategies to Enable Reliable Next-Generation Applications on Embedded Space Platforms**

Noah E. Perryman, PhD

University of Pittsburgh, 2024

Space computing has considerable performance restrictions that are imposed by the limited onboard-processing capabilities provided by heritage single-board computers. Due to these limitations, current state-of-the-art devices for space edge computing are unable to meet the resource and performance requirements for next-generation communication, navigation, and artificial intelligence (AI) applications planned for future science and defense missions. To address these issues, domain-specific architectures with specialized acceleration hardware, such as the Xilinx Versal Adaptive System-on-Chip architecture, have been developed. This heterogeneous platform provides significant energy-efficient compute capabilities, but it is susceptible to radiation-induced single-event effects and therefore the dependability of the device must be characterized prior to inclusion on future space-computing platforms.

Conversely, high-visibility NASA missions that provide significant scientific impact or have a high mission class have a direct need for completely radiation-hardened electronics solutions that enable AI applications in harsh environments despite severe size, weight, and power constraints. For these missions, where current state-of-the-art solutions such as the Versal are too power-demanding or are incapable of surviving the intended radiation environment, an alternative radiation-hardened processing architecture that can leverage the control-flow capabilities of scalar processors while also incorporating the hardware-acceleration capabilities of an FPGA is of significant value.

In this research, performance, energy-efficiency, and resource-utilization tradeoffs for the Versal were evaluated by benchmarking a suite of representative next-generation AI and communication applications for space. These experiments included four image-classification models based on convolutional neural network architectures and the multitaper spectral estimation algorithm, a relevant communication algorithm based on the fast Fourier transform. Next, a methodology for evaluating and increasing the dependability of semantic segmentation deep learning models on heterogeneous systems featuring FPGAs as well as other compute elements is proposed. To demonstrate this methodology, three semantic segmentation DL models accelerated on the AMD-Xilinx Deep Learning Processing Unit for the Versal are evaluated. Lastly, an alternative architecture to the Versal for ultra-low power, high-reliability spaceflight applications is investigated. This investigation resulted in the design of an architecture with the highly reliable, radiation-hardened GR740 scalar processor paired with the low-power, radiation-tolerant CertusPro-NX-RT FPGA for increased performance.


## Table of Contents

- **Preface**
- **1.0 Introduction**
- **2.0 Background Research**
  - 2.1 Space-Computing Platforms
  - 2.2 Softcore Processors, Hardcore Processors, and FPGAs
  - 2.3 Versal Adaptive SoC
  - 2.4 GR740
  - 2.5 CertusPro-NX-RT
  - 2.6 Radiation Effects and Dependability Metrics
  - 2.7 Image-Classification Algorithms
  - 2.8 Semantic Segmentation for Earth Observation
  - 2.9 Multitaper Spectral Estimation
  - 2.10 Deep Learning on FPGAs and SoCs
- **3.0 Comparative Analysis of Next-Generation Space Computing Applications on AMD-Xilinx Versal Architecture**
  - 3.1 Experimental Setup and Approach
    - 3.1.1 Application Details
    - 3.1.2 CNN Architectures
    - 3.1.3 MTSE Architectures
  - 3.2 Results and Analysis
    - 3.2.1 CNN Applications
    - 3.2.2 MTSE Applications
  - 3.3 Conclusion
- **4.0 Dependable DPU Architectures on AMD-Xilinx Versal Adaptive SoCs for Space Applications**
  - 4.1 Approach
    - 4.1.1 DPU Benchmarking
    - 4.1.2 Fault Injection
    - 4.1.3 DPU Fault-Mitigation Schemes
  - 4.2 Results and Analysis
    - 4.2.1 DPU and Model Analysis
    - 4.2.2 Dependability Analysis
      - 4.2.2.1 SDC Distribution Analysis
      - 4.2.2.2 MWTF, AVF, and Critical Bits Analysis
      - 4.2.2.3 Model-Level Analysis
  - 4.3 Conclusion
- **5.0 SpaceCube GR740 Host for Onboard Science and Telemetry**
  - 5.1 Hardware Architecture
    - 5.1.1 Device Selection
    - 5.1.2 GHOST Architecture
  - 5.2 Architecture Capabilities
  - 5.3 Conclusions
- **6.0 Conclusions**
- **Bibliography**

## List of Tables

| Table | Title |
|-------|-------|
| 1 | Summary of Versal resources |
| 2 | Versal AI Engine and AI Engine-ML comparison |
| 3 | GR740 Specifications |
| 4 | CertusPro-NX-RT Specifications |
| 5 | Summary of subsystems and peripherals utilized for each application |
| 6 | Communication application configuration details |
| 7 | Image-classification application throughput performance |
| 8 | Image-classification application average power consumption |
| 9 | Image-classification application max power consumption |
| 10 | Image-classification application energy efficiency |
| 11 | Image-classification and communication application PL resource utilization |
| 12 | Image-classification and communication application AIE resource utilization |
| 13 | Communication application throughput performance and energy efficiency |
| 14 | Communication application power consumption |
| 15 | AI Edge and AI Core DPU interface details |
| 16 | DL model FP32 and INT8 accuracy |
| 17 | DL models compiled for the DPU |
| 18 | AI Edge and AI Core DPU resource utilization |
| 19 | Semantic-segmentation model benchmarking results |
| 20 | Fault-injection space for DPU architectures and semantic segmentation models |
| 21 | Semantic-segmentation model architecture vulnerability factor |
| 22 | Semantic-segmentation model critical bits |
| 23 | CertusPro-NX image-classification benchmarking |
| 24 | SpaceCube GHOST estimated total board power |

## List of Figures

| Figure | Title |
|--------|-------|
| 1 | Radiation-hardened and commercial processor comparison |
| 2 | Hybrid space computers |
| 3 | Hardcore and softcore processor comparison |
| 4 | Versal architecture |
| 5 | Versal development boards |
| 6 | AI Engine tile architecture |
| 7 | Radiation-induced single-event effects |
| 8 | Generalized convolutional neural network architecture |
| 9 | U-Net semantic-segmentation model |
| 10 | Multitaper spectral estimation architecture |
| 11 | Versal generalized application architecture |
| 12 | Image-classification application development flow |
| 13 | Image-classification application throughput performance |
| 14 | Image-classification application average power consumption results |
| 15 | Image-classification application max power consumption results |
| 16 | Image-classification application energy efficiency |
| 17 | Communication application throughput performance |
| 18 | Communication application normalized throughput performance |
| 19 | Communication application average power consumption results |
| 20 | Communication application max power consumption results |
| 21 | Communication application energy efficiency |
| 22 | AMD-Xilinx AI Core and AI Edge series DPU architectures |
| 23 | Semantic-segmentation model performance results |
| 24 | Semantic-segmentation model average power results |
| 25 | Semantic-segmentation model max power results |
| 26 | Semantic-segmentation model energy-efficiency results |
| 27 | Experimental samples of silent data corruptions |
| 28 | Overall silent data corruption distribution |
| 29 | VCK190 individual interface silent data corruption distribution |
| 30 | VEK280 individual interface silent data corruption distribution |
| 31 | Silent data corruption critical bits |
| 32 | Silent data corruptions for U-Net based on data transfer |
| 33 | Instruction interface silent data corruptions |
| 34 | Image interface silent data corruptions |
| 35 | Weight interface silent data corruptions |
| 36 | Bias interface silent data corruptions |
| 37 | SpaceCube GHOST high-level architecture |
| 38 | SpaceCube GHOST CertusPro-NX-RT interface breakout |
| 39 | SpaceCube GHOST PCB |
| 40 | SpaceCube GHOST NAND Flash and I/O extender configuration |
| 41 | SpaceCube GHOST coprocessor configuration |
| 42 | SpaceCube GHOST app accelerator configuration |
| 43 | CertusPro-NX Voice and Vision Machine Learning board |
| 44 | SpaceCube GHOST system configurations |

## Preface

I dedicate this dissertation to my family and friends, especially my wife Cassie, whose understanding and support when undertaking this dissertation research gave me the inspiration to never give up.

This dissertation research was supported by industry and government members of the National Science Foundation (NSF) Center for Space, High-Performance, and Resilient Computing (SHREC) and its I/UCRC Program under Grant No. CNS1738783. I wish to thank Alan George for serving as my coauthor and advisor for all dissertation research.

I would like to thank the students, faculty, and members of the NSF SHREC Center that have supported this dissertation research. I also wish to express my deepest gratitude to the Code 587 Science Data Processing Branch of NASA Goddard Space Flight Center for their continued support and guidance on this dissertation research. I especially wish to thank Christopher Wilson, Sebastian Sabogal, David Wilson, Justin Goodwill, and Nicholas Franconi for their mentorship. I would also like to thank NASA Goddard Space Flight Center's Internal Research and Development program, specifically the Cross-Cutting Technology Capabilities (CCTC) Line of Business (LOB) and SmallSat LOB.

Finally, I would like to extend my sincere thanks to AMD-Xilinx collaborators including Timothy Vales, David Sandler, and Ken O'Neill.

## 1.0 Introduction

Current state-of-the-art processors, including both traditional, radiation-hardened (rad-hard) and commercial-off-the-shelf devices, are unable to address the mission requirements of future science and exploration missions, especially those involving artificial-intelligence (AI) and communication applications [1]. Additionally, increasingly stringent requirements in size, weight, power, and cost (SWaP-C) further exacerbate the difficulties of addressing these mission needs. Not only are these missions constrained by current state-of-the-art hardware with limited performance capabilities, but they are also challenged by increasing complexity and data sizes required by next-generation applications. Advancements in sensor technology (e.g., resolution, capture data-rate, etc.) have also introduced downlink challenges that require increasingly higher performance computing to process sensor data onboard. However, the computational capability required for these advanced data-processing applications is dramatically limited by spacecraft SWaP-C. These limitations therefore drive a growing need for affordable devices that have high-performance and energy-efficient computing capabilities that are also suitable for SWaP-C-constrained platforms, such as small satellites (SmallSats).

For missions requiring next-generation space applications, several organizations have identified the need for more capable edge devices. For example, in the National Academies' planetary science decadal survey [2], a crucial need for low-latency and high-throughput datapaths from sensor-to-processor and high-bandwidth communication capabilities was identified for future science and exploration missions. These advancements will help to improve science return, resource efficiencies, autonomy, and reliability for space missions. Additionally, for AI applications in autonomy and onboard analysis, such as high-resolution image segmentation [3], enhanced computing is required for modern neural networks and models with varied complexity and input data sizes. Selected examples of AI applications, specifically deep-learning (DL) applications, on space-computing platforms include remote sensing [4], Earth observation (EO) [5], navigation [6], and image compression [7].

Despite the benefits provided by AI for space-computing platforms, many scalar processors used in SmallSats are unable to meet the demands of these AI applications due to stringent SWaP-C or reliability constraints [1]. These scalar processors, especially those that are rad-hard, are limited in performance compared to commercial-off-the-shelf (COTS) processors typically used for applications in harsh but tolerable environments [8]. Figure 1 illustrates this concept by showing a computational density (CD) comparison of current rad-hard (RAD750, GR712RC, GR740, and RAD5545) and commercial (Intel Core i7-4610Y and NVIDIA Tegra X1) processors. While rad-hard scalar processors are typically not as performant and energy efficient as COTS processors, they are extremely reliable. COTS processors, on the other hand, offer increased performance and energy efficiency, but typically do not offer the necessary reliability required for space-computing platforms in higher orbits.

To mitigate reliability and performance limitations, hybrid systems featuring a unique combination of COTS, rad-hard, and radiation-tolerant devices as well as fault-tolerant computing techniques have been introduced [9]. These hybrid systems bridge the performance and reliability gap between fully COTS and fully rad-hard systems. They commonly use systems-on-chip (SoCs) that feature compute elements, such as digital signal processors (DSPs), GPUs, field-programmable gate arrays (FPGAs), and vector processors, alongside conventional scalar processors to increase system performance. Since these COTS devices are susceptible to radiation-induced single-event effects (SEEs) that are prevalent in space, it is important for hybrid systems to use appropriate fault-tolerant strategies to provide a sufficient balance of performance and dependability for the entire system.

DL model characteristics can also vary in terms of complexity, performance, energy efficiency, resource utilization, and dependability. Many DL models, particularly convolutional neural networks (CNNs), have been shown to be highly resilient to single-event upsets (SEUs) [10], although this is dependent on model complexity. Therefore, a thorough evaluation of a system at the device and application level is necessary to characterize its viability for inclusion on a space-computing platform.
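To make the notion of a single-event upset concrete at the data level, the following minimal Python sketch flips one bit of a signed 8-bit value, such as a quantized (INT8) model weight. It is an illustration only and does not represent the fault-injection methodology used in this research; the function names `flip_bit_int8` and `upset_magnitudes` are hypothetical.

```python
def flip_bit_int8(value: int, bit: int) -> int:
    """Return the signed 8-bit value obtained by flipping one bit of `value`.

    Hypothetical helper: models a single-event upset in one stored byte.
    """
    if not -128 <= value <= 127:
        raise ValueError("value must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    raw = value & 0xFF      # two's-complement byte pattern of the value
    raw ^= 1 << bit         # the simulated upset: invert exactly one bit
    return raw - 256 if raw >= 128 else raw  # reinterpret as signed


def upset_magnitudes(value: int) -> list:
    """Absolute change in `value` caused by an upset in each bit position 0..7."""
    return [abs(flip_bit_int8(value, bit) - value) for bit in range(8)]


if __name__ == "__main__":
    weight = 23  # arbitrary example value, not taken from any model
    for bit in range(8):
        print(bit, flip_bit_int8(weight, bit))
```

In this sketch, the size of the perturbation depends only on which bit is struck, growing from 1 for the least-significant bit to 128 for the sign bit, so the same physical event can be negligible or severe depending on where it lands.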

Similar to AI applications, next-generation communication applications for space also present several onboard computational challenges, especially in the digital signal processing (DSP) domain. Primarily, future communication applications for space will require high availability and efficient use of the available radio spectrum to ensure effective communication and prevent the loss of mission-critical data. Currently, traditional ground-based tactics to manually control these platforms are time-consuming due to the transmission delays inherent in communication over long distances. Because of these transmission delays, communication with ground stations can be suboptimal or impractical. Therefore, a high degree of autonomy, which requires significant compute capabilities, is necessary for future space communication platforms to ensure operational success. NASA Glenn’s Cognitive Communications Infusion Study Report [11] highlights techniques needed for improved communication capabilities and more efficient use of the electromagnetic spectrum in wireless applications, which also require intensive DSP computational resources. Similarly, future space-networking architectures, such as the LunaNet infrastructure for a remote network on the Moon [12], require significant compute density and advanced high-speed connectivity to meet high-bandwidth communication and low-latency requirements. Current state-of-the-art rad-hard processors cannot meet these needs [8]. Therefore, improvements in onboard data processing, autonomous systems, and navigation can further reduce the burden and cost of the ground segment and mission operations in SmallSats [13] for planned space-networking architectures.

Figure 1: Comparison of current rad-hard (RAD750, GR712RC, GR740, and RAD5545) and commercial (Intel Core i7-4610Y and NVIDIA Tegra X1) processors [8].

Many edge-computing companies are focusing on exploring advanced domain-specific architectures (DSAs) that use specialized hardware acceleration to increase computing performance and power efficiency compared to general-purpose architectures. One prominent example is the AMD-Xilinx Versal Adaptive SoC architecture. This heterogeneous compute platform consists of several subsystems, including a dual-core ARM Cortex-A72 processor, a dual-core ARM Cortex-R5F processor, a programmable-logic (PL) FPGA, and AI Engine (AIE) vector processors all interconnected by a network-on-chip (NoC). By incorporating a novel mix of networked scalar and vector processing units with PL, the Versal enables developers to customize their own DSAs adapted for their specific use cases.

However, for NASA missions with reliability constraints, a completely rad-hard electronics solution may be required with components that meet stringent flight-qualified requirements. Alternatively, some experiments or instruments have essential periods of operation, such as a critical observation opportunity or a crucial propulsion burn, that cannot tolerate any interruptions due to the harsh radiation environment. For these types of scenarios, a rad-hard processor may be essential for meeting mission requirements. Frontgrade has been at the forefront of rad-hard processor design and has endeavored to create state-of-the-art solutions based on the LEON and RISC-V processor architectures. One prominent example, the GR740 microprocessor [14], was originally developed within the European Space Agency’s (ESA) Next-Generation MicroProcessor (NGMP) initiative, and recently the production of GR740 flight models has been completed, meeting expected screening and qualification tests.

A design that can provide the reliability of a rad-hard processor while also bridging the performance gap and providing a low-power operational mode prominently addresses the space computing needs identified in the NASA 2020 Technology Taxonomy [1]; however, many other areas benefit from advanced onboard computing as well. The requirements for capable, rad-hard processors were heavily emphasized for planetary needs [15] with discussions for the Mars rover capabilities (RAD750-based) and for deep space SmallSats [16]. Most significantly, in a briefing for “Next Generation Processing for Space Systems” [17], NASA Jet Propulsion Laboratory (JPL) highlighted that increased rad-hard processing capability would be needed for future missions. Likely, the needs identified in the JPL study can only be met with a combination of performant and energy-efficient technologies.

This dissertation is organized as follows. Chapter 2 provides background information on space-computing platforms, softcore and hardcore processors, next-generation rad-hard, radiation-tolerant, and commercial processors, radiation effects and dependability metrics, next-generation AI and communication applications, and DL techniques on FPGAs and SoCs to introduce relevant background for the dissertation research. Chapter 3 presents a comparative analysis of next-generation space computing applications on the AMD-Xilinx Versal Adaptive SoC. Chapter 4 details a methodology for evaluating and analyzing DL model accuracy, performance, energy-efficiency, and dependability metrics on FPGA- and SoC-based DL model accelerators. Chapter 5 describes an alternative custom architecture to the Versal designed for ultra-low power, high-reliability spaceflight applications. Finally, Chapter 6 concludes this dissertation.

## 2.0 Background Research

This section provides a cursory overview of current state-of-the-art processors for space-computing platforms and highlights existing methods for reliable and energy-efficient DL implementations. First, this section provides a summary of current rad-hard and hybrid systems used on space-computing platforms. This section then compares hardcore and softcore processor implementations. Next, this section introduces the Versal Adaptive SoC, the GR740, and the CertusPro-NX-RT as promising devices for future space-computing platforms. It then discusses FPGA and SoC radiation susceptibility and gives an overview of reliability testing and fault-mitigation techniques. In addition, this section covers applications for DL and communication, specifically image classification, semantic segmentation for EO, and multitaper spectral estimation (MTSE). Finally, this section concludes with a description of the AMD-Xilinx Deep Learning Processing Unit (DPU) generalized CNN accelerator.

### 2.1 Space-Computing Platforms

Space single-board computers (SBCs) typically selected for NASA’s flagship missions are rad-hard and extremely reliable to safeguard against failure due to radiation effects. While highly reliable, these SBCs are based on antiquated architectures that are several generations behind current state-of-the-art commercial processors due to the lengthy and complex process of radiation hardening. Notably, rad-hard processors such as the BAE Systems RAD750 [18] are now decades old. New rad-hard processors such as the planned High Performance Spaceflight Computing (HPSC) [19] processor are extremely promising, providing performance, power, and reliability benefits that would meet the demands of next-generation AI and communications applications. The HPSC is a multi-core, high-performance, high-reliability processor featuring the RISC-V architecture [20]. However, the HPSC is still years away from flight and its potential benefits can only be speculated for now. Newer, more capable devices, such as the BAE Systems RAD5545 [21], Argotec FERMI [22], and Southwest Research Institute Centaur [23] SBCs, are available, but these SBCs either do not meet the performance or energy-efficiency needs of DL applications for space or are prohibitively expensive.

The Modular Unified Space Technology Avionics for Next Generation (MUSTANG) [24] avionics catalog is the premier suite of cards developed by the 560 Electrical Engineering Division of NASA Goddard Space Flight Center (GSFC). The design portfolio includes a variety of avionics cards including a processor card, communication card, housekeeping card, thermal control card, and many more (over 22 cards are listed in the complete catalog). These designs are built for larger flagship-class missions and have a custom form factor ($5.25'' \times 8''$). The current MUSTANG processor card features an RTG4 and GR712 dual-core LEON3FT, and an upgraded card featuring the new GR740 is in development. Unfortunately, these cards are too large for CubeSats, SmallSats, or small instrument processors. NASA GSFC has also developed the Modular Architecture for a Resilient Extensible SmallSat (MARES) [25] catalog, which is a series of reliable electronic slices that conforms to the CS2 form factor and allows missions to address challenging space radiation requirements in a CubeSat size. The MARES Command and Data Handling (C&DH) card features a Microchip RTG4 that can instantiate a LEON3FT softcore processor. While MARES does include several softcore processing options, it does not have any dedicated hardcore processor options, which would substantially increase performance and energy efficiency and conserve FPGA resources.

Therefore, developers have been investigating radiation-tolerant designs and hybrid architecture strategies [9] to ameliorate performance and reliability challenges. Specifically, NASA GSFC has developed a cross-cutting, inflight reconfigurable, FPGA-based, onboard hybrid science-data processing suite of cards, named the SpaceCube family, that incorporates a unique mix of radiation-tolerant, rad-hard, and COTS components. The SpaceCube family of devices was developed due to the limited processing capabilities of traditional rad-hard SBCs for space as previously discussed. The SpaceCube v3.0 Mini processor card [26], the most recent and performant SWaP-constrained SpaceCube card, features a high-performance, radiation-tolerant AMD-Xilinx Kintex UltraScale FPGA that has capabilities, resources, and input/output (I/O) that far exceed anything available in typical rad-hard systems, especially in a small form-factor card ($3.5'' \times 3.5''$). However, a design as complex as the SpaceCube v3.0 Mini can be power-demanding, which removes its viability for certain NASA missions with lower power budgets, even though it can meet the performance needs of current image-processing and communication applications as described in [27].

In addition to the SpaceCube v3.0 Mini, many prominent SBCs follow a hybrid architecture that combines commercial and rad-hard components with fault-tolerant computing strategies. Several examples are provided in [9]; the examples highlighted for this research include the CHREC Space Processor (CSP) [28], SHREC Space Processor (SSP) [29], Focal Plane Interface Electronics – Digital (FPIE-D) [30], Innoflight Compact Flight Computer (CFC-400XS) [31], GomSpace NanoMind HP MK3 [32], and Xiphos Q7 [33] processor cards. The CSP, SSP, FPIE-D, GomSpace NanoMind HP MK3, and Xiphos Q7 each feature the AMD-Xilinx Zynq-7000 series SoC, which combines a dual-core ARM Cortex-A9 application processing unit (APU) with PL, while the Innoflight CFC-400XS features the AMD-Xilinx Zynq UltraScale+ MPSoC. Selected hybrid space SBCs are shown in Figure 2. A more exhaustive list of rad-hard and radiation-tolerant space SBCs can be found in [34].



Figure 2: Hybrid space computers including (a) NSF CHREC CSP, (b) NSF SHREC SSP, and (c) NASA Goddard Space Flight Center SpaceCube v3.0 Mini.

Despite the performance advantages of hybrid-architecture SBCs, many would still struggle to meet the requirements of next-generation AI and communication applications in future missions. For example, a next-generation communication application featuring a 64-antenna 200-MHz system would require more than 1500 giga multiply-accumulate (MAC) operations per second for downlink and even more for uplink [35]. Most current AMD-Xilinx devices used in SWaP-C-constrained environments either cannot meet these demands or require significant resource utilization to do so, both for this example and for similar next-generation communication applications for space. However, the Versal can substantially meet the demands of this example, offering up to a $2.14\times$ performance-per-watt improvement over leading Intel Agilex counterparts [35]. Similarly, for CNN-based image-classification algorithms, existing devices can meet the necessary performance requirements, but they are not energy-efficient solutions and are therefore not the most viable option for SWaP-C-constrained platforms. Many SmallSats are constrained to power budgets on the order of tens of watts, and only larger-class missions have the power budgets for existing devices that can meet the necessary performance requirements [34]. For smaller missions constrained to lower power budgets, the main driving factor is to maximize performance for the given power budget. For these power ranges, the Versal has been shown to be more energy efficient than other leading FPGAs. For example, AMD-Xilinx claims that the Versal delivers a $2.7\times$ more energy-efficient solution for a ResNet-50 implementation than the Intel Agilex 7 FPGA F-Series [35]. These examples highlight the capabilities of the Versal to meet the requirements of next-generation AI and communication applications.
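To put the communication example in perspective, the short sketch below converts the quoted 1500 giga-MAC/s lower bound into a per-sample workload and into the compute efficiency a device would need under a given power budget. It is only a back-of-the-envelope illustration: the assumption of one sample per antenna per hertz of bandwidth, the 10, 20, and 40 W budgets, and the function names are all hypothetical and do not come from [35]. It also optimistically assumes the entire power budget is available for computation.

```python
# Back-of-the-envelope check of the 64-antenna, 200-MHz example quoted from [35].
# ASSUMPTIONS (not from the source): one sample per antenna per hertz of
# bandwidth, and hypothetical power budgets of 10, 20, and 40 W that are
# spent entirely on computation.

REQUIRED_GMACS = 1500.0   # giga-MAC/s for downlink (lower bound quoted in the text)
NUM_ANTENNAS = 64
BANDWIDTH_HZ = 200e6


def macs_per_antenna_sample(gmacs, antennas, sample_rate_hz):
    """MAC operations needed per antenna per sample."""
    return gmacs * 1e9 / (antennas * sample_rate_hz)


def required_efficiency(gmacs, power_budget_w):
    """Minimum giga-MAC/s per watt needed to sustain `gmacs` within the budget."""
    return gmacs / power_budget_w


if __name__ == "__main__":
    per_sample = macs_per_antenna_sample(REQUIRED_GMACS, NUM_ANTENNAS, BANDWIDTH_HZ)
    print(f"MACs per antenna-sample: {per_sample:.1f}")
    for budget_w in (10, 20, 40):  # hypothetical SmallSat power budgets
        eff = required_efficiency(REQUIRED_GMACS, budget_w)
        print(f"{budget_w:>3} W budget -> at least {eff:.1f} GMAC/s per W")
```

Under these assumptions, a 20 W budget would call for at least 75 GMAC/s per watt, which illustrates why performance per watt, rather than raw throughput, is the deciding metric for SmallSat-class platforms.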

## 2.2 Softcore Processors, Hardcore Processors, and FPGAs

The SpaceCube v3.0 Mini FPGA enables the exploitation of algorithmic parallelism to rapidly accelerate applications by creating custom architectures that perform many calculations in parallel. While many applications benefit from FPGA acceleration, scalar processors are often still needed for control-flow-constrained applications that do not benefit from parallelism. As previously described, implementing softcore processors on an FPGA is one option for obtaining more performant, highly reliable processors. To address control-flow-oriented applications, the AMD-Xilinx-based SpaceCube v3.0 Mini currently implements a softcore processor (MicroBlaze or Rocket Chip RISC-V) in the FPGA. Similarly, the MARES C&DH card features the LEON3FT. While this approach is feasible for some cases, it consumes FPGA resources that are more valuable for algorithm acceleration. Softcore processors are also more limited in operating frequency than hardcore processors and are thus more limited in performance. As demonstrated in Figure 3, for applications designed to run sequentially in fixed-logic architectures, hardcore processors, such as the Frontgrade GR740, provide significant performance improvements over softcore processors that can be instantiated by the MARES FPGAs, such as the MicroBlaze on the AMD-Xilinx Virtex-5, the MicroBlaze on the AMD-Xilinx Kintex UltraScale, and the LEON-RTG4 on Microchip Technology’s RTG4, and over existing hardcore processors, such as the Frontgrade GR712RC [8].

## 2.3 Versal Adaptive SoC

The AMD-Xilinx Versal Adaptive SoC is a heterogeneous compute platform that combines multiple novel and next-generation architectures and subsystems into one device and is substantially different from previous-generation architectures. The device architecture, illustrated in Figure 4, features a dual-core ARM Cortex-A72 APU, a dual-core ARM Cortex-R5F real-time processing unit (RPU), Adaptable Engines (PL or FPGA fabric), Intelligent Engines (AIEs and DSPs), and high-speed transceivers and I/O all integrated through a programmable NoC [36]. This heterogeneous architecture provides substantial performance and energy-efficiency advantages over traditional, homogeneous scalar processing elements (CPUs), vector processing elements (e.g., DSPs, GPUs), and PL devices (e.g., FPGAs) [36].

Within the Versal family, there are several series of devices that offer a variety of SWaP-C tradeoffs. However, only two series offer space-grade versions [37]: the AI Core series XQRVC1902 device [38] and the AI Edge series XQRVE2302 device [39]. Both AI Core and AI Edge series devices feature AIE technology. The AI Core series device features AIE tiles, which are optimized for a balance between AI and DSP application workloads, while the AI Edge series device features AIE-ML tiles, which are optimized for superior performance in AI application workloads [40]. Table 1 summarizes the resources available on the Versal AI Core and AI Edge series devices. The resources listed include FPGA primitives in the PL fabric and AIE/AIE-ML tiles within the AIEs. FPGA primitives include lookup tables (LUTs), flip-flops (FFs), block RAM (BRAM) blocks, UltraRAM (URAM) blocks, and DSP slices. For the Versal, each BRAM block is 36 Kb and each URAM block is 288 Kb.

Figure 3: Comparison between rad-hard hardcore (GR712RC and GR740) and radiation-tolerant softcore (LEON-RTG4, MicroBlaze Virtex-5, and MicroBlaze Kintex UltraScale) processors [8].

Figure 4: Generalized Versal Adaptive SoC AI Core and AI Edge series architecture.

Table 1: Summary of resources for Versal AI Core and AI Edge series devices

| SoC     | Device    | LUTs    | FFs       | BRAM Blocks | URAM Blocks | DSPs  | AIE/AIE-ML Tiles |
|---------|-----------|---------|-----------|-------------|-------------|-------|------------------|
| AI Edge | VE2302    | 150,272 | 300,544   | 155         | 155         | 464   | 34               |
|         | XQRVE2302 | 150,272 | 300,544   | 155         | 155         | 464   | 34               |
|         | VE2802    | 520,704 | 1,041,408 | 600         | 264         | 1,312 | 304              |
| AI Core | VC1902    | 899,840 | 1,799,680 | 967         | 463         | 1,968 | 400              |
|         | XQRVC1902 | 899,840 | 1,799,680 | 967         | 463         | 1,968 | 400              |
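As a quick illustration of what the block counts in Table 1 imply, the sketch below converts the BRAM and URAM block counts into total on-chip PL memory using the 36 Kb and 288 Kb block sizes stated above. The function and dictionary names are hypothetical, and the conversion assumes 1 Kb = 1024 bits and 1 Mb = 1024 Kb. It counts only BRAM and URAM in the PL fabric, not memory local to the AIE/AIE-ML tiles or any other on-chip memory.

```python
# On-chip PL memory implied by Table 1, using 36 Kb per BRAM block and
# 288 Kb per URAM block as stated in the text.
# ASSUMPTION: 1 Kb = 1024 bits and 1 Mb = 1024 Kb.

BRAM_KB = 36    # Kb per BRAM block
URAM_KB = 288   # Kb per URAM block

# (BRAM blocks, URAM blocks) copied from Table 1
DEVICES = {
    "XQRVE2302": (155, 155),
    "VE2802": (600, 264),
    "XQRVC1902": (967, 463),
}


def pl_memory_kb(bram_blocks, uram_blocks):
    """Total BRAM + URAM capacity in Kb for the given block counts."""
    return bram_blocks * BRAM_KB + uram_blocks * URAM_KB


if __name__ == "__main__":
    for name, (bram, uram) in DEVICES.items():
        total_kb = pl_memory_kb(bram, uram)
        print(f"{name}: {total_kb} Kb ({total_kb / 1024:.1f} Mb)")
```

By this estimate, the XQRVE2302 provides roughly 49 Mb of BRAM and URAM, compared with roughly 164 Mb on the XQRVC1902, which is one concrete measure of the SWaP-C tradeoff between the two space-grade devices.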

AMD-Xilinx offers one development board for the AI Core series device and one for the AI Edge series device: the VCK190, featuring the AI Core VC1902 device, and the VEK280, featuring the AI Edge VE2802 device, both shown in Figure 5. The VC1902 is the exact COTS counterpart of the XQRVC1902, but the VE2802 is not the exact COTS counterpart of the XQRVE2302. Nevertheless, the VE2802 provides a useful development environment for drawing general device-level comparisons with the XQRVE2302, albeit with considerably more AIE-ML tiles and a significantly larger fabric.