

# NEPP Electronics Technology Workshop

## June 15-18, 2020



*This work was funded in part by the NASA Electronic Parts and Packaging (NEPP) Program and the Trusted & Assured Microelectronics Program Under Interagency Agreement SAA5-18-4-U28631*

# Single Event Effects in Field Programmable Gate Array (FPGA) Devices: Update 2020



**Melanie Berg<sup>(1)</sup>**

**Melanie.D.Berg@NASA.gov; Melanie.Berg@SSAIHQ.com**

**Michael Campola<sup>(2)</sup>, Hak Kim<sup>(1)</sup>, Anthony Phan<sup>(1)</sup>**

- 1. SSAI Inc. in support of the NEPP Program and NASA/GSFC
- 2. NASA Goddard Space Flight Center



# Acronyms

| Acronym   | Definition                                                                                  |
|-----------|---------------------------------------------------------------------------------------------|
| 1MB       | 1 Megabit                                                                                   |
| 3D        | Three Dimensional                                                                           |
| 3DIC      | Three Dimensional Integrated Circuits                                                       |
| ACE       | Absolute Contacting Encoder                                                                 |
| AHB       | Advanced high performance bus                                                               |
| ADC       | Analog to Digital Converter                                                                 |
| AEC       | Automotive Electronics Council                                                              |
| AES       | Advanced Encryption Standard                                                                |
| AMD       | Advanced Micro Devices Incorporated                                                         |
| AMS       | Agile Mixed Signal                                                                          |
| ARM       | Acorn Reduced Instruction Set Computer Machine                                              |
| AXI       | Advanced extensible interface                                                               |
| BGA       | Ball Grid Array                                                                             |
| BRAM      | Block Random Access Memory                                                                  |
| BTMR      | Block triple modular redundancy                                                             |
| CAN       | Controller Area Network                                                                     |
| CBRAM     | Conductive Bridging Random Access Memory                                                    |
| CCC       | RTG4 clock conditioning circuit                                                             |
| CCI       | Correct Coding Initiative                                                                   |
| CGA       | Column Grid Array                                                                           |
| CMOS      | Complementary Metal Oxide Semiconductor                                                     |
| CN        | Xilinx ceramic flip-chip (CF and CN) packages are ceramic column grid array (CCGA) packages |
| COTS      | Commercial Off The Shelf                                                                    |
| CRC       | Cyclic Redundancy Check                                                                     |
| CREME     | Cosmic Ray Effects on Micro Electronics                                                     |
| CREME MC  | Cosmic Ray Effects on Micro Electronics Monte Carlo                                         |
| CSE       | Crypto Security Engineer                                                                    |
| CU        | Control Unit                                                                                |
| DC        | Direct current                                                                              |
| DCU       | Distributed Control Unit                                                                    |
| DDR       | Double Data Rate (DDR3 = Generation 3; DDR4 = Generation 4)                                 |
| DFF       | Flip-flop                                                                                   |
| DMM       | Digital Multimeter                                                                          |
| DMA       | Direct Memory Access                                                                        |
| DSP       | Digital Signal Processing                                                                   |
| DSPI      | Dynamic Signal Processing Instrument                                                        |
| DTMR      | Distributed triple modular redundancy                                                       |
| Dual Ch.  | Dual Channel                                                                                |
| DUT       | Device under test                                                                           |
| ECC       | Error-Correcting Code                                                                       |
| EDAC      | Error detection and correction                                                              |
| EEE       | Electrical, Electronic, and Electromechanical                                               |
| EMAC      | Equipment Monitor And Control                                                               |
| EMIB      | Embedded Multi-die Interconnect Bridge                                                      |
| EPICS     | Extended physical coding layer                                                              |
| ESA       | European Space Agency                                                                       |
| eTimers   | Event Timers                                                                                |
| ETW       | Electronics Technology Workshop                                                             |
| FCCU      | Fluidized Catalytic Cracking Unit                                                           |
| FeRAM     | Ferroelectric Random Access Memory                                                          |
| FinFET    | Fin Field Effect Transistor                                                                 |
| FIR       | Finite impulse response filter                                                              |
| FMC       | FPGA Mezzanine Card                                                                         |
| FPGA      | Field Programmable Gate Array                                                               |
| FPU       | Floating Point Unit                                                                         |
| FY        | Fiscal Year                                                                                 |
| Gb        | Gigabit                                                                                     |
| Gbps      | Gigabit per second                                                                          |
| GCR       | Galactic Cosmic Ray                                                                         |
| GEO       | geostationary equatorial orbit                                                              |
| GIC       | Global Industry Classification                                                              |
| GOMACTech | Government Microcircuit Applications and Critical Technology Conference                     |
| GPIO      | General purpose input/output                                                                |
| GPIB      | General purpose interface bus                                                               |
| GPU       | Graphics Processing Unit                                                                    |
| GR        | Global Route                                                                                |
| GRC       | NASA Glenn Research Center                                                                  |
| GSFC      | Goddard Space Flight Center                                                                 |

| Acronym     | Definition                                                                                                |
|-------------|-----------------------------------------------------------------------------------------------------------|
| GTH/GTY/GTX | Transceiver Type                                                                                          |
| GTMR        | Global TMR                                                                                                |
| HALT        | Highly Accelerated Life Test                                                                              |
| HAST        | Highly Accelerated Stress Test                                                                            |
| HBM         | High Bandwidth Memory                                                                                     |
| HDIO        | High Density Digital Input/Output                                                                         |
| HDR         | High-Dynamic-Range                                                                                        |
| HIREV       | High Reliability Virtual Electronics Center                                                               |
| HKMG        | high-k metal gate                                                                                         |
| HMC         | Hybrid Memory Cube                                                                                        |
| HPIO        | High Performance Input/Output                                                                             |
| HPS         | High Pressure Sodium                                                                                      |
| HSTL        | High speed transceiver logic                                                                              |
| I/F         | interface                                                                                                 |
| I/O         | input/output                                                                                              |
| I2C         | Inter-Integrated Circuit                                                                                  |
| i2MOS       | Microsemi second generation of Rad-Hard MOSFET                                                            |
| IC          | Integrated Circuit                                                                                        |
| I-Cache     | Instruction cache                                                                                         |
| JFAC        | Joint Federated Assurance Center                                                                          |
| JPEG        | Joint Photographic Experts Group                                                                          |
| JTAG        | Joint Test Action Group (FPGAs use JTAG to provide access to their programming debug/emulation functions) |
| KB          | Kilobyte                                                                                                  |
| L2 Cache    | independent caches organized as a hierarchy (L1, L2, etc.)                                                |
| LCDT        | NEPP low cost digital tester                                                                              |
| LEO         | Low Earth Orbit                                                                                           |
| LET         | Linear energy transfer                                                                                    |
| L-mem       | Long-Memory                                                                                               |
| LP          | Low Power                                                                                                 |
| LUT         | Look-up table                                                                                             |
| LVCMOS      | Low-voltage Complementary Metal Oxide Semiconductor                                                       |
| LVDS        | Low-Voltage Differential Signaling                                                                        |
| LVTTL       | Low-voltage transistor-transistor logic                                                                   |
| LTMR        | Local triple modular redundancy                                                                           |
| LW HPS      | Lightwatt High Pressure Sodium                                                                            |
| M/L BIST    | Memory/Logic Built-In Self-Test                                                                           |
| Mil-STD     | Military standard                                                                                         |
| MAPLD       | Military Aerospace Programmable Logic Device                                                              |
| MFTF        | Mean fluence to failure                                                                                   |
| μPROM       | Micro programmable read-only memory                                                                       |
| μSRAM       | Micro SRAM                                                                                                |
| Mil/Aero    | Military/Aerospace                                                                                        |
| MIPI        | Mobile Industry Processor Interface                                                                       |
| MMC         | MultiMediaCard                                                                                            |
| MOSFET      | Metal-Oxide-Semiconductor Field-Effect Transistor                                                         |
| MP          | Microprocessor                                                                                            |
| MP          | Multiport                                                                                                 |
| MPFE        | Multiport Front-End                                                                                       |
| MPSoC       | Multiprocessor System on a chip                                                                           |
| MPU         | Microprocessor Unit                                                                                       |
| Msg         | message                                                                                                   |
| MTTF        | Mean time to failure                                                                                      |
| NAND        | Negated AND or NOT AND                                                                                    |
| NASA        | National Aeronautics and Space Administration                                                             |
| NEPP        | NASA Electronic Parts and Packaging                                                                       |
| NOR         | Not OR logic gate                                                                                         |
| NV(M)       | Non-volatile (memory)                                                                                     |
| OCM         | On-chip RAM                                                                                               |
| OSC-TMR-PLL | Embedded triple modular redundant phase locked loop                                                       |
| OSC         | Oscillator                                                                                                |
| OSD         | Office of the Secretary of Defense                                                                        |
| PC          | Personal Computer                                                                                         |
| PCB         | Printed Circuit Board                                                                                     |

| Acronym           | Definition                                                |
|-------------------|-----------------------------------------------------------|
| PCIe              | Peripheral Component Interconnect Express                 |
| PCIe Gen2         | Peripheral Component Interconnect Express Generation 2    |
| Pconfiguration    | SEU cross-section of configuration                        |
| Pfunctional_logic | SEU cross-section of functional logic                     |
| PHY               | Physical layer                                            |
| PLL               | Phase Locked Loop                                         |
| PLOL              | Phase Locked Loop loss of lock                            |
| PMA               | Physical Medium Attachment                                |
| POR               | Power on reset                                            |
| PPM               | Parts per million                                         |
| Proc.             | Processing                                                |
| PS-GTR            | High Speed Bus Interface                                  |
| PSEFI             | SEU cross-section from single event functional interrupts |
| Psystem           | System SEU cross-section                                  |
| QDR               | quad data rate                                            |
| QFN               | Quad Flat Pack No Lead                                    |
| QML               | Qualified Manufacturers List                              |
| QSPI              | Serial Quad Input/Output                                  |
| RC                | Resistor capacitor                                        |
| R&M               | Reliability and Maintainability                           |
| RAM               | Random Access Memory                                      |
| ReRAM             | Resistive Random Access Memory                            |
| RGB               | Red, Green, and Blue                                      |
| RH                | Radiation Hardened                                        |
| RT                | Radiation Tolerant                                        |
| RTD               | Representative tactical design                            |
| RTG4FCCC_0        | RTG4 Phase lock loop Core                                 |
| SATA              | Serial Advanced Technology Attachment                     |
| SCU               | Secondary Control Unit                                    |
| SD                | Secure Digital                                            |
| SD/eMMC           | Secure Digital embedded MultiMediaCard                    |
| SD-HC             | Secure Digital High Capacity                              |
| SDM               | Spatial-Division-Multiplexing                             |
| SEE               | Single Event Effect                                       |
| SEF               | Single event failure                                      |
| SEFI              | Single Event Functional Interrupt                         |
| SEL               | Single event latchup                                      |
| SERDES            | Serializer/deserializer                                   |
| SET               | Single event transient                                    |
| SEU               | Single event upset                                        |
| Si                | Silicon                                                   |
| SK Hynix          | SK Hynix Semiconductor Company                            |
| SMDs              | Selected Item Descriptions                                |
| SMMU              | System Memory Management Unit                             |
| SOA               | Safe Operating Area                                       |
| SOC               | Systems on a Chip                                         |
| SPI               | Serial Peripheral Interface                               |
| sRIO              | Serial RapidIO                                            |
| SSTL              | Stub Series Terminated Logic                              |
| TBD               | To Be Determined                                          |
| Temp              | Temperature                                               |
| THD+N             | Total Harmonic Distortion Plus Noise                      |
| TMR               | Triple Modular Redundancy                                 |
| T-Sensor          | Temperature-Sensor                                        |
| TSMC              | Taiwan Semiconductor Manufacturing Company                |
| UART              | Universal Asynchronous Receiver/Transmitter               |
| UltraRAM          | Ultra Random Access Memory                                |
| USB               | Universal Serial Bus                                      |
| VNAND             | Vertical NAND                                             |
| WDT               | Watchdog Timer                                            |
| WSR               | Windowed shift register                                   |
| XAUI              | 10 Gigabit Attachment Unit Interface                      |
| XGXS              | 10 Gigabit Ethernet Extended Sublayer                     |
| XGMII             | 10 Gigabit Media Independent Interface                    |

To be presented by Melanie D. Berg at the NASA Electronic Parts and Packaging Program (NEPP) Electronics Technology Workshop (ETW), NASA Goddard Space Flight Center in Greenbelt, MD, June 15-18, 2020 and published on nepp.nasa.gov.

# Agenda



- **FPGA and SEE Test Methodology Overview**
- **Xilinx Kintex-UltraScale SEE Test and Analysis**
- **Microsemi PolarFire SEE Test and Analysis**
- **SEE Data Analysis Methodology (SRAM-based FPGA)**



# FPGA SEU Cross-section Model

BRAM: Block random access memory

SEU: single event upset

SEF: single event failure (system)

$\sigma$ : cross-section



Cross-sections for a mapped design/system ( $\sigma_{SEF}$ ) are a function of the FPGA's internal elements and the mapped design's topology.

$$\sigma_{SEF} = f(\sigma_{configuration}, \sigma_{BRAM}, \sigma_{functionalLogic}, \sigma_{HiddenLogic})$$

Dominant mechanisms of failure will drive  $\sigma_{SEF}$
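As a rough illustration of how a dominant mechanism drives $\sigma_{SEF}$, the sketch below assumes the simplest possible form for $f$: an additive upper bound in which each element's SEU cross-section is scaled by the fraction of upsets that actually break the mapped design. The additive form, the `design_sensitivity` weights, and every number are illustrative assumptions, not measured data and not necessarily the form used in this study.

```python
def sef_cross_section(contributions):
    """Combine per-mechanism terms (cm^2) into an assumed additive sigma_SEF.

    contributions maps a mechanism name to (sigma_seu_cm2, design_sensitivity),
    where design_sensitivity is the assumed fraction (0.0 to 1.0) of upsets in
    that element that cause a system-level failure for the mapped design.
    """
    terms = {name: sigma * sens for name, (sigma, sens) in contributions.items()}
    total = sum(terms.values())
    dominant = max(terms, key=terms.get)  # mechanism with the largest term
    return total, dominant, terms


# Hypothetical values chosen only to show the bookkeeping.
example = {
    "configuration": (1.0e-3, 0.05),
    "BRAM": (1.0e-4, 0.10),
    "functional_logic": (1.0e-6, 1.0),
    "hidden_logic": (1.0e-7, 1.0),
}
total, dominant, terms = sef_cross_section(example)
print(f"sigma_SEF ~ {total:.3e} cm^2, dominated by {dominant}")
```

With these made-up inputs the configuration term is the largest, so it sets the order of magnitude of the total; the smaller terms barely change the result.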



# SEF and Dominant Mechanisms of Failure

- A distinction must be made between SEU/SEF test methodologies, functional testing, and reliability/total ionizing dose (TID) studies.
- The mechanisms of failure, their impact, and metrics differ:
  - **SEU/SEF:** Triggered by random particle-ionization events. The question is how often something happens; metrics are mean time to failure, mean fluence to failure, probabilities, and statistics. Corresponds to the flat portion of the reliability bathtub curve.
  - **Functional:** Based on a potential design flaw. Does the system operate as expected? There is no correlation to how long it takes to find a failure or how often one occurs; what matters is finding any failure.
  - **Reliability/TID:** Degradation over time. Corresponds to the right-side, rising portion of the bathtub curve.
- **SEF cross-sections will depend on the FPGA type and the user-mapped design's dominant mechanisms of failure. Yet some studies tend to focus on mechanisms that have negligible impact.**
- SEE dominant mechanisms of failure drive the following:
  - Test methodology (test fixture, stimulus, monitors, and capture)
  - Data results (cross-sections): there is no need to concentrate on items that have negligible contributions.
  - Error rate/survivability prediction
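The SEU/SEF metrics named above follow from standard definitions: a cross-section is the number of observed events divided by the particle fluence, and mean fluence to failure (MFTF) is its reciprocal. The sketch below is a minimal illustration; the function names and the event count are hypothetical, and the mean time to failure computed here is the in-beam value at a given flux, not an on-orbit prediction.

```python
def cross_section(events, fluence):
    """Cross-section in cm^2: observed events divided by fluence (particles/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence


def mean_fluence_to_failure(sigma):
    """MFTF in particles/cm^2: the reciprocal of the cross-section."""
    if sigma <= 0:
        raise ValueError("cross-section must be positive")
    return 1.0 / sigma


def mean_time_to_failure(sigma, flux):
    """In-beam MTTF in seconds for a constant flux (particles/cm^2/s)."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return 1.0 / (sigma * flux)


# Hypothetical run: 25 failures observed over a fluence of 1e7 particles/cm^2.
sigma_sef = cross_section(events=25, fluence=1.0e7)
mftf = mean_fluence_to_failure(sigma_sef)
mttf = mean_time_to_failure(sigma_sef, flux=1.0e3)
print(f"sigma = {sigma_sef:.2e} cm^2, MFTF = {mftf:.2e} /cm^2, MTTF = {mttf:.0f} s")
```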



# SRAM-based FPGA Single Event Effects (SEE) Study: Xilinx Kintex-UltraScale (XCKU040-1LFFVA1156I)





# Xilinx Kintex-UltraScale Study Objectives

*SEU: single event upset*

$\sigma_{SEU}$ : SEU Cross-section

*SEFI: single event functional interrupt*

*DUT: device under test*

*SEL: single event latch-up*

*SET: single event transient*

- This is an independent investigation that evaluates the single event destructive and transient susceptibility of the Xilinx Kintex-UltraScale device.
- Design/device susceptibility is determined by exposing the DUT to a heavy-ion beam and monitoring it for SET- and SEU-induced faults.
- Potential SEL is checked throughout heavy-ion testing by monitoring device current.
- FPGA part# **XCKU040-1LFFVA1156I**.

$$\sigma_{SEF} = f(\underbrace{\sigma_{configuration}}_{\text{Configuration } \sigma_{SEU}}, \underbrace{\sigma_{BRAM}}_{\text{Block RAM } \sigma_{SEU}}, \underbrace{\sigma_{functionalLogic}}_{\text{Functional logic } \sigma_{SEU}}, \underbrace{\sigma_{HiddenLogic}}_{\text{SEFI } \sigma_{SEU}})$$

**NEPP performs independently driven studies to determine various device/system susceptibilities as they pertain to NASA programs.**
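The SEL check mentioned in the objectives (monitoring device current during irradiation) can be sketched as a simple threshold test on logged supply-current samples. This is a generic illustration only: the `jump_ratio` and `hold_samples` criteria, the function name, and the sample values are all assumptions, not the detection criteria used in this study.

```python
def detect_latchup(samples_a, baseline_a, jump_ratio=1.5, hold_samples=3):
    """Return the index where a sustained current jump begins, or None.

    A candidate SEL is flagged when the current stays above
    baseline_a * jump_ratio for hold_samples consecutive readings,
    so that a single short spike is not reported.
    """
    threshold = baseline_a * jump_ratio
    run = 0  # number of consecutive readings above the threshold
    for i, current in enumerate(samples_a):
        run = run + 1 if current > threshold else 0
        if run >= hold_samples:
            return i - hold_samples + 1
    return None


# Hypothetical current logs in amperes.
sustained = [0.50, 0.51, 0.52, 0.95, 0.96, 0.97, 0.98]
spike_only = [0.50, 0.90, 0.50, 0.50]
print(detect_latchup(sustained, baseline_a=0.50))
print(detect_latchup(spike_only, baseline_a=0.50))
```

Requiring several consecutive readings is one way to separate a sustained high-current state from a brief transient; a real test setup would also cut power quickly to protect the part.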



# Collaboration and Test Campaigns

This study is divided into two phases (additional phases, if any, will be community driven and funded):

- **Phase I: Generic component study:**
  - Collaboration: NEPP, Xilinx, and Space R<sup>2</sup> LLC
  - Tests performed: 11/2019 LBNL
  - Additional Data: gathered from a prior NEPP Kintex-UltraScale test campaign 03/2017 TAMU
  - Completed: test report submitted (December 2019)
- **Phase II: Advanced component/system study:**
  - Collaboration: NEPP, Xilinx, Aerospace, and Space R<sup>2</sup> LLC
  - New structures/tasks:
    - Scrubbing (32-bit 50 MHz)
    - Xilinx Microblaze processor
    - Multi-transceiver (GTX) lanes
    - Triple modular redundancy (TMR)
  - Will begin shortly after the government reopens.
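For readers unfamiliar with TMR, one of the Phase II structures listed above, the sketch below shows the generic bitwise majority vote at its core: three copies of a value are compared and each output bit takes the value held by at least two copies. It illustrates the general technique only and says nothing about how the Phase II designs are actually implemented; the function names and bit patterns are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority vote across three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_copies(a, b, c):
    """Return the names of copies that differ from the voted value."""
    voted = tmr_vote(a, b, c)
    return [name for name, value in (("a", a), ("b", b), ("c", c)) if value != voted]


# Copy c has a single upset bit (bit 3 flipped from 1 to 0).
a, b, c = 0b1011, 0b1011, 0b0011
print(bin(tmr_vote(a, b, c)), disagreeing_copies(a, b, c))
```

A single upset in one copy is masked by the vote. Upsets that hit the same bit in two copies are not, which is why TMR is usually combined with scrubbing or another repair mechanism.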



# Impact to Community: Kintex-UltraScale

COTS: commercial off the shelf

- Entry into the aerospace market with COTS expectation (KU060)\*.
- Fabricated on a high-k metal gate (HKMG) TSMC 20 nm planar HPL (high performance low power) process.
- I/O interfaces are robust and meet the space community's needs.
- Previous studies show no SEL.
- There are no embedded mitigation circuits in the user fabric. However, the higher gate count gives the user room to insert mitigation.
- There is no embedded processor. However, the user can embed a soft-core processor.

*Data Transfer Is Key for Our New System Applications: Kintex-UltraScale Transceivers (GTH and GTY ... GTX)*

| Type              | GTH      | GTY      |
|-------------------|----------|----------|
| Quantity          | 16-64    | 0-32     |
| Maximum Data Rate | 16.3Gb/s | 16.3Gb/s |
| Minimum Data Rate | 0.5Gb/s  | 0.5Gb/s  |

\*The device actually designated by Xilinx is the KU060; the KU040 was the device under test (DUT) for this investigation. Both devices are from the same Xilinx product family (same process) and have the same geometry (20 nm). The SEE community agrees that data obtained from one device apply to the other.

# DUT Preparation for Heavy-Ion SEE Testing



- NEPP populated three custom-made daughter boards with XCKU040-1LFFVA1156I (DUT) devices.
- The DUTs were thinned using mechanical etching via an Ultra Tec ASAP-1 device preparation system.
- The parts were successfully thinned to 90 µm to 100 µm.

*Ultra Tec ASAP-1*



*NEPP custom developed daughter card*



# Test System: LCDT and DUT (KU040)



LCDT: low cost digital tester

GUI: Graphical User Interface

CLK, CLK\_SR\_A, SHFT\_CLK: clocks

RS232, TX232: universal asynchronous receiver-transmitter (UART)

LabVIEW GUI: send commands and receive data

LCDT3: NEPP custom-developed motherboard tester

LabVIEW GUI: monitor power and configure tester

Logic analyzer and configure DUT



# Heavy-Ion Test Facilities and Test Conditions



- **Flux:**  $1.0 \times 10^2$  to  $1.0 \times 10^5$  particles/cm<sup>2</sup>/s
- **Fluence:** All tests were run to  $1 \times 10^7$  particles/cm<sup>2</sup> or until destructive or functional events occurred.
- **Test Temperature:** Room Temperature.
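The flux range and the target fluence together set the beam time per run, since fluence is flux integrated over time. The sketch below assumes a constant flux for the whole run, which is a simplification; the function name is hypothetical.

```python
def exposure_time_s(target_fluence, flux):
    """Seconds needed to accumulate target_fluence (particles/cm^2)
    at a constant flux (particles/cm^2/s)."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return target_fluence / flux


TARGET_FLUENCE = 1.0e7  # particles/cm^2, from the test conditions above
for flux in (1.0e2, 1.0e3, 1.0e4, 1.0e5):
    t = exposure_time_s(TARGET_FLUENCE, flux)
    print(f"flux {flux:.0e} /cm^2/s -> {t:,.0f} s ({t / 3600:.2f} h)")
```

At the top of the flux range a full run takes on the order of minutes, while at the bottom it would take more than a day.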

## Lawrence Berkeley National Laboratory (LBNL)

| Ion | Energy<br>(MeV/Nucleon) | Effective LET<br>(MeV·cm<sup>2</sup>/mg) at 0° |
|-----|-------------------------|---------------------------------------------|
| N   | 16                      | 1.16                                        |
| O   | 16                      | 1.54                                        |
| Ne  | 16                      | 2.39                                        |
| Si  | 16                      | 4.35                                        |
| Ar  | 16                      | 7.27                                        |
| V   | 16                      | 10.9                                        |

## Texas A&M (TAMU)

| Ion | Energy<br>(MeV/Nucleon) | LET<br>(MeV·cm<sup>2</sup>/mg) at 0° | LET<br>(MeV·cm<sup>2</sup>/mg) at 60° |
|-----|-------------------------|-------------------------------------|---------------------------------------|
| He  | 25                      | 0.07                                | 0.14                                  |
| N   | 25                      | 0.9                                 | 1.8                                   |
| Ne  | 25                      | 1.8                                 | 3.6                                   |
| Ar  | 25                      | 5.5                                 | 11.0                                  |
| Kr  | 25                      | 19.8                                | 40.0                                  |
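The 60° column in the TAMU table is consistent with the usual cosine-law approximation for effective LET, in which tilting the DUT lengthens the ion's path through a thin sensitive volume: $LET_{eff} = LET_{0°} / \cos\theta$. The sketch below applies that approximation to two rows of the table; the function name is hypothetical, and the approximation is assumed to hold only for sensitive volumes that are thin relative to their lateral size.

```python
import math


def effective_let(let_normal, tilt_deg):
    """Effective LET (MeV·cm^2/mg) at a tilt angle, using the cosine law."""
    if not 0 <= tilt_deg < 90:
        raise ValueError("tilt angle must be in [0, 90) degrees")
    return let_normal / math.cos(math.radians(tilt_deg))


# Normal-incidence LET values taken from the TAMU table above.
for ion, let0 in (("Ne", 1.8), ("Ar", 5.5)):
    print(f"{ion}: {let0} at 0 deg -> {effective_let(let0, 60):.1f} at 60 deg")
```

Because $\cos 60° = 0.5$, each 60° value is twice the normal-incidence value.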

# Summary: Phase I DUT Test Structures



## Generic Component Study

| Test Structure           | Frequency Range |
|--------------------------|-----------------|
| Configuration            | N/A             |
| BRAM                     | 50 MHz          |
| Shift Registers (WSR)    | 100 MHz         |
| Counter Arrays           | 50 MHz          |
| DSP Blocks (FIR)         | 100 MHz         |
| GTX (Aurora single lane) | 3.125 GHz       |

# Xilinx Kintex-UltraScale Configuration and BRAM SEU Data



Configuration SEU cross-sections Across Device



Configuration and BRAM SEU cross-sections per bit



Note 1: TAMU and LBNL data correlate.

Note 2: Graphs have different scales.

Note 3: Left graph shows the cross-section across the device; right graph shows it normalized per bit.

**Additional Kintex data will be shown in a following section.**
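The relationship between the two plots (across device versus per bit) is a simple normalization: the device-level cross-section divided by the number of bits in the memory being counted. The sketch below shows the arithmetic with made-up numbers; the bit count and the device cross-section are hypothetical placeholders, not values from this study.

```python
def per_bit_cross_section(device_sigma_cm2, n_bits):
    """Normalize a device-level SEU cross-section (cm^2) to cm^2 per bit."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    return device_sigma_cm2 / n_bits


# Hypothetical inputs for illustration only.
device_sigma = 1.0e-1  # cm^2 across the device
n_bits = 100_000_000   # number of bits in the memory under test
sigma_bit = per_bit_cross_section(device_sigma, n_bits)
print(f"{sigma_bit:.1e} cm^2/bit")
```

Per-bit values allow memories of different sizes, such as configuration memory and BRAM, to be compared on the same footing.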



# SONOS FPGA Single Event Effects (SEE) Study: Microsemi PolarFire® (MPF300TS-1FCG1152I)





# Microsemi PolarFire Study Objectives

- This is an independent investigation that evaluates the single event destructive and transient susceptibility of the Microsemi PolarFire FPGA device.
- Design/device susceptibility is determined by exposing the DUT to a heavy-ion beam and monitoring it for Single Event Transient (SET) and Single Event Upset (SEU) induced faults.
- Potential Single Event Latch-up (SEL) is checked throughout heavy-ion testing by monitoring device current.
- FPGA part# MPF300TS-1FCG1152I.

$$\sigma_{SEF} = f(\underbrace{\sigma_{configuration}}_{\text{Configuration } \sigma_{SEU}}, \underbrace{\sigma_{BRAM}}_{\text{Block RAM } \sigma_{SEU}}, \underbrace{\sigma_{functionalLogic}}_{\text{Functional logic } \sigma_{SEU}}, \underbrace{\sigma_{HiddenLogic}}_{\text{SEFI } \sigma_{SEU}})$$

**SONOS configuration is not expected to have bit flips. However, pass/fail configuration readbacks were performed after each experiment.**



# Collaboration and Test Campaigns

This study is divided into multiple phases:

- **Phase I: Generic component study**
  - Collaboration: NEPP, Microsemi, Trusted & Assured Microelectronics Program
  - Tests performed: 11/2019 LBNL
  - Completed and test report submitted (December 2019)
- **Phase II: Fill out SEE cross-sections**
  - Collaboration: NEPP, Microsemi, Trusted & Assured Microelectronics Program
  - Same test structures as Phase I (generic components)
- **Phase III: TBD**
  - Collaboration: NEPP, Microsemi, and ???
  - New structures/tasks: TBD

# Impact to Community: Microsemi PolarFire ®



- SONOS non-volatile (NV) technology on a 28 nm technology node. Innately hardened configuration.
- Reconfigurable FPGA with SEU immune configuration.
- User fabric logic (flip-flops, combinatorial logic, global routes) is not hardened. However, the increased number of logic gates allows for user-inserted mitigation (e.g., TMR and watchdogs).
- Cost advantage over SRAM-based FPGAs and previous generation Microsemi FPGAs using floating gate NV technology (65nm and older).
- Trust related embedded structures:
  - Physically unclonable function (PUF)
  - Secure eNVM ® (non-volatile memory security feature)
  - Tamper detectors and countermeasures
- Up to 24 multi-protocol, low-power serial I/O transceiver lanes: 250 Mbps to 12.5 Gbps

# DUT Preparation for Heavy-Ion SEE Testing



- NEPP acquired two evaluation-boards (MPF300-EVAL-KIT) populated with MPF300TS-1FCG1152I PolarFire® devices.
- The DUTs were thinned using mechanical etching via an Ultra Tec ASAP-1 device preparation system.
- The parts were successfully thinned to roughly 100 µm.



**NEPP used an evaluation board as the daughterboard instead of developing a custom daughter card.**

# Test Setup: New Motherboard Tester



**Flexible FPGA  
Mezzanine Card  
(FMC)**

NEPP is now using evaluation boards as motherboards (testers), replacing the LCDT.

**Motherboard:  
development of  
ethernet capability**

**Motherboard**

**Daughterboard**



# Test System: At Heavy-Ion Facility



# Summary: Phase I DUT Test Structures



## Generic Component Study

| Test Structure        | Frequency Range |
|-----------------------|-----------------|
| Configuration         | N/A             |
| BRAM                  | 50 MHz          |
| Shift Registers (WSR) | 100 MHz         |
| Counter Arrays        | 50 MHz          |
| DSP Blocks (FIR)      | 100 MHz         |

# Heavy-Ion Test Facility and Test Conditions



- **Facility:** Lawrence Berkeley National Laboratory (LBNL) 88-Inch Cyclotron, 16 MeV/amu tune.
- **Flux:**  $1.0 \times 10^3$  to  $1.0 \times 10^5$  particles/cm<sup>2</sup>/s
- **Fluence:** All tests were run to  $1 \times 10^7$  particles/cm<sup>2</sup> or until destructive or functional events occurred.
- **Test Temperature:** Room Temperature.
- **Power Supply Voltage:**  $V_{cc} = 1.2V$ ;  $V_{IO} = 2.5V$
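
As a rough planning aid, the flux and fluence conditions above imply a beam time per run. The sketch below assumes a constant flux for the whole run and no early stop from a destructive or functional event; the function name `exposure_time_s` is illustrative, not part of any test software.

```python
def exposure_time_s(target_fluence, flux):
    """Beam time in seconds needed to reach target_fluence (particles/cm^2)
    at a constant flux (particles/cm^2/s)."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return target_fluence / flux


TARGET_FLUENCE = 1.0e7  # particles/cm^2, from the test conditions

# Evaluate both ends of the stated flux range.
for flux in (1.0e3, 1.0e5):
    seconds = exposure_time_s(TARGET_FLUENCE, flux)
    print(f"flux {flux:.1e} particles/cm^2/s -> {seconds:.0f} s ({seconds / 3600:.2f} h)")
```

At the low end of the flux range, a single run to full fluence takes hours rather than minutes, which is one reason lost beam time is costly.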

We lost a significant amount of test time because of California wildfires.

| Ion | Energy (MeV/nucleon) | Effective LET at 0° (MeV·cm<sup>2</sup>/mg) |
|-----|----------------------|---------------------------------------------|
| N   | 16                   | 1.16                                        |
| O   | 16                   | 1.54                                        |
| Ne  | 16                   | 2.39                                        |

# Current-Drop Anomaly



- Every experiment (if run to a sufficient particle fluence) experienced a current drop. In all but one (1) test, the current drop lasted for 1.7 ms.
- Shown: a drop that lasted for 177 s and cleared on its own. A drop of this length was observed during only one test, at an LET = 1.0 MeV·cm<sup>2</sup>/mg.
- Most current measurement systems are not set up to detect a 1.7 ms drop. We were able to catch the event due to the various means of active/real-time data capture during test.

# PolarFire Current-Drop (Timeout\*) SEU Cross-Sections



\* The $1.7 \text{ ms}$ current-drop event could not be observed via normal current measurement apparatus. However, the event could be observed through DUT-operation timeouts.

Data across designs correlate: events are not design dependent.  
The mechanism of failure is embedded in the device.



# Current Drop Anomaly: Additional Information

- Normal operational current was approximately 2.75 A; the core current dropped below 100 mA during the anomalous event.
- The current drop was always recoverable.
- The current drop lasted for approximately $1.7\text{ ms}$, except for one event that lasted for approximately 177 s.
  - Note that the event shown on the previous slide is not the $1.7\text{ ms}$ event; it is the 177 s event.
  - Apart from the 177 s event, the difference in current-drop duration between events is generally on the order of microseconds.
- The current drop is significant enough to stop operation (timeout).
- A reset is required after the current drop (state-space is lost during the event).
- No configuration is lost after a current drop (read-back passes with no SEUs).
- The current drop occurred for every test at every LET (that was used during the first-look study).
- **Lower LET values are required to achieve a more accurate reliability/survivability calculation per environment.**
- **Microsemi is aware of the anomaly and is working to identify responsible circuitry.**
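
To illustrate why a slow current monitor can miss a millisecond-scale drop, the sketch below scans a sampled current trace for runs below a threshold. The trace is synthetic, the sampling periods (0.1 ms and 10 ms) are assumptions chosen for illustration, and `find_current_drops` is a hypothetical helper rather than the capture method used during the test; only the 2.75 A nominal current, the 100 mA level, and the 1.7 ms duration come from the observations above.

```python
def find_current_drops(samples_a, sample_period_s, threshold_a=0.1):
    """Return a list of (start_time_s, duration_s) for each run of
    consecutive samples below threshold_a (amperes)."""
    events = []
    start_index = None
    for index, current in enumerate(samples_a):
        if current < threshold_a and start_index is None:
            start_index = index  # drop begins
        elif current >= threshold_a and start_index is not None:
            events.append((start_index * sample_period_s,
                           (index - start_index) * sample_period_s))
            start_index = None  # drop ends
    if start_index is not None:  # trace ended while still in a drop
        events.append((start_index * sample_period_s,
                       (len(samples_a) - start_index) * sample_period_s))
    return events


# Synthetic trace sampled every 0.1 ms: nominal 2.75 A with a 1.7 ms drop.
trace = [2.75] * 50 + [0.05] * 17 + [2.75] * 50

fast_events = find_current_drops(trace, sample_period_s=1e-4)
# Keeping every 100th sample emulates a monitor that samples every 10 ms.
slow_events = find_current_drops(trace[::100], sample_period_s=1e-2)

print("0.1 ms sampling:", fast_events)
print("10 ms sampling: ", slow_events)
```

In this synthetic case the slow monitor records no event at all, which is consistent with the drop being visible only through DUT-operation timeouts when the current measurement is too coarse.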



# Microsemi PolarFire ® SEU Data



**Interesting note: per bit SEU cross-sections do not seem to be design dependent.**

# Microsemi PolarFire ® Additional SEE Testing



- Lower LET experiments are necessary in order to characterize the current-drop onset and to predict error-rates.
  - Requires TAMU heavy-ion tests (LETs can go as low as 0.07 MeVcm<sup>2</sup>/mg).
- Higher LET experiments are necessary in order to fill out the SEU cross-section curve and to find saturation.
- NEPP will investigate:
  - More complex embedded components
  - Test-as-you-fly (representative tactical designs (RTD)).



# Data Handling and Survivability/Error Rate Prediction Techniques

At the end of the day... the professional industry gathers SEE data for SEF and survivability/error-rate prediction.

What do we do with all of this data?

# Survivability for Mission Critical Applications: Problem Statement



For SEF analysis, common practice is to use simple test structures that focus on discrete components:

- Data are extrapolated into survivability calculators.
- Generic SEU data are used across all designs.
- Assumption: the need for testing is reduced.
- However, the fidelity of generic SEU data extrapolation to tactical designs is questionable.



Better to use representative tactical designs (RTD) for SEU analysis:

- Data are a better fit for characterizing tactical behavior.
- However, requires SEU testing for every design!

**How do we provide SEU data for survivability calculations of tactical systems while reducing the need to test every design? Generic testing versus Test-As-You-Fly.**

# NEPP FPGA Device Investigations: Generic SEE Data versus RTD SEE Data



- Data presented in earlier slides are component level/generic.
- NEPP will always perform a component level investigation on FPGAs:
  - First look
  - Flesh out
  - General idea of whether mitigation will be required
  - Important information for the community
- As FPGA devices become more complex, extrapolation from simple component structures to RTD is not an appropriate method for tactical characterization.
- NEPP does perform test-as-you-fly (RTD) FPGA SEE investigations for programs (program-specific experiments).

# Embedded View of Mapped Logic



FPGA configuration and user logic are different types of embedded components.



**Modern FPGAs have hundreds of millions of configuration bits and hundreds of thousands of logic cells.**



**Configuration**



**User Logic LUT**

Designs only map into a portion of the configuration and only use a portion of the user fabric logic gates.

# Why Extrapolation Does Not Work with Generic Test Structures: Example Shift Register



User logic: Lookup Table (LUT)



User logic: Flip-Flop (DFF)

LUTs and DFFs are contained in configurable logic blocks (CLBs).

With an SRAM-based FPGA, each design uses more logic than assumed. This makes extrapolation of SEU data (from simple test structures to tactical designs) unreliable.



Generic Xilinx Implementation  
(LUT can differ by family)



# Closer Look: Shift Register with Manufacturer Inserted Routing Matrix (Hidden Logic)



**Hidden Logic:** Routing matrix inserted during place and route phase. Adds to the overall design susceptibility.

Simple test structures will not capture the impact of a tactical design's hidden logic (data are not extrapolatable). Hence the drive towards testing RTD structures.

# Representative Tactical Design (RTD) Test Structures and MFTF Test Strategies



- RTDs are based on tactical designs and might contain the following:
  - Embedded processors
  - High-speed serial (GTX)
  - Embedded SRAM (BRAM)
  - Global routes
- Mean fluence to failure (MFTF): record the fluence at which failure occurs.
- RTDs must obey tactical design strategies:
  - Synchronous design
  - Routing/floorplanning specifics
- Piecemeal RTD tests, yet use complex structures:
  - Increases visibility
  - Study trends
  - Have at least one full RTD (close as possible to tactical)
- RTD/MFTF testing requires an increase in the number of experiments (statistics) and will be driven by dominant mechanisms of failure.
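
The MFTF bookkeeping can be sketched as follows. This is a minimal illustration, assuming each run is logged as a hypothetical tuple of beam start time, failure time, and average flux; the run values are invented, and the sample standard deviation is only one possible way to express the run-to-run spread that motivates repeated experiments.

```python
from statistics import mean, stdev


def fluence_to_failure(beam_start_s, failure_time_s, average_flux):
    """Fluence (particles/cm^2) accumulated between beam start and failure."""
    return (failure_time_s - beam_start_s) * average_flux


def mftf_summary(runs):
    """Summarize runs given as (beam_start_s, failure_time_s, average_flux)."""
    fluences = [fluence_to_failure(*run) for run in runs]
    mftf = mean(fluences)
    return {
        "runs": len(fluences),
        "mftf": mftf,                                   # particles/cm^2
        "stdev": stdev(fluences) if len(fluences) > 1 else 0.0,
        "sigma_sef": 1.0 / mftf,                        # cm^2
    }


# Invented example: three runs at the same LET and flux.
example_runs = [
    (0.0, 120.0, 1.0e4),
    (0.0, 80.0, 1.0e4),
    (0.0, 100.0, 1.0e4),
]
summary = mftf_summary(example_runs)
print(summary)
```

With only a handful of runs the spread is large relative to the mean, so more experiments are needed before the resulting cross-section is used for prediction.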



# Data Analysis: Easing the process of SEU test and analysis for tactical-design survivability prediction.

The following slides only apply to Xilinx SRAM-based FPGA devices with no embedded or user-inserted mitigation.



# Configuration, Mask, and Essential Bits

Design mapping into user fabric logic cells is defined by configuration bit settings.



- Configuration bits: total number of configuration cells (fixed for each FPGA type).
  - Masked bits: calculated by the manufacturer and not under user control (design and device dependent).
  - Unmasked bits
- Essential bits: number of configuration cells used by the design mapping (calculated by the manufacturer upon user directive; design and device dependent).
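
The relationship between these counts can be captured in a small record. The sketch below is illustrative only: the class name, field names, and bit counts are hypothetical, and real values would come from the manufacturer's tools for a specific device and design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigurationBitCounts:
    total_bits: int      # fixed per FPGA type
    masked_bits: int     # manufacturer-calculated, design and device dependent
    essential_bits: int  # manufacturer-calculated, design and device dependent

    def __post_init__(self):
        if not 0 <= self.masked_bits <= self.total_bits:
            raise ValueError("masked_bits must be between 0 and total_bits")
        if not 0 <= self.essential_bits <= self.total_bits:
            raise ValueError("essential_bits must be between 0 and total_bits")

    @property
    def unmasked_bits(self):
        """Configuration bits that are not masked."""
        return self.total_bits - self.masked_bits

    @property
    def essential_fraction(self):
        """Share of the configuration actually used by the design mapping."""
        return self.essential_bits / self.total_bits


# Hypothetical counts for illustration only.
counts = ConfigurationBitCounts(
    total_bits=100_000_000,
    masked_bits=5_000_000,
    essential_bits=20_000_000,
)
print(counts.unmasked_bits, counts.essential_fraction)
```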





# SEU Cross-Sections



- Cross-section Categorization:
  - Across all configuration cells (device)
  - Per configuration cell (device-bit)
  - Across essential-bits (Design + device)
  - Design specific

Generally, configuration cross-sections are readily available from generic device investigations.

$$\sigma(\text{LET})_{\text{configuration\_Device}} = \frac{\#\text{errors}}{\#\text{Particles}/\text{cm}^2}$$

$$\sigma(\text{LET})_{\text{configuration\_bit}} = \frac{\#\text{errors}}{\left(\#\text{Particles}/\text{cm}^2\right) \times \left(\#\text{unmasked configuration bits}\right)}$$

$$\sigma(\text{LET})_{\text{Essential\_bit}} = \text{Essential\_bits} \times \sigma(\text{LET})_{\text{configuration\_bit}}$$

$$\sigma(\text{LET})_{\text{SEF}} = \frac{1}{\text{MFTF}} = \frac{1}{(\text{FailureTime} - \text{BeamStartTime}) \times \text{AverageFlux}}$$

Which cross-sections do we use for survivability analysis?  
Must consider mission requirements.
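
The four cross-section definitions on this slide translate directly into code. The sketch below is a minimal illustration at a single LET; the function names and all numeric inputs are hypothetical and are not measured data.

```python
def sigma_configuration_device(errors, fluence):
    """Configuration cross-section across the device (cm^2)."""
    return errors / fluence


def sigma_configuration_bit(errors, fluence, unmasked_bits):
    """Configuration cross-section per unmasked bit (cm^2/bit)."""
    return errors / (fluence * unmasked_bits)


def sigma_essential_bit(essential_bits, sigma_bit):
    """Essential-bit cross-section for a design (cm^2)."""
    return essential_bits * sigma_bit


def sigma_sef(failure_time_s, beam_start_s, average_flux):
    """Design-specific cross-section from one fluence-to-failure run (cm^2)."""
    return 1.0 / ((failure_time_s - beam_start_s) * average_flux)


# Hypothetical inputs at one LET.
errors = 500              # configuration upsets counted by readback
fluence = 1.0e7           # particles/cm^2
unmasked_bits = 100_000_000
essential_bits = 20_000_000

device = sigma_configuration_device(errors, fluence)
per_bit = sigma_configuration_bit(errors, fluence, unmasked_bits)
essential = sigma_essential_bit(essential_bits, per_bit)
sef = sigma_sef(failure_time_s=150.0, beam_start_s=50.0, average_flux=1.0e4)

print(f"device: {device:.2e}  per bit: {per_bit:.2e}  "
      f"essential: {essential:.2e}  SEF: {sef:.2e}")
```

In this invented case the ordering is device > essential-bit > SEF, which is the ordering the upper-bound argument relies on; whether it holds for a real design has to be shown by test.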



# Mission Driven Data Analysis

- Assuming configuration SEU cross-sections are strict upper-bounds:  
Does the survivability prediction using the configuration SEU cross-sections per device satisfy mission requirements?
  - **Can I stop here?** If mission requirements are satisfied, then readily available configuration SEU cross-sections can be used.
  - Additional testing might be required to investigate device anomalies.
- Assuming essential-bit SEU cross-sections are strict upper-bounds:  
Will the essential bit SEU cross-sections satisfy mission requirements?
  - In most cases, this will still be a strict upper-bound of a design's SEU susceptibility... however ... should test to verify the assumption.
  - Requires configuration read-back tests.
  - Requires RTD-MFTF testing.
- If MFTF SEU results are not mission compliant, is mitigation necessary?
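
The decision sequence above can be sketched as a simple cascade. This is an illustration under stated assumptions: each argument is a predicted event rate (for example, events per day) that the caller has already derived from the corresponding cross-section and a mission environment model, which is outside the scope of this sketch. The names `DataSet` and `select_data_set` are hypothetical.

```python
from enum import Enum


class DataSet(Enum):
    DEVICE_CONFIGURATION = "configuration SEU cross-section per device"
    ESSENTIAL_BIT = "essential-bit SEU cross-section"
    RTD_MFTF = "RTD-MFTF measured cross-section"
    MITIGATION_REVIEW = "no data set complies; evaluate mitigation"


def select_data_set(required_max_rate, device_rate, essential_bit_rate, mftf_rate):
    """Return the first data set, from most to least conservative, whose
    predicted rate satisfies the mission requirement."""
    candidates = (
        (DataSet.DEVICE_CONFIGURATION, device_rate),
        (DataSet.ESSENTIAL_BIT, essential_bit_rate),
        (DataSet.RTD_MFTF, mftf_rate),
    )
    for data_set, rate in candidates:
        if rate <= required_max_rate:
            return data_set
    return DataSet.MITIGATION_REVIEW


# Hypothetical rates in events per day.
choice = select_data_set(
    required_max_rate=1.0e-3,
    device_rate=5.0e-2,
    essential_bit_rate=8.0e-4,
    mftf_rate=1.0e-4,
)
print(choice.value)
```

The cascade only selects which data set is sufficient; it does not replace the additional testing noted above for device anomalies or for verifying the essential-bit assumption.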

# If Upper-bounds Satisfy Mission Reliability/Survivability Requirements, Then No Mitigation is Required.



# Xilinx SEU Test and Analysis: What Can the Manufacturer Provide?



## Front-end Proof of Concept

$$\sigma(\text{LET})_{\text{Essential\_bit}} = \text{Essential\_bits} \times \sigma(\text{LET})_{\text{configuration\_bit}}$$

- Goal is to determine if generic data can be extrapolated to characterize complex tactical designs.
- Generic DFF, CLB, and LUT test data are not extrapolatable.
  - Topology effects are non-linear, and the generic data do not include hidden logic.
- An alternative is to prove  $\sigma(\text{LET})_{\text{Essential\_bit}}$  is an upper-bound to  $\sigma(\text{LET})_{\text{SEF}}$ .



Manufacturer performs a variety of tests (benchmarks) to compare  $\sigma(\text{LET})_{\text{Essential\_bit}}$  to  $\sigma(\text{LET})_{\text{SEF}}$ .



Manufacturer provides generic data: configuration, BRAM, and embedded logic cross-sections.



Manufacturer performs additional testing to investigate potential SEFIs and other device SEE susceptibilities (global routes and SEL).

# Xilinx SEU Test and Analysis: What Does The End-User Do with The Data?



## Application of Concept

Intellectual property (IP)

- If  $\sigma(LET)_{\text{Essential\_bit}}$  proves to be a satisfactory upper-bound, the  $\sigma(LET)_{\text{configuration\_bit}}$  data and the tactical design's calculated essential-bits can be used by development teams for survivability analysis.
- In the past,  $\sigma(LET)_{\text{Essential\_bit}}$  has been assumed (by some) to be adequate for survivability prediction. However, as technology shrinks, the need for RTD-MFTF testing and proof of concept is growing:
  - Mixed-signal circuitry, global-routes, and hidden logic (embedded IP cores) will have more impact on  $\sigma(LET)_{\text{SEF}}$  at low LETs.

 Compare your design to manufacturer benchmark designs. Use  $\sigma(LET)_{\text{Essential\_bit}}$  for survivability calculations if  $\sigma(LET)_{\text{Essential\_bit}} > \sigma(LET)_{\text{SEF}}$

 If manufacturer data show anomalies or your tactical design has untested complexities, additional RTD testing will be needed.

 The end-user should not piecemeal small grained components (e.g., CLBs) for survivability analysis because of hidden logic and topological non-linearities.

# Kintex-UltraScale SEU Cross-Sections



$$\sigma_{\text{essential\_bit}} > \sigma_{\text{SEF}}$$

Implies  $\sigma_{\text{Essential\_bit}}$  can be used to predict survivability (non-mitigated design).

More testing will be performed to investigate whether there are SEFIs and whether the upper-bound holds across complex designs (e.g., embedded processors) and at higher LETs.
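
A check of the upper-bound condition across LET points might look like the sketch below. The dictionaries map LET to cross-section; the values are invented for illustration and are not the Kintex-UltraScale results, and `upper_bound_violations` is a hypothetical helper name.

```python
def upper_bound_violations(sigma_essential_by_let, sigma_sef_by_let):
    """Return the LETs present in both data sets at which the essential-bit
    cross-section is NOT strictly greater than the measured SEF cross-section."""
    shared_lets = sorted(set(sigma_essential_by_let) & set(sigma_sef_by_let))
    return [
        let for let in shared_lets
        if sigma_essential_by_let[let] <= sigma_sef_by_let[let]
    ]


# Invented cross-sections (cm^2) keyed by LET (MeV·cm^2/mg).
sigma_essential = {1.16: 2.0e-6, 1.54: 4.0e-6, 2.39: 9.0e-6}
sigma_sef = {1.16: 3.0e-7, 1.54: 6.0e-7, 2.39: 1.5e-6}

violations = upper_bound_violations(sigma_essential, sigma_sef)
print("upper-bound holds" if not violations else f"violations at LET: {violations}")
```

An empty result only covers the LETs and designs that were actually tested; it says nothing about SEFIs or untested design complexity.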



# Mitigation Analysis

- If the survivability analysis proves the design implementation does not satisfy mission requirements, user-inserted mitigation might be necessary.
  - This will change the design and its essential-bit count.
  - Essential-bit upper-bounds cannot be used to measure the survivability of applications with embedded mitigation.
    - Mitigation requires additional logic
    - Additional logic will increase the essential-bit count and consequently increase the estimated  $\sigma_{SEF}$ .
  - RTD-MFTF testing is required to measure the efficacy of the inserted mitigation. Can't assume mitigation performs as expected.
  - Requires the development team to perform SEU testing.
- Analyze the design both with and without mitigation (when possible); the comparison serves as another metric for the fidelity of the inserted mitigation.
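
A short calculation shows why the essential-bit estimate stops being useful once mitigation is inserted. Because the estimate is the essential-bit count multiplied by the per-bit cross-section, it scales with the bit count alone. The sketch assumes, purely for illustration, that triplication roughly triples the essential-bit count; the bit counts are hypothetical.

```python
def essential_bit_estimate_growth(unmitigated_essential_bits, mitigated_essential_bits):
    """Factor by which the essential-bit estimate grows after mitigation is
    inserted. The per-bit cross-section cancels out of the ratio."""
    if unmitigated_essential_bits <= 0:
        raise ValueError("unmitigated_essential_bits must be positive")
    return mitigated_essential_bits / unmitigated_essential_bits


unmitigated_bits = 20_000_000
mitigated_bits = 3 * unmitigated_bits  # assumption: triplication triples the count

growth = essential_bit_estimate_growth(unmitigated_bits, mitigated_bits)
print(f"essential-bit estimate grows by a factor of {growth:.1f}")
```

The estimate rises even though the purpose of mitigation is to lower the actual failure cross-section, which is why the efficacy of the mitigation has to be measured with RTD-MFTF testing.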



# Summary: Data Handling and Survivability/Error Rate Prediction Techniques



- Purpose of the work is to improve SEU data-sets used for survivability analysis.
- Generic SEU data obtained from testing simple structures (e.g., shift registers) are no longer adequate for SEU characterization of FPGA designs.
- An approach is presented that combines investigating simple and complex test structures:
  - Investigates the efficacy of using configuration SEU data with design specific information for survivability analysis.
  - Goal is to reduce the necessity of performing SEU testing on every design.
  - MFTF testing of complex structures is required to validate the approach (per SRAM-based FPGA family of devices).
- Xilinx Kintex-UltraScale data are presented:
  - Data suggest that essential-bit SEU cross-section might be a reliable data-set for survivability analysis.
  - Additional testing by Xilinx is required and will be performed... yet initial results are promising.
  - Eventually, this approach can reduce the need for testing by the end-user.
- If mitigation is required,  $\sigma(LET)_{SEF}$  RTD-MFTF testing is required to be performed/orchestrated by the end-user.



# NEPP Future Work

## SEE in FPGA Devices





# Potentially In the Works...

- Investigation of Lattice 28 nm CrossLink-NX (FD-SOI) SRAM-based FPGA
  - Proton
  - TID
- Further SEE investigation of 28 nm NV-based PolarFire ®
  - Proton
  - Heavy-ion
  - Test-as-you-fly
- Xilinx SRAM-based MPSoC 16nm FinFET ruggedized (and non-ruggedized) package
  - Proton
  - Heavy-ion
  - Test-as-you-fly (NASA-specific)
- Intel SRAM-based Stratix-10 SoC 14 nm FinFET
  - Proton
  - Heavy-ion
  - Test-as-you-fly



# Thank You. Questions?
