



The RTOS  
Exploit Mitigation  
Blues

Jos Wetzels

HARDWEAR.IO



# Jos Wetzels



Midnight Blue

Independent  
Security Researcher

UNIVERSITY  
OF TWENTE.

(Former)  
Security Researcher

ICS

Critical Infrastructure

IoT

Embedded BinSec

Automotive

HIDS / NIDS

Medical

Access Controls

Networking & Firewalls



@s4mvartaka



[www.midnightbluelabs.com](http://www.midnightbluelabs.com)  
[samvartaka.github.io](http://samvartaka.github.io)



## ROADMAP



## What this talk is not about



What this talk is also not about



## What this talk *is* about



# Real-Time Operating System (RTOS)

- Guarantee Response within Time Constraints  
Important in safety-critical sys.
- Predictability & Determinism > Speed
- Soft RT → Usefulness of result degrades after deadline
- Hard RT → Missing deadline = system failure





## Library / Unikernel OS

- **OS as collection of libraries**  
Linked with application code into single address space image
- **No loader, Syscalls implemented as function calls**  
Saves context-switching

## Rising Embedded RTOS Usage

Past, Present, and Expected Primary Operating System, Segmented by Primary OS Source  
(Percent of Respondents)



\*Note: Interim data from VDC's 2016 Embedded Engineer Survey.



## Increasing Connectivity

- Protocol Stacks Everywhere

Talk to gateway, other nodes, smartphones, integration with cloud, etc.

## Increasing Complexity

| # Processors | Bare Metal | RTOS | Linux |
|--------------|------------|------|-------|
| 1            | 33%        | 18%  | 13%   |
| 2-3          | 22%        | 24%  | 17%   |
| 4+           | 11%        | 25%  | 32%   |

- More processors, memory, peripherals
- Simple bare metal control loops & state machines not enough
- Hence rise of RTOS

A photograph of two individuals, a man and a woman, working at a computer keyboard in a dimly lit environment. The man is in the foreground, looking down at the keyboard. The woman is standing behind him, also focused on the screen. The scene has a blue-tinted, cinematic feel.

# Embedded Binary Security



Memory  
Corruption is  
a Big Deal™  
in Embedded



# **Heightened Impact**

- **Patching Issues**
  - Fragmented Responsibility
  - Lack of Secure Update
  - Availability / Safety
- **Long Exposure Windows**
  - 'Forever Days'
- **No Privilege Separation**
  - Flat Memory Model
- **Exploits Scale Well**
  - Single protocol driver bug can affect billions of devices

## PRIMARY EMBEDDED PROGRAMMING LANGUAGE



## Unsafe Languages Are Here To Stay

- Ideally use safe languages
- But unsafe continues to dominate

Despite old (Ada) and upcoming (Rust) embedded-suited langs



The background image depicts a detailed architectural rendering of a medieval castle. It features a massive outer wall with several cylindrical towers and a central, more complex section with a rounded tower and a smaller building attached. The castle is built from light-colored stone blocks and has a series of walkways and battlements along its perimeter. A central courtyard is visible, enclosed by the castle's walls.

# Exploit Mitigations

# General Purpose Exploitation Has Been Getting Harder



VS



2010

2016

## What About Embedded?





Right...

# RTOS Exploit Examples

| Name                         | Target                             | Type                         | RTOS               | Mitigations           |
|------------------------------|------------------------------------|------------------------------|--------------------|-----------------------|
| BroadPWN /<br>CVE-2017- 9417 | Broadcom WiFi                      | Heap Buffer Overflow         | VxWorks            | <b>NO</b>             |
| CVE-2017-0561                | Broadcom WiFi                      | Heap Buffer Overflow         | VxWorks            | <b>NO</b>             |
| VxWorks DHCP / DNS           | VxWorks DHCP / DNS Client & Server | Heap / Stack Buffer Overflow | VxWorks            | <b>NO</b>             |
| CVE-2015-7599                | VxWorks RPC                        | Integer Overflow             | VxWorks            | <b>NO</b>             |
| CVE-2010-3832                | Intel/Comneon Baseband             | Heap Buffer Overflow         | ThreadX / Nucleus+ | <b>NO<sup>1</sup></b> |
| Qualcomm UMTS AUTN           | Qualcomm Baseband                  | Stack Buffer Overflow        | REX                | <b>NO<sup>2</sup></b> |

<sup>1</sup> XM6180 on iPhone4 has DEP, see '*All Your Baseband Are Belong To Us*'

<sup>2</sup> Newer Qualcomm chips run BLAST RTOS w. stack cookies / DEP / kernel-user separation, see '*Baseband Exploitation in 2013*'

# Quantitative Analysis



# Minimum Mitigation Baseline



- **ESP / DEP / NX / W<sup>X</sup>**  
Non-exec. data memory
- **ASLR**  
Address Space Layout Randomization
- **Stack Canaries / SSP**  
Stack buffer overflow protection



## Operating System Selection

- Selected 45 Popular Embedded Oses
- High-end, Low-end, Linux/Windows/BSD-based, proprietary, etc.
- Evaluated Support For Mitigation Baseline

Optimistic Assesment

# Mitigation Support

— ESP — ASLR — Stack Canaries



# What's Going On Here?



# Hardware & Software Dependencies



# Software Dependency Support

Memory Protection    Virtual Memory    OS CSPRNG





## Hardware Selection

- Selected 78 Popular Embedded 'Core Families'
- Evaluated for Hardware Dependency Support

# Hardware Dependency Support (VNA)

■ MPU ■ MMU ■ Hardware ESP



A photograph of a person from behind, wearing a green hooded jacket and a dark backpack, standing on a snowy mountain ridge. They are looking towards a range of snow-capped peaks under a blue sky with scattered white clouds.

# Embedded Mitigation Challenges

# **Resource & System Constraints**



- **Limited Resources**  
Overhead influences adoption likelihood
  
- **Availability Requirements**  
Guarantee high system uptime
  
- **Real-Time Conflicts**  
Issues with eg. virtual memory caching, page faults, etc.



## Advanced Processor Features

- Modern mitigations increasingly use advanced processor features eg. SGX, MPX, CET, LBR, PMU
- Only for modern high-end CPUs
- Embedded HW diversity -> OS cannot assume support

# Embedded Random Number Generators



- Secure Randomness is OS Service!
- Hardware Diversity -> Little Entropy Source Omnipresence
- Low Entropy Environment
  - Little / predictable activity
  - "*Boot-time Entropy Hole*"
  - Can't wait for entropy too long..
- Result: not-so-random NGs, bad workarounds\*

A black and white photograph showing the aftermath of a disaster, likely an earthquake or bombing. In the foreground, there is a large pile of twisted metal and rubble. Behind it, a city skyline is visible, with many multi-story buildings completely collapsed, their frames twisted and broken. The sky is overcast and hazy.

# RTOS Mitigation Examples

## Microkernel



- **Library based RTOS**  
Based on Wind River Rocket
- **OS Linux Foundation Project**  
Aimed at resource-constrained IoT
- **Young (2016) but promising**  
Input from major chipmakers

Explicit focus on security



# Zephyr Stack Canaries

```
349 FUNC_NORETURN void _Cstart(void)
350 {
351 #ifdef CONFIG_ARCH_HAS_CUSTOM_SWAP_TO_MAIN
352     struct k_thread *dummy_thread = NULL;
353 #else
354     struct k_thread dummy_thread_memory;
355     struct k_thread *dummy_thread = &dummy_thread_memory;
356 #endif
357
358     /*
359     * Initialize kernel data structures. This step includes
360     * initializing the interrupt subsystem, which must be performed
361     * before the hardware initialization phase.
362     */
363
364     prepare_multithreading(dummy_thread);
365
366     /* perform basic hardware initialization */
367     _sys_device_do_config_level(_SYS_INIT_LEVEL_PRE_KERNEL_1);
368     _sys_device_do_config_level(_SYS_INIT_LEVEL_PRE_KERNEL_2);
369
370     /* initialize stack canaries */
371 #ifdef CONFIG_STACK_CANARIES
372     __stack_chk_guard = (void *)sys_rand32_get();
373 #endif
374
375     /* display boot banner */
376
377     switch_to_main_thread();
378
379     /*
380     * Compiler can't tell that the above routines won't return and issues
381     * a warning unless we explicitly tell it that control never gets this
382     * far.
383     */
384
385     CODE_UNREACHABLE;
386 }
```

- Based on Clang/GCC SSP
- Canary Failure -> System Error
- One master canary for entire address space
  - Generated once at system boot
  - No refreshing
- Generated using RNG API
  - Implementation depends on chosen *random* driver



## Zephyr Random API

### RANDOM\_HAS\_DRIVER (TRNG)

- **RANDOM\_MCUX\_RNGA**  
NXP Kinetis K64F
- **RANDOM\_MCUX\_TRNG**  
NXP Kinetis KW40Z & KW41Z
- **RANDOM\_STM32\_RNG**  
STM32 Boards
- **RANDOM\_ESP32\_RNG**  
ESP32 Boards  
(requires Wi-Fi & Bluetooth enabled)

# TRNGs Among Zephyr 1.8 Supported Boards

TRNG Support



TRNG Support Per Architecture





## Zephyr Random API **TEST\_RANDOM\_DRIVER (PRNG)**

- **X86\_TSC\_RANDOM\_GENERATOR**  
Uses x86 RDTSC
  - Canary 1st value drawn from PRNG  
little difference between bootruns
  - Infoleaks everywhere
- **TIMER\_RANDOM\_GENERATOR**  
Suffers from similar issues
- **Had contact with Zephyr team,  
plans to integrate OS CSPRNG**  
Most likely NIST SP800-90A based



## Fuchsia OS

- **Under-development Google RTOS**  
Pushed to github w/o official statement in August 2016
- **Seems intended for IoT to smartphones**
- **Based on Magenta microkernel**  
**Derived from LK (Little Kernel)**  
Virtual Memory / MMU support  
Kernel- / Userspace Separation

```

} regions[] = {
{
    .name = "kernel_code",
    .base = (vaddr_t)&__code_start,
    .size = ROUNDUP((size_t)&__code_end - (size_t)&__code_start, PAGE_SIZE),
    .arch_mmu_flags = ARCH_MMU_FLAG_PERM_READ | ARCH_MMU_FLAG_PERM_EXECUTE,
},
{
    .name = "kernel_rodata",
    .base = (vaddr_t)&__rodata_start,
    .size = ROUNDUP((size_t)&__rodata_end - (size_t)&__rodata_start, PAGE_SIZE),
    .arch_mmu_flags = ARCH_MMU_FLAG_PERM_READ,
},
{
    .name = "kernel_data",
    .base = (vaddr_t)&__data_start,
    .size = ROUNDUP((size_t)&__data_end - (size_t)&__data_start, PAGE_SIZE),
    .arch_mmu_flags = ARCH_MMU_FLAG_PERM_READ | ARCH_MMU_FLAG_PERM_WRITE,
},
590 void VmAspace::InitializeAslr() {
591     aslr_enabled_ = is_user() && !cmdline_get_bool("aslr.disable", false);
592
593     crypto::GlobalPRNG::GetInstance()->Draw(aslr_seed_, sizeof(aslr_seed_));
594     aslr_prng_.AddEntropy(aslr_seed_, sizeof(aslr_seed_));
595 }
596
597 uintptr_t choose_stack_guard(void) {
598     uintptr_t guard;
599     if (hw_rng_get_entropy(&guard, sizeof(guard), true) != sizeof(guard)) {
600         // We can't get a random value, so use a randomish value.
601         guard = 0xdeadbeef00ff00ffUL ^ (uintptr_t)&guard;
602     }
603     return guard;
}

```

# Fuchsia OS Exploit Mitigations

- ESP / DEP

x86 MMU-based DEP policies

- ASLR

All executables PIE by default

Randomization draws on OS CSPRNG

- Stack Canaries (GCC SSP Model)

Single early-boot global canary

Canary generated using HW RNG API  
Static value fallback

Canary failure -> kernel panic



# Academic Mitigation Examples



\* MAVR: *Code Reuse Stealthy Attacks and Mitigation on Unmanned Aerial Vehicles* – Habibi et al. (2015)

\*\* AVRAND: *A Software-Based Defense Against Code Reuse Attacks for AVR Embedded Devices* – Pastrana et al. (2016)

## MAVR\* & AVRAND\*\*

- Boot-Time Code Diversification
- MAVR: Ardupilot UAV system
- AVRAND: AVR chip in Arduino Yun
- **MAVR requires hardware mod**  
Additional master processor + flash chip
- **(Re)Randomize by re-flashing**  
Every N restarts or after crash
- **Attacker can reduce device lifespan by forcing re-flash to exhaust program / erase cycles**

A close-up photograph of a person's hand holding a silver pen, writing in a white notebook with blue horizontal grid lines. In the background, another page of the notebook shows a list of items with blue checkboxes next to them, such as "Drive" and "M2".

# Deeply Embedded Exploit Mitigation Criteria



# Embedded Exploit Mitigation Criteria

- **Low Resource Pressure**  
Max 5% Overhead\*
- **Hardware Agnostic**
- **Availability Preservation**
- **'Real-Time Friendly'**
- **Easy RTOS Integration**

\* Microsoft BlueHat Prize Rules (2012),  
'SoK: Eternal War in Memory' – Szekeres et al. (2013)



# Embedded OS CSPRNG Criteria

- **Low Resource Pressure**  
Use lightweight crypto
- **Entropy Gathering Limitations**  
Rapidly available upon boot  
No constant runtime polling
- **Non-Domain Specific Entropy Sources**  
No reliance on sensor values,  
accelerometers, etc.



**μArmor**



# µArmor

- **Deeply Embedded Mitigation Baseline**  
µESP + µScramble + µSSP + µRNG
- **Target Systems**  
Harvard or VNA CPU w. HW ESP  
Single-address space (RT)OS or bare metal
- **Attacker Model**  
Remote (eg. WiFi, Bluetooth, RF) control-flow hijacking attacks

# Representative Platform



# **µESP**

| <b>Memory</b>    | <b>Permissions</b> |
|------------------|--------------------|
| Code (sensitive) | RO+XN              |
| Code (other)     | RO+X               |
| Data             | RW+XN              |
| Peripherals      | RW+XN              |
| SCB Config       | RO+XN              |
| MPU Config       | RO+XN              |

Table 14: µESP Memory Permission Policies

- **Use MPU to Enforce ESP**
- **Stack Grows Away From Data**
- **Limit Code-Reuse Surface**  
Prevent ret2bootloader style attacks
- **'Lock Down' MPU & SCB**  
Prevent MPU disabling

# **μScramble**



- Need Alternative for ASLR
- Software Diversification
  - When: Compilation, Boot, Runtime, Etc.
  - What: Code, Data
- Hybrid Compile/Update Solution
  - Integrated in firmware generation
  - Per-device diff firmware image
- Benefits
  - Limited Overhead
  - No virtual memory required
  - Works on source: platform independent

# **µScramble** Code Diversification



- **Code Transformations**  
Function Reordering  
Dead Code Insertion  
Register Preservation Reordering
- **Fine-Grained Randomization**  
Wrt. image base  
Wrt. function base  
Wrt. inter-function distance  
Order of register preservation code
- **Decorrelation reduces infoleak & bruteforce impact**

# **µScramble** Transformations



# **µSSP**



- **Based on Zephyr SSP**
- **Master canary drawn from OS CSPRNG**
- **Modular violation handler**
  - Passive
  - Fatal
- Thread Terminate / Restart
- System Terminate / Restart



Initial Entropy

Absorb



Pseudo-Random  
Bits

## μRNG

- **Keccak-based OS CSPRNG**  
128-bit security

25 byte internal state

1 algorithm for entropy gather & RNG

Integrated as Zephyr random driver

- **Avoid Reseed Polling**  
Reseed Control linked to PRNG output

# $\mu$ RNG Initial Entropy Source



- **SRAM Startup Values (SUVs)**  
Two stable states  
Converge after initial instability
  
  - **Manufacturing Differences**  
Biased cells -> PUFs  
Random cells -> PRNGs
  
  - **Many Benefits**  
Instantly available at very early boot
- Differs between devices & bootruns

SRAM omnipresent on MCUs



# **µArmor**

## **Overhead & Security Evaluation**

# **µArmor**

## **Overhead Evaluation**

| <b>RNG</b>                 | <b>ROM</b> | <b>RAM</b> |
|----------------------------|------------|------------|
| µRNG <sup>1</sup>          | 709        | 81         |
| TinyRNG [302] <sup>2</sup> | 11086      | 471        |
| TinyKey [428] <sup>3</sup> | 5990       | 790        |

- **Metrics**

Code Size, Data Size, Memory Usage, Runtime Overhead

- **Evaluated Against**

IoT sample applications

TACLeBench suite

- **Overhead**

Code Size  $\leq 5\%$

Others  $\leq 1\%$

| Set               | Avg. GS <sup>1</sup> | Max. GS <sup>1</sup> |
|-------------------|----------------------|----------------------|
| <b>Sequential</b> |                      |                      |
| adpcm_dec         | 0.5                  | 4                    |
| adpcm_enc         | 0.6                  | 5                    |
| ammunition        | 3.6                  | 147                  |
| anagram           | 0.6                  | 4                    |
| audiobeam         | 4                    | 79                   |
| cjpeg_transupp    | 32                   | 96                   |
| cjpeg_wrbmp       | 1.4                  | 3                    |
| dijkstra          | 1.3                  | 5                    |
| epic              | 0                    | 0                    |
| fref              | 0.5                  | 3                    |
| gsm_dec           | 1.3                  | 43                   |

## µScramble Security Evaluation

- **Coverage Analysis**

1000 firmware variants per benchmark

Harvest ROP gadgets

Max avg. gadget survival rate: 10.15%

- **Complicating Attacker Factors**

Payload length

Firmware complexity

Fine-Grained Randomization

# $\mu$ RNG Security Evaluation



- Keccak is well-analyzed
- **Passive State Recovery Attacks**  
Reseed at least every 1GB of output  
well within 32GB bound\*
- **No 'Boot Time Entropy Hole'**  
All initial entropy instantly available

# **µRNG**

## **Security Evaluation**

**Table 1.** Min-entropy results for STM32F100R8 SRAMs at different temperatures. Min-entropy denoted as percentage of total available SRAM.

| Temp.<br>[°C] | Microcontroller ID |      |      |      |      |      |      |      |      |      |
|---------------|--------------------|------|------|------|------|------|------|------|------|------|
|               | 1                  | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   |
| -30           | 5.3%               | 5.3% | 5.4% | 5.5% | 5.4% | 5.3% | 5.4% | 5.2% | 5.3% | 5.8% |
| +25           | 6.6%               | 6.6% | 6.7% | 6.8% | 6.7% | 6.5% | 6.8% | 6.5% | 6.7% | 6.7% |
| +90           | 6.3%               | 6.5% | 6.5% | 6.6% | 6.2% | 6.5% | 6.5% | 6.2% | 6.5% | 6.5% |

- **SRAM Suitability Depends on Chip**  
Need Entropy  $\geq 2^*$  security strength

Avg. Entropy: 5% of SRAM size\*  
Worst-case: 1.2% (100 bits) for PIC16F

- **Reality**  
SRAM slightly contaminated  
Harvest in v. early boot & limit stack use

- **Memory Retention**  
Minimal at normal ambient temps  
Clear loss in cold (-20C) for some chips

Full SRAM reset between power cycles

\* Secure PRNG Seeding on Commercial Off-the-Shelf Microcontrollers -  
Van Herrewege et al. (2013)

# **Limitations, Conclusions & Future Work**



# uArmor Limitations

- **Platform**  
Harvard CPU or VNA with hardware ESP
- **Attacker Model**  
Remote Control-Flow Hijacking Only
- **Diversification Relocation Frequency**  
Compile- / Update-Time Only
- **Diversification Coverage**  
Code Memory only



## Conclusions

- **Embedded memory Corruption issues warrant serious attention**  
Smallest of devices are connected  
Unsafe languages are here to stay
- **Current state of embedded mitigations is terrible**  
Esp. in deeply embedded / RTOS  
Not easily fixed either
- **µArmor is intended as small step forward**



# Tock



## Future Work

- **Long Term**

Embedded Safe Language Adoption  
(Rust Embedded, Tock OS, etc.)

Secure & Scalable Patching

- **Short Term**

Mitigation Baseline Adoption in RTOS

Hardware Dependency Support

Advanced Embedded Mitigations  
(CFI, XoM, etc.)

# Questions?

 @s4mvartaka  
 j.wetzels@midnightbluelabs.com

