

# SHARC: Simulator for Hardware Architecture and Real-time Control

Paul K. Wintz<sup>1</sup> Yasin Sonmez<sup>2</sup> Paul Griffioen<sup>2</sup> Mingsheng Xu<sup>1</sup> Surim Oh<sup>1</sup>  
Heiner Litz<sup>1</sup> Ricardo G. Sanfelice<sup>1</sup> Murat Arcak<sup>2</sup>

<sup>1</sup>University of California, Santa Cruz

<sup>2</sup>University of California, Berkeley

May 2025

# Motivating Example: Adaptive Cruise Control (ACC)

---



If no computational delays:

⇒ Guaranteed minimum headway

Computational delays depend on

- ▶ Control algorithm, implementation, and parameters
- ▶ Computational hardware
- ▶ Current state and measurements
- ▶ Recent computations

If computational delays:

⇒ ???

# SHARC: Simulator for Hardware Architecture and Real-time Control



## Features

- ▶ Uses the same executable that would be deployed.
- ▶ Parallelized to shorten run times.
- ▶ Easy configuration via JSON files.
- ▶ Dockerized for easy setup.

# Mathematical Model of Delayed Computations



# Controller Execution Simulation

---

To estimate controller run time, we use the Scarab Microarchitectural Simulator.

- ▶ Low level simulation of controller binary on CPU
- ▶ Simulates caching, branch prediction, pipelining, etc.
- ▶ Customizable processor parameters
  - ▶ Cache size
  - ▶ Clock speed
  - ▶ Architecture
- ▶ Provides detailed statistics.
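
As a rough illustration of how a simulated cycle count turns into a computational delay, the sketch below divides a cycle count by a clock frequency and compares the result with a sample time. The function names, the cycle count, and the clock frequency are hypothetical values chosen for illustration; they are not Scarab outputs or part of the SHARC interface.

```python
import math

def computation_delay(cycles: int, clock_hz: float) -> float:
    """Time in seconds needed to execute `cycles` clock cycles at `clock_hz`."""
    return cycles / clock_hz

def sample_periods_elapsed(delay: float, sample_time: float) -> int:
    """Number of whole sample periods that pass before the computation finishes."""
    return math.floor(delay / sample_time)

# Hypothetical numbers: 5 million cycles on a 16 MHz processor.
delay = computation_delay(5_000_000, 16e6)
missed = sample_periods_elapsed(delay, sample_time=0.2)
print(delay)   # 0.3125
print(missed)  # 1
```

With these numbers the control value would not be ready until after one full sample period has passed, which is the kind of situation the simulator is meant to expose.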

# ACC Example: Instruction Cache Size Comparison

## Problem 1 (Linear MPC)

$$\begin{aligned} \text{minimize} \quad & |\text{velocity error}|^2 \\ & + |\text{control effort}|^2 \end{aligned}$$

subject to

Linear System Dynamics  
Linear Safety Constraints

→ Performance degrades if instruction cache is only 1 KB.



# Example Pseudocode

---

## Physics Dynamics (Python interface)

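```
class MyDynamics(Dynamics):
    def evolve_state(self, t0, x0, u, tf):
        # Evolve the state from x0 at time t0 to time tf under input u.
        xf = ...  # Model-specific integration goes here.
        return xf # Final state

    def get_output(self, x, u, w):
        # Compute the output from state x, input u, and exogenous input w.
        y = ...  # Model-specific output map goes here.
        return y

    def get_exogenous_input(self, t):
        # Compute the exogenous input at time t.
        w = ...  # Model-specific exogenous signal goes here.
        return w
```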
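
To make the interface above concrete, here is a minimal runnable sketch for a toy double integrator. The `Dynamics` class defined here is a hypothetical stand-in so that the example runs on its own; it is not the SHARC base class, and the plant model is an assumption chosen only because it has a closed-form solution.

```python
class Dynamics:
    """Hypothetical stand-in for the SHARC base class, so the sketch runs alone."""
    pass

class DoubleIntegratorDynamics(Dynamics):
    """Position/velocity plant driven by an acceleration input."""

    def evolve_state(self, t0, x0, u, tf):
        dt = tf - t0
        p, v = x0
        a = u[0]
        # Closed-form solution for constant acceleration over [t0, tf].
        return [p + v * dt + 0.5 * a * dt**2, v + a * dt]

    def get_output(self, x, u, w):
        # Full-state measurement; u and w are unused in this toy model.
        return list(x)

    def get_exogenous_input(self, t):
        # No disturbance in this toy model.
        return [0.0]

plant = DoubleIntegratorDynamics()
xf = plant.evolve_state(0.0, [0.0, 1.0], [2.0], 0.5)
print(xf)  # [0.75, 2.0]
```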

## Controller (C++)

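```
class MyController : public Controller {
public:
    // Compute the control input u from the measurement y at time t.
    Vec calculateControl(double t, Vec &y){
        Vec u; // Placeholder: fill in with the control law.
        return u;
    }
};
```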

# Configuration

---

```
{
  "Simulation_Options": {
    "parallel_scarab_simulation": false,
    "max_batches": 9999999,
    "max_batch_size": 9999999
  },
  "dynamics_module_name": "dynamics.dynamics",
  "dynamics_class_name": "ACCDynamics",
  "n_time_steps": 6,
  "x0": [0, 60.0, 15.0],
  "u0": [0.0, 100.0],
  "system_parameters": {
    "state_dimension": 3,
    "input_dimension": 2,
    "exogenous_input_dimension": 2,
    "output_dimension": 3,
    "sample_time": 0.2,
    "mass": 2044,
    "d_min": 6.0,
    "v_des": 20,
    "v_max": 20,
    "F_accel_max": 4880,
    "F_brake_max": 6507,
    "max_brake_acceleration": 3.2,
    "max_brake_acceleration_front": 5.0912,
    "mpc_options": {
      "enable_mpc_warm_start": false,
      "prediction_horizon": 5,
      "control_horizon": 5,
      "output_cost_weight": 10000.0,
      "input_cost_weight": 0.01,
      "delta_input_cost_weight": 1.0
    },
    "osqp_options": {
      "abs_tolerance": 1e-5,
      "rel_tolerance": 1e-5,
      "dual_infeasibility_tolerance": 1e-3,
      "primal_infeasibility_tolerance": 1e-3,
      "maximum_iteration": 5000
    }
  },
  "PARAMS_base_file": "PARAMS.base",
  "PARAMS_patch_values": {
    "chip_cycle_time": 60000000,
    "l1_size": null,
    "icache_size": null,
    "dcache_size": null
  }
}
```
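
Because the configuration is plain JSON, it can be sanity-checked with a few lines of Python before a long simulation is started. The sketch below is only an illustration: `check_dimensions` is a hypothetical helper, not a SHARC function, and the embedded text copies just a few keys from the configuration above.

```python
import json

config_text = """
{
  "n_time_steps": 6,
  "x0": [0, 60.0, 15.0],
  "u0": [0.0, 100.0],
  "system_parameters": {
    "state_dimension": 3,
    "input_dimension": 2,
    "sample_time": 0.2
  }
}
"""

def check_dimensions(config: dict) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    params = config["system_parameters"]
    problems = []
    if len(config["x0"]) != params["state_dimension"]:
        problems.append("x0 length does not match state_dimension")
    if len(config["u0"]) != params["input_dimension"]:
        problems.append("u0 length does not match input_dimension")
    return problems

config = json.loads(config_text)
problems = check_dimensions(config)
# Simulated duration in seconds: number of time steps times the sample time.
simulated_duration = config["n_time_steps"] * config["system_parameters"]["sample_time"]
print(problems)
print(simulated_duration)
```

For the values shown, the initial state and input match the declared dimensions, and six steps at a 0.2 s sample time cover about 1.2 s of simulated time.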

# SHARC Parallelization

---

Simulation in Scarab is 10,000x slower than executing directly on the host processor.

- ⇒ Simulating slow controllers on a long time horizon can require several days
- ⇒ We designed a parallelization scheme that allows many time steps to be run in parallel
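
The "several days" figure follows from simple arithmetic on the 10,000x slowdown. The sketch below works through one case; the 20 ms host run time per controller call and the 1,000 time steps are hypothetical numbers picked for illustration, not measurements.

```python
def simulated_wall_time(native_step_time_s, n_steps, slowdown=10_000):
    """Wall-clock seconds to simulate n_steps controller calls one after another."""
    return native_step_time_s * n_steps * slowdown

# Hypothetical workload: 20 ms per controller call on the host, 1000 time steps.
seconds = simulated_wall_time(0.02, 1000)
days = seconds / 86_400
print(seconds)
print(days)
```

Under these assumptions a serial run needs roughly 200,000 s, a little over two days, which is why running many time steps in parallel matters.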



# Comparison: Serial vs. Parallel



## Simulation Time

- ▶ Serial: 1 hour, 20 minutes
- ▶ Parallel: 40 minutes

Parallel simulation loses some fidelity because memory effects between time steps are discarded.

# Example: Nonlinear Inverted Pendulum



## Problem 2 (Nonlinear MPC)

$$\begin{aligned} \text{minimize} \quad & |\text{angle error}|^2 \\ & + |\text{control effort}|^2 \end{aligned}$$

subject to

Nonlinear Dynamics



# Conclusion

---

## Future Work

- ▶ Expand systems simulated in SHARC.
- ▶ Generate models of computation time conditioned on state, controller parameters, and hardware configuration.
- ▶ Use models of computation time to accelerate parallelization.
- ▶ Use SHARC to establish guarantees on system performance.
- ▶ Use SHARC for co-design of hardware and controllers by joint optimization.

# Questions?

Slides and paper available at  
[paulwintz.com/publications](http://paulwintz.com/publications).


Code at [github.com/pwintz/sharc](https://github.com/pwintz/sharc)

# Linear MPC Problem Formulation for ACC Example

---

minimize

$$J(x_{(\cdot)|k_0}, u_{(\cdot)|k_0}) := \sum_{k=k_0}^{k_0+N_p} (v_{k|k_0} - v_{\text{des}})^2 + \sum_{k=k_0}^{k_0+N_p-1} u_{k|k_0}^\top R u_{k|k_0} + \alpha \sum_{k=k_0}^{k_0+N_p-2} |u_{k+1|k_0} - u_{k|k_0}|^2$$

with respect to

$$x_{k_0|k_0}, x_{(k_0+1)|k_0}, \dots, x_{(k_0+N_p)|k_0} \in \mathbb{R}^2, \quad u_{k_0|k_0}, u_{(k_0+1)|k_0}, \dots, u_{(k_0+N_p-1)|k_0} \in \mathbb{R}^2$$

subject to

$$x_{k_0|k_0} = \hat{x}_{k_0},$$

and for each  $k = k_0, k_0 + 1, \dots, k_0 + N_p - 1$ ,

$$x_{k+1|k_0} = A(\hat{v}_0)x_{k|k_0} + B(\hat{v}_0)u_{k|k_0} + B_d(\hat{v}_0)\hat{w}(k|k_0),$$

and for each  $k = k_0, k_0 + 1, \dots, k_0 + N_p$ ,

$$0 \leq v_{k|k_0} \leq v_{\max}, \quad 0 \leq u_{k|k_0}^{\mathbf{a}} \leq u_{\max}^{\mathbf{a}}, \quad 0 \leq u_{k|k_0}^{\mathbf{b}} \leq u_{\max}^{\mathbf{b}}, \quad h_{\min} \leq h_{k|k_0},$$

and for  $k = k_0 + N_p$ ,

$$h_{k|k_0} \geq (v_{\max}/2|a|)v_{k|k_0} - \hat{v}_{\text{F}}^2(k|k_0)/2|a_{\text{F}}| + h_{\min}.$$
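
To show how the three sums in $J$ fit together, the following sketch evaluates the cost for a given predicted trajectory. It is a plain-Python illustration rather than the solver code: the function name and the sample numbers are hypothetical, and $|\cdot|^2$ is assumed to be the squared Euclidean norm.

```python
def acc_mpc_cost(v, u, v_des, R, alpha):
    """Evaluate J for predicted velocities v (length N_p + 1) and inputs u (length N_p).

    Each input u[k] is a pair, and R is a 2x2 matrix given as nested lists.
    """
    assert len(v) == len(u) + 1
    # First sum: squared velocity tracking error over k0, ..., k0 + N_p.
    tracking = sum((vk - v_des) ** 2 for vk in v)
    # Second sum: quadratic input cost u^T R u over k0, ..., k0 + N_p - 1.
    effort = sum(
        uk[i] * R[i][j] * uk[j]
        for uk in u for i in range(2) for j in range(2)
    )
    # Third sum: squared change between consecutive inputs.
    smoothness = sum(
        (u_next[i] - u_prev[i]) ** 2
        for u_prev, u_next in zip(u, u[1:]) for i in range(2)
    )
    return tracking + effort + alpha * smoothness

# Hypothetical trajectory with N_p = 2.
v = [18.0, 19.0, 20.0]
u = [[2.0, 0.0], [1.0, 0.0]]
R = [[0.5, 0.0], [0.0, 0.5]]
J = acc_mpc_cost(v, u, v_des=20.0, R=R, alpha=1.0)
print(J)  # 8.5
```

Here the tracking term contributes 5, the input term 2.5, and the input-change term 1.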

# Nonlinear MPC Problem Formulation

---

minimize

$$J(x_{(\cdot)|k_0}, u_{(\cdot)|k_0}) := \sum_{k=k_0}^{k_0+N_c-1} C(x_{k|k_0}, u_{k|k_0}) + \sum_{k=k_0+N_c}^{k_0+N_p-1} C(x_{k|k_0}, u_{(k_0+N_c-1)|k_0})$$

with respect to

$$x_{k_0|k_0}, x_{(k_0+1)|k_0}, \dots, x_{(k_0+N_p)|k_0} \in \mathbb{R}^{n_x}, \quad u_{k_0|k_0}, u_{(k_0+1)|k_0}, \dots, u_{(k_0+N_c-1)|k_0} \in \mathbb{R}^{n_u}$$

subject to

$$x_{k_0|k_0} = \hat{x}_{k_0},$$

and for each  $k = k_0, k_0 + 1, \dots, k_0 + N_p - 1$ ,

$$x_{k+1|k_0} = f(x_{k|k_0}, u_{k|k_0}),$$

and for each  $k = k_0, k_0 + 1, \dots, k_0 + N_p$ ,

$$\ell_i(x_{k|k_0}, y_{k|k_0}, u_{k|k_0}) \leq 0, \quad \ell_e(x_{k|k_0}, u_{k|k_0}) = 0.$$
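
The split of $J$ into two sums reflects a control horizon $N_c$ that may be shorter than the prediction horizon $N_p$: after the control horizon, the last free input is held. The sketch below evaluates the cost by rolling the dynamics forward. The function name and the scalar example are hypothetical, and holding the last input in the dynamics as well as in the cost is an assumption made here.

```python
def rollout_cost(x0, u_seq, f, C, N_p):
    """Simulate x_{k+1} = f(x_k, u_k) for N_p steps and accumulate stage cost C.

    u_seq holds the N_c free inputs; beyond the control horizon the last
    input u_seq[-1] is held, matching the second sum in J.
    """
    N_c = len(u_seq)
    assert 1 <= N_c <= N_p
    x, J = x0, 0.0
    for k in range(N_p):
        u = u_seq[min(k, N_c - 1)]
        J += C(x, u)
        x = f(x, u)
    return J, x

# Hypothetical scalar example: x_{k+1} = x_k + u_k, C(x, u) = x^2 + u^2.
f = lambda x, u: x + u
C = lambda x, u: x**2 + u**2
J, x_final = rollout_cost(4.0, [-2.0, -1.0], f, C, N_p=3)
print(J, x_final)  # 27.0 0.0
```

With $N_c = 2$ and $N_p = 3$, the third step reuses the second input, so the stage costs are 20, 5, and 2.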