



# The Power of Open-Source Hardware: a user experience point of view



Politecnico  
di Torino



Davide Schiavone<sup>1,2</sup>, Luigi Giuffrida<sup>3</sup>, Maurizio Martina<sup>3</sup>, Jose Miranda<sup>4</sup>, Andrés Otero<sup>4</sup>, Francesco Conti<sup>5</sup>, Davide Rossi<sup>5</sup>, David Atienza<sup>2</sup>, Frank Gurkaynak<sup>6</sup>, Luca Benini<sup>5,6</sup>

---

1 OpenHW Foundation; 2 EPFL; 3 Polytechnic of Turin; 4 Universidad Politécnica de Madrid; 5 University of Bologna; 6 ETH Zurich

# Intro



HARDWARE ACCELERATORS



DIFFERENT APPLICATIONS



**Performance** evaluation under realistic conditions



An **accelerator** is **useless** by itself



An easily extensible **system** should **host** it



Open source



Silicon validated



Free to use and **modify**






# Parallel Ultra Low Power (PULP) and OpenHW Foundation

Where it all started

# PULP platform

The **Parallel Ultra Low Power (PULP) Platform** started in 2013 as a joint effort between **ETH Zürich** and the **University of Bologna** to explore and develop new, **efficient computing architectures** leveraging the **RISC-V** open ISA and **open-source, collaborative hardware design**.



12 years of building a worldwide community with universities and companies



**PULPv1** - 4 OpenRISC cores  
**Closed source** explorative chip

**Occamy** - 432 + 2 RISC-V cores  
Almost completely **open source**  
2.5D integration multi-chiplet system



# PULP platform

Today the **PULP team** actively maintains **308 repositories** on **GitHub**.

All hardware is released under the permissive **Solderpad License**, and the software under the **Apache 2.0 License**.

The **open-source IPs** maintained by the **PULP team** form a **solid pool of high-quality IPs**.

Most of the IPs and platforms are **silicon validated**.

Some of them are now **verified at an industrial level**.



# OpenHW Foundation

The **OpenHW community** brings together **hardware and software designers** to create **open-source core designs and related IP, tools, and software** intended for **commercial and academic use alike**. These collaborative **designs are adopted and customized by industry leaders** to produce cores for a wide range of **applications**, from **small embedded devices** such as cellphones to advanced **high-performance computing systems**.



# OpenHW Foundation



- **CVE2**: tiny core
- **CV32E40P**: DSP core
- **CV32E40X**: extensible core
- **CVA6**: application-class processor
- **Open Bus Interface (OBI)** standard
- **CORE-V eXtension interface (CV-X-IF)**



# CV32E40P graduation

The **first milestone** was the **CV32E40P core**, **verified** at an **industrial level**, reaching **100% code coverage** and uncovering **47 bugs**.

This achievement marked the so-called “**graduation**” of the **CV32E40P** core, which transitioned **from a research project to an industrial product** that is free to use under a permissive license.








# X-HEEP system

Let's not reinvent the wheel!

# X-HEEP

X-HEEP is a **modular RISC-V 32-bit microcontroller** template:

- **Highly configurable**
- **Easily extensible**
- **Open source**



# X-HEEP



Core-specific features can be enabled at the system level:

- CORE-V eXtension interface
- Floating-point unit
- PULP extensions



# X-HEEP



The **number of memory banks and their sizes** can be configured, as can whether data is **interleaved** across banks or stored **contiguously**.

This information is propagated directly to the **linker script** for **seamless software integration**.
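
To make the difference between the two layouts concrete, the sketch below models how a byte address could map to a bank under each scheme. It is an illustrative model only: the function names, the 32-bit word size, and the word-granular interleaving are assumptions, not X-HEEP's actual address decoding.

```python
from typing import Tuple

WORD_BYTES = 4  # assumption: 32-bit words, interleaving at word granularity


def contiguous_bank(addr: int, bank_size: int, num_banks: int) -> Tuple[int, int]:
    """Return (bank index, byte offset in bank) when banks are laid out back to back."""
    bank, offset = divmod(addr, bank_size)
    if bank >= num_banks:
        raise ValueError("address outside the configured memory")
    return bank, offset


def interleaved_bank(addr: int, bank_size: int, num_banks: int) -> Tuple[int, int]:
    """Return (bank index, byte offset in bank) when consecutive words rotate over banks."""
    if addr >= bank_size * num_banks:
        raise ValueError("address outside the configured memory")
    word, byte = divmod(addr, WORD_BYTES)
    row, bank = divmod(word, num_banks)
    return bank, row * WORD_BYTES + byte
```

With four banks, consecutive words land in banks 0, 1, 2, 3, 0, ... in the interleaved model, whereas in the contiguous model they stay in the same bank until it is full.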



# X-HEEP



The **number of DMA channels** can be configured. The **number of master ports** on the system bus can be configured independently of the number of channels (**several channels can share the same master port**).
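
One way to picture channels sharing master ports is a static channel-to-port assignment. The sketch below distributes channels round-robin over the available ports; the function name, the round-robin policy, and the constraint that there are no more ports than channels are assumptions for illustration, and the actual X-HEEP assignment may differ.

```python
from typing import Dict, List


def assign_channels(num_channels: int, num_master_ports: int) -> Dict[int, List[int]]:
    """Map each master port to the list of DMA channels that share it."""
    # Assumption: at least one port, and never more ports than channels.
    if num_master_ports < 1 or num_master_ports > num_channels:
        raise ValueError("need 1 <= num_master_ports <= num_channels")
    ports: Dict[int, List[int]] = {port: [] for port in range(num_master_ports)}
    for channel in range(num_channels):
        # Hypothetical policy: round-robin distribution of channels over ports.
        ports[channel % num_master_ports].append(channel)
    return ports
```

For example, four channels on two master ports would give port 0 the channels 0 and 2, and port 1 the channels 1 and 3.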



# X-HEEP



The included **peripherals** can be **configured** depending on the application requirements and the power, performance, and area (PPA) target.

Many peripherals can be selected.



# X-HEEP ASICs

Not only is X-HEEP based on well-verified, silicon-proven designs, it is also **silicon proven** itself: several **tapeouts** have been completed successfully in academia.

Because X-HEEP builds on open-source hardware, the Embedded Systems Laboratory at EPFL was able to **go from no system to silicon in 5 months** and **successfully test** their **custom accelerators** on a working platform (X-HEEP).

HEEPocrates



HEEPnosis








# X-HEEP experience

Let's not reinvent the car!

# Extending X-HEEP



**X-HEEP**, being easily extensible and configurable, is a good **host system** to accommodate the execution of tasks on **tightly coupled accelerators**.

A designer who wants to use X-HEEP only has to **instantiate it alongside their design and connect the two**.



# X-TRELA

- STReaming ELAstic CGRA
- High throughput at low power
- Dataflow computation
- Public release on [GitHub](#)



# X-TRELA - CHIP

**4.7 mm<sup>2</sup>**

@ TSMC 65nm LP CMOS - Q1 2025

## Single-core

- CV32E40P

## Accelerator

- 4x4 STRELA CGRA

## Memory

- NtM bus
- 256 kB SRAM (4 interleaved banks)



# Presenting **ARCANE**



Fully configurable

Doubles as a data cache (D\$) and a matrix coprocessor

Easy programming experience

## ARCANE stats:

- Load and store **data**:
- Fetch **instructions**:
- **Data management**:



## Open-source resources:



# Cryptography



The **CRASH team** works on:

- **PQC-accelerator** integration.
- **LWC-accelerator** integration.
- **Side-channel Analysis.**

Different integration methods analyzed



Many PQC algorithms supported: CRYSTALS-Kyber, CRYSTALS-Dilithium, SPHINCS+, FALCON, HQC



The open-source nature of RISC-V enables microarchitectural analysis and the development of new, tailored countermeasures



# MAGE



Decoupled address generation and data computation

AGEs tailored for 4D affine access to internal SRAM

PEs perform packed SIMD operations
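
A 4D affine access pattern can be read as four nested loops, each contributing `index * stride` to the address. The sketch below enumerates such an address stream in software; the parameter names (`base`, `counts`, `strides`) are hypothetical and do not reflect MAGE's actual configuration interface.

```python
from itertools import product
from typing import Iterator, Sequence


def affine_addresses(base: int, counts: Sequence[int],
                     strides: Sequence[int]) -> Iterator[int]:
    """Yield base + sum(i_k * stride_k) over all index tuples, innermost dimension last."""
    if len(counts) != 4 or len(strides) != 4:
        raise ValueError("expected exactly four dimensions")
    # product() varies the last range fastest, like the innermost of four nested loops.
    for idx in product(*(range(c) for c in counts)):
        yield base + sum(i * s for i, s in zip(idx, strides))
```

As a usage example, `affine_addresses(0, (1, 1, 2, 3), (0, 0, 16, 4))` walks a 2x3 tile of 32-bit words in a row-major array whose rows are 16 bytes apart, leaving the two outer dimensions unused.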



# RACE



Programmable coprocessor, supports runtime definition of ISA extensions

Exploits the OpenHW extension interface to communicate with the core

Get the best of CGRA computing capability and CPU flexibility



# Published papers

- Petrolo, V., Guella, F., Caon, M., Schiavone, P. D., Masera, G., & Martina, M. (2025). ARCANE: Adaptive RISC-V Cache Architecture for Near-memory Extensions. *arXiv preprint arXiv:2504.02533*.
- Caon, M., Petrolo, V., Mirigaldi, M., Guella, F., Masera, G., & Martina, M. (2024, May). Seeing Beyond the Order: A LEN5 to Sharpen Edge Microprocessors with Dynamic Scheduling. In *Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions* (pp. 47-50).
- Alessandra Dolmeta, Emanuele Valpreda, Maurizio Martina, and Guido Masera. 2024. Implementation and integration of NTT/INTT accelerator on RISC-V for CRYSTALS-Kyber. In Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions (CF '24 Companion). Association for Computing Machinery, New York, NY, USA, 59–62. <https://doi.org/10.1145/3637543.3652872>
- A. Dolmeta, M. Martina and G. Masera, "ATHOS: A Hybrid Accelerator for PQC CRYSTALS-Algorithms exploiting new CV-X-IF Interface," in IEEE Access, doi: 10.1109/ACCESS.2024.3511340.
- Piscopo, V.; Dolmeta, A.; Mirigaldi, M.; Martina, M.; Masera, G. A High-Entropy True Random Number Generator with Keccak Conditioning for FPGA. *Sensors* 2025, 25, 1678. <https://doi.org/10.3390/s25061678>
- Dolmeta, A., Martina, M., Masera, G. (2025). Exploring the New CV-X-IF Interface to Customize RISC-V Instruction Sets: A Case of Study in Cryptography. In: Ruo Roch, M., Bellotti, F., Berta, R., Martina, M., Motto Ros, P. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2024. Lecture Notes in Electrical Engineering, vol 1369. Springer, Cham. <https://doi.org/10.1007/978-3-031-84100-2_5>
- Mirigaldi, M., Martina, M., Masera, G. (2025). Performance Comparison: Software vs. Hardware Implementation of Novel S-Box Designed to Resist Power Analysis Attack. In: Ruo Roch, M., Bellotti, F., Berta, R., Martina, M., Motto Ros, P. (eds) Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2024. Lecture Notes in Electrical Engineering, vol 1369. Springer, Cham. <https://doi.org/10.1007/978-3-031-84100-2_3>
- Alessandro Varaldi, Alessio Naclerio, Fabrizio Riente, Maurizio Zamboni, Mariagrazia Graziano, Marco Vacca. Optimizing TCN Inference: A Hardware-Software Co-Design Approach with CGRA Acceleration, to appear at ISVLSI 2025
- Alessio Naclerio, Fabrizio Riente, Giovanna Turvani, Marco Vacca, Maurizio Zamboni, Mariagrazia Graziano. Mage: A Decoupled Access-Execute CGRA Tailored for Static Control Applications, appeared at ISCAS 2025
- Luigi Giuffrida, Guido Masera and Maurizio Martina. TOXOS: Spinning Up Nonlinearity in AI with a RISC-V CORDIC Coprocessor, under review



# Conclusions



The availability of **high-quality**, **verified open-source hardware** made it possible to build **X-HEEP**.

This platform in turn **unlocked new research possibilities** for many research groups.

This virtuous cycle lowers the barrier to silicon and lets many **universities** **join the silicon path**.



# The Power of Open-Source Hardware: a user experience point of view

Thank you for your attention!






# RISC-V Profiler



**riscv-function-profiling**

- ✗ No manual instrumentation such as `asm volatile("csrr %0, mcycle" : "=r"(cycles));`
- ✓ All functions get profiled
- ✓ Needs just **.vcd + .elf + .wal** (configuration)
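
The idea of profiling every function without instrumenting the code can be sketched as follows: take the program-counter value observed at each cycle (as could be extracted from a waveform) and attribute it to the function whose address range contains it (as could be read from the ELF symbol table). This is a simplified illustration, not the implementation of riscv-function-profiling; the data layout and the function name are assumptions.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def cycles_per_function(symbols: List[Tuple[int, str]],
                        pc_trace: Iterable[int]) -> Dict[str, int]:
    """Count how many trace samples (one per cycle) fall inside each function.

    symbols:  (start_address, name) pairs; a function is assumed to extend
              up to the start of the next symbol.
    pc_trace: one program-counter value per clock cycle.
    """
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    counts: Counter = Counter()
    for pc in pc_trace:
        idx = bisect_right(starts, pc) - 1  # last symbol starting at or before pc
        if idx >= 0:                        # ignore PCs below the first known symbol
            counts[ordered[idx][1]] += 1
    return dict(counts)
```

A real flow would also need to parse the trace and the binary, and to handle stalls, interrupts, and inlined functions, all of which this sketch leaves out.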



# Conclusions

In conclusion, this presentation emphasizes the significant impact of open-source hardware, which lets industry reduce costs and lets academic research projects proceed more rapidly and efficiently. It highlights the real case of the PULP project's IPs, which not only allowed ETH Zurich and the University of Bologna to conduct their own experiments but also enabled OpenHW Foundation members to build systems based on the most popular PULP IPs and to contribute back industrial-grade versions of them. This in turn supported EPFL in developing an extensible microcontroller that further lowered the barrier to RISC-V open-source SoCs and swiftly demonstrated the maturity of open-source IPs by prototyping an SoC in less than 5 months. It also helped new users, such as the Polytechnics of Turin and Madrid, pursue their research goals by creating new accelerators and heterogeneous SoCs, one of which eventually resulted in a silicon prototype, making silicon fabrication more accessible.

# Intro

What do we need to conduct effective research on hardware accelerators?  
Reliable systems in which to integrate our accelerators.

What do we need to build useful and reliable systems? Working, verified IPs.

How can we access valuable IPs? By using open-source components and relying on foundations like OpenHW.

Why do we need systems to integrate our accelerators? It is easier to showcase their capabilities when they are used in realistic conditions. Moreover, if a system hosts the accelerator, it is easier to test it in a silicon prototype. How do I load the scratchpad memory of an accelerator on silicon? It is possible, but difficult; writing C code is easier and more effective.