

# RISC-V Heterogeneous Programming Paradigm

## Atomic IO Enqueue (AOE) Extension & AOE with Virtualization

Author: Guo Ren, Alibaba Damo Academy  
Email: guoren.gr@alibaba-inc.com

### Abstract

In the era of artificial intelligence, a single GPU architecture can no longer meet the demands of diverse intelligent computing workloads. Consequently, heterogeneous computing has emerged as the mainstream approach. Integrating different Domain-Specific Accelerators (DSAs) into computing systems can improve overall computational efficiency. For instance, high-dimensional tensor computing tasks can be offloaded to TPUs, NPUs, or GPUs, while data stream processing tasks are delegated to DPUs. Efficiently managing DSAs in heterogeneous systems has become a prominent industry focus, driving several technological advancements:

- PCIe 5.0 and CXL 2.0 have introduced Deferrable Memory Write (DMWr) TLPs.
- Armv8.7/9.2 has incorporated ST64BV0 instructions for 64-byte atomic I/O enqueue operations.
- The x86 architecture has implemented ENQCMD instructions with comparable functionality.

These innovations collectively reduce control latency and optimize system resource utilization.

To help RISC-V adapt to this trend, the presentation introduces the Atomic IO Enqueue (AOE) extension and its use with virtualization. The AOE extension is designed for the RV64 ISA and includes one PMA definition, two U-mode instructions, two S-mode instructions, a single S-mode CSR, and two envcfg control bits. The presentation also shows how to use AOE in a virtualized environment, which involves a new proposal for the RISC-V IOMMU: G-stage table in Process Context (GIPC). With AOE and GIPC, RISC-V could explore a new heterogeneous programming paradigm spanning HPC to embedded scenarios.



### ① Atomic IO Enqueue (AOE) Extension

The "Atomic IO Enqueue" (AOE) extension is designed for the RV64 ISA. It contains one PMA definition, two user instructions, two supervisor instructions, a single S-mode CSR, and two envcfg control bits:

| Component         | Description                                                               |
|-------------------|---------------------------------------------------------------------------|
| AOE PMA           | Atomic IO Enqueue Physical Memory Attribute                               |
| CSR_SEUHQ         | Supervisor read/write CSR for UENQ instructions                           |
| UENQ_64B          | User Enqueue Instruction for 64-byte                                      |
| UENQ_32B          | User Enqueue Instruction for 32-byte [Optional]                           |
| SENQ_64B          | Supervisor Enqueue Instruction for 64-byte                                |
| SENQ_32B          | Supervisor Enqueue Instruction for 32-byte [Optional]                     |
| CSR_MENVCFG_SUEHQ | Control bit for SENQ & CSR_SEUHQ in HS/VS-mode                            |
| CSR_HENVCFG_SUEHQ | Control bit for SENQ & CSR_SEUHQ in VS-mode<br>(AOE under Virtualization) |

#### ② UENQ\_64B (64-byte Atomic IO Store)

The Unprivileged Enqueue (UENQ) instruction performs a single 64-byte atomic IO store, with or without a status result. The 64-byte store data is formed as data[511:32] followed by CSR\_SEUHQ[31:0], with the data taken from 8 consecutive registers.

#### ③ CSR\_SEUHQ (for Process\_ID)

A privileged CSR whose value replaces the lowest 32 bits of the UENQ\_64B store data and serves as the Process\_ID.
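The payload layout described for UENQ\_64B and CSR\_SEUHQ can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `uenq64_payload` is hypothetical, the first of the 8 registers is assumed to supply the least-significant 64 bits, and the payload is assumed to be laid out little-endian. None of these details are specified in the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of the UENQ_64B store payload.
 * Assumptions (not from the source): regs[0] supplies bits [63:0],
 * and the 64-byte payload is laid out little-endian. */
static void uenq64_payload(const uint64_t regs[8], uint32_t seuhq,
                           uint8_t out[64])
{
    uint64_t words[8];

    assert(regs != NULL && out != NULL);
    memcpy(words, regs, sizeof(words));

    /* Bits [31:0] come from CSR_SEUHQ (the Process_ID), so whatever
     * user code placed in the low 32 bits is overwritten. */
    words[0] = (words[0] & 0xFFFFFFFF00000000ULL) | (uint64_t)seuhq;

    /* Serialize the eight 64-bit words into 64 bytes, low byte first. */
    for (int i = 0; i < 8; i++)
        for (int b = 0; b < 8; b++)
            out[i * 8 + b] = (uint8_t)(words[i] >> (8 * b));
}
```

The point of the model is that unprivileged code controls bits [511:32] only; the Process\_ID field is supplied by the supervisor-owned CSR, so a user process cannot forge another process's identity in the enqueued command.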

#### ④ AOE PMA (for Security)

The Atomic IO Enqueue (AOE) Physical Memory Attribute (PMA) defines the target address attribute for SENQ and UENQ.



- CSR\_MENVCFG\_SUEHQ = 0
- The vIOMMU maintains a G-stage table
- VM domains are distinguished by Process\_ID and G-stage tables
- Process\_ID distinguishes VMs



However, the current RISC-V IOMMU distinguishes VM domains by Device\_ID.



### G-stage table in Process Context (GIPC) Extension



- Process\_ID distinguishes VMs
- Process DR table is based on PA and managed by the VMM
- No GPA->VA TLB entries
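The difference between selecting a G-stage table by Device\_ID and selecting it by Process\_ID can be sketched as a lookup model. This is a rough illustration, not the proposed hardware data structures: the struct layouts, field names, and the functions `gstage_by_device` and `gstage_by_process` are all hypothetical, and a return value of 0 is assumed to mean "no entry".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical process-context entry: under GIPC, each Process_ID is
 * assumed to carry its own G-stage table root. */
struct proc_ctx {
    uint32_t process_id;
    uint64_t gstage_root;
};

/* Hypothetical device-context entry. */
struct device_ctx {
    uint32_t device_id;
    uint64_t gstage_root;          /* one G-stage root per device */
    const struct proc_ctx *procs;  /* process contexts for this device */
    size_t nprocs;
};

/* Device_ID model: every request from a device uses the same G-stage
 * table, so one device maps to one VM domain. */
static uint64_t gstage_by_device(const struct device_ctx *dc)
{
    return dc->gstage_root;
}

/* GIPC-style model: the G-stage root is found in the process context,
 * so different Process_IDs on one device can belong to different VMs.
 * Returns 0 when the Process_ID is unknown (illustrative convention). */
static uint64_t gstage_by_process(const struct device_ctx *dc,
                                  uint32_t process_id)
{
    for (size_t i = 0; i < dc->nprocs; i++)
        if (dc->procs[i].process_id == process_id)
            return dc->procs[i].gstage_root;
    return 0;
}
```

In this model, a single shared accelerator can serve several VMs at once, because the Process\_ID carried in each enqueued command (rather than the Device\_ID) selects the G-stage table.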

