

# OpenPOWER ISA

H. Peter Hofstee, Ph.D.  
IBM & TU Delft

---

First things first ...

---

If you are interested in the OpenPOWER architecture ...  
first thing to do is join the OpenPOWER foundation!

Free for associate/academic members

<https://openpowerfoundation.org/>

# Early History

- Telephone switch ( from 1974 )
- Project '801' ( from 1975 )
- 'Cheetah' ( from 1982 )
- 'America' ( from 1985 )
- RS/6000 ( from 1986 )

## Abbreviated Lineage of the Power ISA

- Greater than 30 years of innovation and a developed ecosystem
- Instruction heritage shown for **Power ISA 3.1**

| Instruction Heritage       | Note     | # Instr. | Cum Instr. | Open ISA          |
|----------------------------|----------|----------|------------|-------------------|
| POWER (P1)                 | Base     | 218      | 218        | Contributing      |
| POWER (P2)                 |          | 6        | 224        | Contributing      |
| PowerPC (P3)               | 64b      | 119      | 343        | Contributing      |
| PowerPC 2.00 (P4)          |          | 7        | 350        | Contributing      |
| PowerPC 2.01               |          | 2        | 352        | Contributing      |
| PowerPC 2.02 (P5)          |          | 14       | 366        | Contributing      |
| Power ISA 2.03             | SIMD-VMX | 171      | 537        | Contributing      |
| Power ISA 2.05 (P6)        |          | 105      | 642        | Contributing      |
| Power ISA 2.06 (P7)        | SIMD-VSX | 189      | 831        | Contributing      |
| Power ISA 2.07 (P8)        |          | 111      | 942        | Contributing      |
| <b>Power ISA 3.0 (P9)</b>  |          | 231      | 1173       | <b>Compliance</b> |
| <b>Power ISA 3.1 (P10)</b> | Prefix   | 246      | 1419       | <b>Compliance</b> |



# A rich history – just some highlights

- POWER2 (P2SC + ASIC) - Deep Blue chess champion
- RAD750 (derived from ppc750 iMac G3) - Mars Rover Perseverance
- Playstation3, Xbox360, Nintendo Wii - Games trifecta
- POWER7 - Jeopardy champion
- POWER9 - Summit and Sierra

# The Linux Foundation Announces New Open Hardware Technologies and Collaboration

THE LINUX FOUNDATION | 21 AUGUST 2019

OpenPOWER to become Linux Foundation project to advance development of data-driven architectures required for hybrid cloud environments and increasingly intensive workloads

**SAN DIEGO, CA August 21, 2019** – [The Linux Foundation](#), the nonprofit organization enabling mass innovation through open source, today announced that the OpenPOWER Foundation will become a project hosted at The Linux Foundation. The project includes IBM's open POWER Instruction Set Architecture (ISA) and contributed Source Design Implementations required to support data-driven hardware for intensive workloads like Artificial Intelligence (AI).

OpenPOWER is the open steward for the Power Architecture and has the support of 350 members, including IBM, Google, Inspur Power Systems, Yadro, Hitachi, Wistron, Mellanox, NVIDIA, and Red Hat.

The governance model within the Linux Foundation gives software developers assurance of compatibility while developing AI and hybrid cloud native applications that take advantage of POWER's rich feature set and open compute hardware and software ecosystems.

- **August 20, 2019** – Open ISA Announcement at NA OpenPOWER Summit
- **February 13, 2020** – Final Draft of the End User License Released by OPF:
  - <https://openpowerfoundation.org/final-draft-of-the-power-isa-eula-released/>
- **April 2020** – POWER ISA 3.0c contribution to OPF
  - Same as POWER ISA 3.0b except for
    - Compliancy Subsets
    - Custom Extension Space (Sandbox)
    - SMF Feature
- **May 2020** – POWER ISA 3.1 contribution to OPF ( latest is 3.1c )
- **May 2020** – POWER ISA Workgroup Chartered in OPF



|  | 

About Events Working Groups Members HUB Technical Blog Contact Us

Join

Login 

# Technical Specifications

Search...

Matching items

[64-bit ELF ABI Specification for OpenPOWER Architecture](#)

[Advanced Accelerator Adapter Electro-Mechanical Specification](#)

[Coherent Accelerator Interface Architecture](#)

[Field Replaceable Unit Service Interface Specification](#)

[Instruction Set Architecture](#)

[IO Design Architecture \(IODA\) Specification](#)

[Linux on POWER Architecture Reference](#)

[Memory Bus Specification](#)

[OpenCAPI POWER Platform Architecture Guide](#)

[PSL / AFU Interface Specification](#)

[Vector Intrinsics Porting Guide](#)

[Vector Intrinsics Programming Reference Specification](#)



# Instruction Set Architecture

[Download version 3.1c](#) 2024-05-26

The Power Instruction Set Architecture (ISA) Version is a specification that describes the architecture used by POWER processors.

It defines the instructions the processor executes.

It is comprised of three books and a set of appendices.

- Book I
  - Power ISA User Instruction Set Architecture, covers the base instruction set and related facilities available to the application programmer.
- Book II
  - Power ISA Virtual Environment Architecture, defines the storage model and related instructions and facilities available to the application programmer.
- Book III
  - Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities.

Owned by : [Instruction Set Architecture TWG](#)

Compliance : [Test Suite & Harness](#)

**version 3.1c** 2024-05-26

*Data cleanup*  
[download](#)

**version 3.1b** 2021-09-14

*Incorporate errata.*  
[download](#)

**version 3.1** 2020-05-02

*Initial release.*  
[download](#)

**version 3.0c** 2020-05-01

*Incorporate errata.*  
[download](#)

**Request For Change Feedback for "non ISA TWG members"**

This feedback process uses the RFC method, which we divide in 3 types : *minor changes, formal proposal, or RFC*.



## *Books and Contents*

**Book I**  
User  
Instruction Set  
Architecture



**Base instruction set and related facilities:**

- Memory reference
- Flow control
- Integer
- Floating Point
- Numeric acceleration

**Book II**  
Virtual  
Environment  
Architecture



**Storage model for thread and resource interaction:**

- Time, synchronization
- Cache management
- Storage features

**Book III**  
Operating  
Environment  
Architecture



**Supervisor instructions and related facilities:**

- Exceptions, interrupts
- Memory management
- Debug facilities
- Special control / resources

## *Environment*

- Application-level programming model
- Generally compiler generated

- Non-privileged library-level programming model
- Generally assembler generated

- Supervisor/Hypervisor programming model

- A supporting / compliant design must:

- Support the **Base** architecture (never optional)
  - ***And*** support at least one of the compliancy sub-sets:
    - **ACS**: AIX Compliancy Subset
    - **LCS**: Linux Compliancy Subset
    - **SFFS**: Scalar Fixed-Point + Floating-Point Compliancy Subset
    - **SFS**: Scalar Fixed-Point Compliancy Subset
  - ***And may*** support any **optional features** for the compliancy subset
    - **All optional features must be supported in their entirety per compliancy specification**
  - ***And may*** support **custom extensions using the architecture sandbox**
  - ***And may*** utilize a combination of hardware and firmware
    - Any firmware must rely on base architecture, the included optional features and the included custom extensions; must honor restrictions specified in the architecture for use of storage; and must honor explicitly prohibited uses.

- Any of the 4 Subsets can be chosen for compliance
- Can also include any of the optional features from the other sets below

▪ Full Arch Compliance



*Refer to official PowerISA document for compliancy subset definition and additional details*

|                                     |                                 |                                   |
|-------------------------------------|---------------------------------|-----------------------------------|
| Copy-paste-accel ( <u>CPA</u> )     | Non-coherent-mem ( <u>M=0</u> ) | Matrix-math-assist ( <u>MMA</u> ) |
| Secure-mem-facility ( <u>SMF</u> )  | Wr-tru-req-mem ( <u>W=1</u> )   |                                   |
| Data-stream-prefetch ( <u>STM</u> ) | Power-management ( <u>PM</u> )  |                                   |

### Always Optional Features <320i>

|                                    |                                |                                   |
|------------------------------------|--------------------------------|-----------------------------------|
| <u>ATL</u> -HAIL-programmability   | <u>EVIRT</u> -programmability  | Quad-prec-float ( <u>QFP</u> )    |
| Atomic-mem-ops ( <u>AMO</u> )      | SLB / <u>HPT</u> xlate         | Decimal-float ( <u>DFP</u> )      |
| Big-endian ( <u>BE</u> )           | Proc-compat-reg ( <u>PCR</u> ) | Load-store-multiple ( <u>LM</u> ) |
| Branch-history-buf ( <u>BHRB</u> ) | Broadcast- <u>TLBIE</u>        | Load-store-string ( <u>LS</u> )   |
| Event-based-branch ( <u>EBB</u> )  | Control-reg ( <u>CTRL</u> )    | <u>SMT</u>                        |

### AIX Subset (ACS) <1099i>

|                                   |                            |
|-----------------------------------|----------------------------|
| <u>SIMD</u> -VMX-VSX              | <u>SF</u> =1 (64-bit mode) |
| Nested radix xlate ( <u>ROR</u> ) | <u>LE</u> -mode            |
| <u>OV</u> modifying ops           | <u>LPAR</u>                |

### Linux Subset (LCS) <962i>

Scalar-Binary-FP

### Scalar Fixed-Point & Floating- Point Subset (SFFS) <214i>

Base Architecture

### Scalar Fixed-Point Subset (SFS) <129i>

**Architecture Sandbox**  
For customization with  
limited applicability

|                     |              |              |                  |
|---------------------|--------------|--------------|------------------|
| Opcode-22           | [H]FSCR(8:9) | FPSCR(14:15) | Interrupt 0x0FE0 |
| SPR-704:719,720:735 | XER(54:55)   | VSCR 96,112  |                  |

|                                     |                                 |                                   |
|-------------------------------------|---------------------------------|-----------------------------------|
| Copy-paste-accel ( <u>CPA</u> )     | Non-coherent-mem ( <u>M=0</u> ) | Matrix-math-assist ( <u>MMA</u> ) |
| Secure-mem-facility ( <u>SMF</u> )  | Wr-tru-req-mem ( <u>W=1</u> )   |                                   |
| Data-stream-prefetch ( <u>STM</u> ) | Power-management ( <u>PM</u> )  |                                   |

**Always Optional Features <320i>**

|                                    |                                |                                   |
|------------------------------------|--------------------------------|-----------------------------------|
| <u>AIIL-HAIL-programmability</u>   | <u>EVIRT-programmability</u>   | Quad-prec-float ( <u>QFP</u> )    |
| Atomic-mem-ops ( <u>AMO</u> )      | SLB / <u>HPT</u> xlate         | Decimal-float ( <u>DFP</u> )      |
| Big-endian ( <u>BE</u> )           | Proc-compat-reg ( <u>PCR</u> ) | Load-store-multiple ( <u>LM</u> ) |
| Branch-history-buf ( <u>BHRB</u> ) | Broadcast- <u>TLBIE</u>        | Load-store-string ( <u>LS</u> )   |
| Event-based-branch ( <u>EBB</u> )  | Control-reg ( <u>CTRL</u> )    | <u>SMT</u>                        |

**AIX Subset (ACS) <1099i>**

|                                   |                           |
|-----------------------------------|---------------------------|
| <u>SIMD</u> -VMX-VSX              | <u>SF=1</u> (64-bit mode) |
| Nested radix xlate ( <u>ROR</u> ) | <u>LE</u> -mode           |
| <u>OV</u> modifying ops           | <u>LPAR</u>               |

**Linux Subset (LCS) <962i>**

|                          |                                                                                   |
|--------------------------|-----------------------------------------------------------------------------------|
| Scalar-Binary- <u>FP</u> | <b>Scalar Fixed-Point &amp; Floating- Point Subset (<u>SFFS</u>) &lt;214i&gt;</b> |
| Base Architecture        | <b>Scalar Fixed-Point Subset (<u>SFS</u>) &lt;129i&gt;</b>                        |

- The **compliance subset** permits “custom extensions”
  - The architecture sandbox features are intended to allow implementation dependent customization
- **For facilities with broad applicability:**
  - Developers are strongly encouraged to submit a proposal for adoption into the architecture.
  - Adopted proposals will become optional or required features of the architecture
    - These will be assigned resources that are not in the architecture sandbox to avoid fragmentation of the architecture.
  - Facilities described in proposals that are not adopted into the architecture may be implemented as Custom Extensions using the architecture sandbox.
- **System software and toolchain support of Custom Extensions is not guaranteed.**
  - Developers are encouraged to provide a means to disable custom extensions to present an architecture that is supported by standard system software and toolchain.

The PowerISA 3.1 includes a number of new features (see specification / preface for details):

- General: Byte reverse instructions, Vector Integer Multiply/Divide/Modulo Instructions, 128-bit Binary Integer Operations, Set Boolean Extension, String Operations, Test LSB by Byte Operation, VSX Scalar Minimum/Maximum/Compare Quad-Precision
- SIMD: VSX 32-byte Storage Access Operations, SIMD Permute-Class Operations, Bit-Manipulation Operations, VSX Load/Store Rightmost Element Operations, VSX Mask Manipulation Operations, VSX PCV Generate Operations for expand/compress,
- Translation Management Extensions
- Copy/Paste Extensions
- Persistent Storage / Store Sync
- Pause / Wait-reserve
- Hypervisor Interrupt Location Control
- Reduced-Precision: Outer Product Operations  
Matrix Math Assist
- Debug: BHRB Filtering updates, Multiple DEAW, New Performance Monitor SPRs, Performance Monitor Facility Sampling Security
- Instruction Prefix Support : 8B and modifying opcodes
  - Prefixed addi Instruction and Prefixed Load/Store Instructions and Addressing
  - CMODX Extension for Prefix
  - See next slide

# New matrix math instruction set architecture (MMA facility)

|         |        |
|---------|--------|
| VSR[0]  | ACC[0] |
| ...     |        |
| VSR[3]  |        |
| VSR[4]  |        |
| ...     |        |
| VSR[7]  |        |
| VSR[8]  |        |
| ...     |        |
| VSR[11] |        |
| VSR[12] |        |
| ...     |        |
| VSR[15] |        |
| VSR[16] |        |
| ...     |        |
| VSR[19] |        |
| VSR[20] |        |
| ...     |        |
| VSR[23] |        |
| VSR[24] |        |
| ...     |        |
| VSR[27] |        |
| VSR[28] |        |
| ...     |        |
| VSR[31] | ACC[7] |

- Accumulators (ACC) are  $4 \times 4$  arrays of 32-bit elements (see below for 64-bit extension)

$$A = \begin{bmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ a_{30} & a_{31} & a_{32} & a_{33} \end{bmatrix}$$

- Vector-scalar registers (VSR) are 128-bit wide and can hold
  - 4 single-precision (32-bit) floating-point values
  - 8 half-precision (16-bit) floating-point values
  - 16 signed/unsigned (8-bit) integer values
  - ...
- Instructions take 1 accumulator (input/output) and 2 vector-scalar registers (input)

**$\langle \text{operation} \rangle \ A, X, Y$**

# Rank- $k$ update with 32-bit and 16-bit elements

- 32-bit: Each VSR holds 1 column (row) of  $4 \times 32$ -bit elements



- 16-bit: Each VSR holds 2 columns (rows) of  $4 \times 16$ -bit elements



# Rank- $k$ update with 8-bit and 4-bit elements

- 8-bit: Each VSR holds 4 columns (rows) of  $4 \times 8$ -bit elements



- 4-bit: Each VSR holds 8 columns (rows) of  $4 \times 4$ -bit elements





- **Prefix architecture, Primary-Opcode=1**
  - RISC-friendly variable length instructions:
    - New 8B instruction space lays the foundation for future ISA expansion
    - Always 4B instruction alignment
  - Two forms: modifying ( $M=1$ ) and 8B opcode ( $M=0$ )
    - Modifying: prefix extends function of existing instructions
    - 8B opcode: provides new opcode space for expansion, multi-operand instructions, etc.
  - PC-relative addressing: reduced path-length with new Power ABI support
  - MMA lane masking : mask by lane for MMA operations
  - Additional instructions and capabilities
- **Generous room for expanded capabilities including opcode space and additional modifiers**

# PowerISA 3.1 Prefix architecture

**Table 11: Primary Opcode Map for Opcode Space 1 (64-bit instruction encoding) (suffix bits 0-5)**

Primary opcodes of suffixes of 8[M](LS|RR)-form prefixed instructions are mapped to opcode space 1

|     | 000                          | 001                          | 010                          | 011                    | 100          | 101                | 110                         | 111                          |     |
|-----|------------------------------|------------------------------|------------------------------|------------------------|--------------|--------------------|-----------------------------|------------------------------|-----|
| 000 | 1-00<br>[reserved]           | 1-01<br>[reserved]           | 1-02                         | 1-03                   | 1-04         | 1-05<br>[reserved] | 1-06                        | 1-07                         | 000 |
| 001 | 1-08<br>[reserved]           |                              | 1-10                         | 1-11                   | 1-12         | 1-13               | 1-14                        | 1-15                         | 001 |
| 011 | 1-16<br>[reserved]           | 1-17                         | 1-18                         | 1-19                   | 1-20         | 1-21               | 1-22<br>[reserved]          | 1-23                         | 011 |
| 010 | 1-24                         | 1-25                         | 1-26                         | 1-27                   | 1-28         | 1-29               | 1-30                        | 1-31                         | 010 |
| 110 | 1-32<br>[extended]<br>EXT132 | 1-33<br>[extended]<br>EXT133 | 1-34<br>[extended]<br>EXT134 | 1-35                   | 1-36         | 1-37               | 1-38                        | 1-39                         | 110 |
| 111 | 1-40<br>v3.1                 | 1-41<br>plwa<br>BLSD v3.1    | 1-42<br>plxsd<br>BLSD v3.1   | 1-43<br>plxssp<br>BLSD | 1-44<br>v3.1 | 1-45<br>v3.1       | 1-46<br>pstxsd<br>BLSD v3.1 | 1-47<br>pstxssp<br>BLSD v3.1 | 111 |
| 101 | 1-48<br>v3.1                 | 1-49<br>plkv<br>BLSD v3.1    | 1-50<br>plkv<br>BLSD v3.1    | 1-51<br>plkv<br>BLSD   | 1-52<br>v3.1 | 1-53<br>v3.1       | 1-54<br>pstxv<br>BLSD v3.1  | 1-55<br>pstxv<br>BLSD v3.1   | 101 |
| 100 | 1-56<br>v3.1                 | 1-57<br>plq<br>BLSD v3.1     | 1-58<br>pld<br>BLSD v3.1     | 1-59<br>plxvp<br>BLSD  | 1-60<br>v3.1 | 1-61<br>v3.1       | 1-62<br>pstq<br>BLSD v3.1   | 1-63<br>pstqd<br>BLSD v3.1   | 100 |
|     | 000                          | 001                          | 010                          | 011                    | 100          | 101                | 110                         | 111                          |     |

**Table 12: PREFIX: Opcode Map (64-bit instruction encoding) (prefix bits 6:11)**

|     | 000                         | 001                           | 010 | 011 | 100 | 101 | 110 | 111 |     |
|-----|-----------------------------|-------------------------------|-----|-----|-----|-----|-----|-----|-----|
| 000 | 00 0...<br>8LS-form<br>v3.1 |                               |     |     |     |     |     |     | 000 |
| 001 |                             |                               |     |     |     |     |     |     | 001 |
| 010 | 01 0000<br>8RR-form<br>v3.1 |                               |     |     |     |     |     |     | 010 |
| 011 |                             |                               |     |     |     |     |     |     | 011 |
| 100 | 10 0...<br>MLS-form<br>v3.1 |                               |     |     |     |     |     |     | 100 |
| 101 |                             |                               |     |     |     |     |     |     | 101 |
| 110 | 11 0000<br>MRR-form<br>v3.1 |                               |     |     |     |     |     |     | 110 |
| 111 |                             | 11 1001<br>MMIRR-form<br>v3.1 |     |     |     |     |     |     | 111 |
|     | 000                         | 001                           | 010 | 011 | 100 | 101 | 110 | 111 |     |



**Generous room for expanded capabilities including opcode space and additional modifiers**

# Summary

- OpenPOWER is an open and license-free for compliant architectures
- Maintained by the OpenPOWER foundation working group
- Compliancy subsets allow for a very wide range of implementations
  - from very small cores that can fit on a small FPGA
  - to implementations that support all instructions and can run multiple enterprise OS's

# Resources

---

## The Evolution of RISC technology at IBM:

- <https://acg.cis.upenn.edu/milom/cis501-Fall11/papers/cocke-RISC.pdf>

## ISA:

- <https://openpowerfoundation.org/specifications/isa>

## Microwatt core:

- <https://github.com/antonblanchard/microwatt>

## Other open cores:

- <https://github.com/openpower-cores>

# Acknowledgements

---

This presentation draws from prior presentations by

Brian Thompto

José Moreira