Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roadmap for adding PowerPC support to Vale #454

Open
mamonet opened this issue Jun 15, 2021 · 16 comments
Open

Roadmap for adding PowerPC support to Vale #454

mamonet opened this issue Jun 15, 2021 · 16 comments

Comments

@mamonet
Copy link
Member

mamonet commented Jun 15, 2021

Adding support of PowerPC to Vale give the chance to optimize algorithms that can't be improved by vectorization and have hardware-acceleration support, it also gives the flexibility to implement those algorithms using state-of-art researches like implementing the upcoming white paper "Optimize AES-GCM for PowerPC architecture processors" in HACL* which offers 25 times better performance than lookup-table-based C implementation.
It also has the opportunity to get higher speed for the vectorized algorithms by having more control over the execution flow of the assembly implementation.
However, adding support of any architecture to Vale implies a considerable amount of work, there are a lot of Low* and verified assembly modules that have to be implemented before jumping into algorithm implementations.
I'll list an initial roadmap of adding PowerPC support to Vale by following the scheme used for x86_64 architecture, I'll update the roadmap parodically to reflect fresh info or progression.

Vale PowerPC support upstream https://github.com/mamonet/vale

(Implemented modules)

Module fstar/specs/hardware/PPC64LE.Machine_s

  • Define architectural components such as general and vector registers, conditional flags, and basic types and helper functions for type-checking.

Module fstar/specs/hardware/PPC64LE.Print_s

  • Printer module that support GCC syntax of assembly code

Module fstar/specs/hardware/PPC64LE.Semantics_s

  • Substantial functions for compile-time value-checking of the architectural elements to make sure we get the expected results at run-time

Module fstar/code/arch/ppc64le/PPC64LE.Vale.Decls

  • Interop and lemma functions used by the extracted vaf files to bind the verified assembly functions with the context of architectural element

Module fstar/code/arch/ppc64le/PPC64LE.Vale.State

  • Vale.Decls can't refer to to Semantics_s so this module define requisite functions by Vale.Decls

Module fstar/code/arch/ppc64le/PPC64LE.Vale.Lemmas

  • Lemma functions used by the interpreted vaf files to verify the conditional statements with context of conditional flags

Module fstar/code/arch/ppc64le/PPC64LE.Vale.StateLemmas

  • Evaluation lemmas

Module fstar/code/arch/ppc64le/PPC64LE.Vale.Regs

  • Register equality lemmas

Module fstar/code/arch/ppc64le/PPC64LE.Vale.Vecs

  • Vector register lemmas

Module fstar/code/arch/ppc64le/PPC64LE.Vale.InsMem.vaf

  • Defined the statements of memory operations that supported in verified assembly.

Module fstar/code/arch/ppc64le/PPC64LE.Vale.InsBasic.vaf

  • Basic instructions and wrappers operated on general-purpose registers:
    Move operand
    Conditional move operand
    Add two operands
    Add two loaded effective addresses
    Add two operands with carry
    Add two operands with overflow
    Subtract two operands
    Subtract two operands with borrow
    Multiply two unsigned operands
    Multiply two signed operands
    Exclusive-or (Xor) two operands
    And two operands
    Shift left operand
    Shift right operand

Module fstar/code/arch/ppc64le/PPC64LE.Vale.InsVec.vaf

  • Vector instructions operated on vector registers:
    Vector add two operands
    Vector exclusive-or (Xor) two operands
    Vector And two operands
    Vector shift left algebraic
    Vector shift right algebraic
    Vector shift right algebraic quadword
    Vector shift left double by octet immediate (Concatenate two vector registers)
    Vector permute
    Vector compare equal unsigned doubleword

Module fstar/code/test/PPC64LE.Test.Memcpy.vaf

  • Test function of memory access operations on POWER.

Module fstar/code/test/PPC64LE.Test.Basics.vaf

  • Test functions of specific basic features on POWER.

fstar/code/test/PowerTest.c and fstar/code/test/PowerTest.ml

  • Testing the generated assembly of testing modules.

(Upcoming update)

module Vale.PPC64LE.CPU_Features_s

  • Starting with PowerISA version 2.06, VSX support used for vector loading/storing operations of word data type.
    Accelerate Chacha20, Poly1305
  • PowerISA version 2.07, crypto support for AES cipher/inverse cipher, Polynomial multiplication, SHA-256 and SHA-512 sigma instructions.
    Accelerate AES, GHASH, SHA-256, and SHA-512.
  • PowerISA version 3.00, vector loading/storing operations of byte data type

module Vale.PPC64LE.CryptoInstructions_s

  • Wrapper for built-in crypto instructions.

module Vale.PPC64LE.InsAes

  • Vector AES cipher instructions:
    Vector AES Cipher
    Vector AES Cipher Last
    Vector AES Inverse Cipher
    Vector AES Inverse Cipher Last
    Vector AES SubBytes

module Vale.PPC64LE.InsGcm

  • Vector GHASH instruction:
    Vector Polynomial Multiply-Sum Doubleword

module Vale.PPC64LE.InsSha

  • Vector SHA-256 and SHA-512 Sigma instructions:
    Vector SHA-256 Sigma Word
    Vector SHA-512 Sigma Doubleword

Code.Crypto.Aes.PPC64LE

  • Implement AES-GCM functions:
  • AES key expansion (Key schedule)
  • ICB initialization
  • GHASH subkey initialization
  • Authenticated encrypt
  • Authenticated decrypt
  • Additional authenticated data
  • digest calculation

Code.Crypto.ECC.PPC64LE

  • Implement ECC authentication functions.

Code.Crypto.Sha.PPC64LE

  • Implement SHA-256 and SHA-512 functions:
  • Update state
  • Digest calculation

(Optional features)

Module Vale.PPC64LE.Instructions_s

  • Define requisite instructions of PowerISA and basic formatting syntaxes to be printed in assembly files for parsing

module Vale.PPC64LE.Leakage_s

  • Define functions that compare registers, flags, memory, and stack values to check if leakage has occurred.

module Vale.PPC64LE.Leakage_Ins

  • Check if certain instructions are leakage-free
  • Measure the time consumed by certain instructions and determine if the consumed time is fixed

module Vale.PPC64LE.Leakage

  • Check if a block of code is leakage-free
  • Measure the time consumed by a block of code and determine if the consumed time is fixed

module Vale.PPC64LE.Leakage_Helpers

  • Helper functions for Vale.PPC64LE.Leakage_Ins and Vale.PPC64LE.Leakage modules

module Vale.PPC64LE.QuickCodes

  • Quick code functions (Block, Condition, Loop, Assert..)

Code.Arch.PPC64LE.Interop modules

  • Define specific types of requisite operations in Low*
  • Define wrapping layer and convention functions for that operations

Developer Qualifications (B.E.E.):

  • General-purpose and low-level programming languages (Assembly, C/C++..)
  • F* and Low* programming language (I've gone through the official tutorials https://www.fstar-lang.org/tutorial/ https://fstarlang.github.io/lowstar/html/LowStar.html)
  • 64-Bit ELFv2 ABI Specification: Power Architecture
  • Power ISA (instruction set architecture) Versions: 2.07, 3.0, 3.1
  • Specifications of cryptography algorithms (AES, GCM, Chacha20, Poly1305..)
  • Writing state-of-art researches to optimize AES-GCM on PowerPC architecture.

As usual adding support of PowerPC architecture to Vale will be sponsored by IBM and every update pack will be reviewed by IBM crypto team.

@Mr-Draco
Copy link

@mamonet, I have also been researching and developing in this area of the repository. The goal is to complete the following bounty: https://www.bountysource.com/issues/88058247-freebl-power-poly1305-mac-vector-acceleration. I was hoping to bring more clarity to the repo and think it might be beneficial to collaborate. Would you be interested in working together on the project?

@mamonet
Copy link
Member Author

mamonet commented Jul 30, 2021

@MrDracoG, this issue is more specialized than the issue you have mentioned, working on this issue requires specific qualifications and experience. Let me explain what kind of work is required to accomplish this issue.
Vale takes its name from verified assembly language for Everest, as the name reveals the language uses in its core the regular assembly language running inside a full-fledged architecture sandbox to verify error-prone procedures. In order to build that sandbox we need to build an interop system to process the implemented cryptographic algorithms and monitor multiple underlying components like stack, memory, registers.. to check for undesired cases like overflow or memory corruption.
Those functions have to follow the framework used in Everest respecting writing proper proofs to make sure everything run safely so we need to use Low* language in particular. Low* is subset of F* programming language which is heavily inherited from OCaml language.
With that said, and in order to corporate in working on this issue you need to a background or familiarity with:

  • OCaml-like syntax
  • F* and low* programing languages, I read every page of the official tutorial even so I need to read more documentations and codes to grasp the whole concepts of this interesting language
  • General-purpose and low-level languages (C/C++, Assembly..)
  • PPC64LE ABI Specification and Power ISA
  • Cryptography algorithm specifications

It's very difficult to get to the point where you can participate in this support and it's not at all like working on the issue you mentioned. I remember when I optimized AES-GCM algorithm on PowerPC architecture for freebl, it wasn't an easy task, I implemented a research that we wrote based on a novel algorithm using Assembly language to maximize the performance on that platform even though this can't be compared to this work as more specific techniques and approaches are involved here.

@msprotz
Copy link
Contributor

msprotz commented Aug 2, 2021

Coming back from vacations... and just seeing this new roadmap. Thanks @mamonet. This seems overall reasonable but maybe @Chris-Hawblitzel has some thoughts on it.

@mamonet
Copy link
Member Author

mamonet commented Aug 16, 2021

@msprotz @Chris-Hawblitzel To organize things more, I suggest creating a new branch in this repository to receive the pull requests once verified by IBM crypto team to submit them to HACL* team. Once the complete PowerPC support is in place and has the full functionalities, HACL* team can merge that particular branch to the upstream afterwards.

@msprotz
Copy link
Contributor

msprotz commented Aug 16, 2021

This seems good to me -- a branch seems like a good way to go. I'd still like to hear whether @Chris-Hawblitzel will have enough time to follow this work closely (or maybe @parno ?) to make sure you don't get stuck needlessly.

@Chris-Hawblitzel
Copy link
Contributor

My advice is to start as simple as possible. I just added a directory https://github.com/project-everest/vale/tree/master/fstar/code/arch/x64simple to show simple definitions of x64 code. It excludes the following features that make the HACL* x64 definitions so complicated:

  • high-level generic instruction definitions ( https://github.com/project-everest/hacl-star/blob/master/vale/specs/hardware/Vale.X64.Instructions_s.fsti ) -- these are elegant and concise, but are not necessary to get started
  • quickCode -- this can help speed up verification by exploiting F* features to generate more efficient SMT queries, but it's also kind of bleeding-edge
  • leakage analysis (taint tracking) -- this is optional, and can be added at a later stage
  • formally verified C/assembly interop -- this is optional, and is a big mountain to climb

You can also look at the ARM definitions in https://github.com/project-everest/vale/tree/master/dafny/code/arch/arm -- these are for Dafny, but an F* version wouldn't be that different.

@mamonet
Copy link
Member Author

mamonet commented Aug 18, 2021

Thanks, that makes adding support of another architecture much simpler for starters. I noticed that x64simple's modules utilize the the modules in directory https://github.com/project-everest/vale/tree/master/fstar/specs/hardware so I think I need to start with both directories to get the basic functionality of architecture support!

@Chris-Hawblitzel
Copy link
Contributor

Yes, that's right. Both directories are necessary.

@Mr-Draco
Copy link

  • @mamonet, How do you plan to allow Vale developers to specify suffixes in instructions?
    • By example, a multiply low double word instruction can be written in assembly as mulld, mulld., mulldo, mulldo. to reflect operations that may or may not result in bit changes in the condition register (CR) and/or exception register (XER).
    • Using your definition of MulHigh32 (ref0) as a basis, would it be something like MulLow64_s which takes in register operands and boolean operands, so that a developer could specify an instruction in Vale like MulLow64_s(r0,r1,r2,oe0,rc1)?

ref0: mamonet@ced6472#diff-c94ea0a116213844e7b57cf040c97564ae2d19f7f07e8e5cb3daf2fe7ac2ca82R31

@mamonet
Copy link
Member Author

mamonet commented Oct 17, 2021

My advice is to start as simple as possible. I just added a directory https://github.com/project-everest/vale/tree/master/fstar/code/arch/x64simple to show simple definitions of x64 code.

I've added the basic PowerPC support to Vale by following the definitions in x64simple implementation mamonet/vale@66c6407
The commit description summarizes the features and supported functions of this patch.
All modules are verified successfully.

I also added this commit in a different branch to verify the new testing modules of PowerPC arch mamonet/vale@962f11b
The output of running the testing modules are logged and set in the commit description.

It excludes the following features that make the HACL* x64 definitions so complicated:

  • high-level generic instruction definitions ( https://github.com/project-everest/hacl-star/blob/master/vale/specs/hardware/Vale.X64.Instructions_s.fsti ) -- these are elegant and concise, but are not necessary to get started
  • quickCode -- this can help speed up verification by exploiting F* features to generate more efficient SMT queries, but it's also kind of bleeding-edge
  • leakage analysis (taint tracking) -- this is optional, and can be added at a later stage
  • formally verified C/assembly interop -- this is optional, and is a big mountain to climb

Do you recommend adding those optional features to PowerPC support or we can jump right away and implement cryptographic algorithms using that basic support?

@Chris-Hawblitzel
Copy link
Contributor

I would recommend starting on the cryptographic algorithms using the basic support.

@mamonet
Copy link
Member Author

mamonet commented May 25, 2022

In order to proceed with the roadmap, I've pushed a PR project-everest/vale#51 into Vale repository that adds basic PowerPC support. The patch covers more testing cases than the initial one that integrated now to SCons.
I apologize for the delay of proposing the patch for merging but things took some time by IBM cryptography team of Power to ensure the added modules meet the requirements of Vale repository.

@mamonet
Copy link
Member Author

mamonet commented Aug 4, 2022

I've added SHA implementation with HACL integration of vale support into my fork of repository mamonet@46e7e73

Hacl integration includes the following features.

  • Basic vale support of PowerPC architecture.
  • Memory layout and heapleat support.
  • Stack functionalities (alloc, dealloc, store, load..)
  • QuickCode support.
  • Add more basic and vector instructions that utilized in SHA implementation.
  • Expand lemmas of which have use cases.

SHA implementation in module Vale.SHA.PPC64LE.vaf follows the scheme of x64 SHA module by projecting OpenSSL implementation in high-assurance manner and utilizing common proofs where possible. The module still lacks the high-level proof of processing multiple blocks that supposed to be compatible with update_multi_quads which is WIP but everything else is well established and entirely proved. In the meantime, I would like to get feedback from maintainers about the current status of this implementation.
@Chris-Hawblitzel the verification process has a stable outcome of successful validation but it takes relatively long time, is there any performance monitoring tools available and what's your tips to have a fast to verify piece of module?

My practical observation is that using variable vector operands for example in Loop_rounds_0_63_body and calling it multiple times would slow down that process but since the operands have to be shifted each round I can't think of any alternative that eliminate the cpu hit.

Performance test
The patch has source file TestShaPPC64LE.cpp to test the performance speed. I've benchmarked the output of implemented module on POWER9 processor, for 64 blocks it took 91 μs (12.2 cycles/byte) while I can get 86 μs sec (11.5 cycles/byte) for OpenSSL hardware-accelerated implementation. I can refer the performance gap to the difference of constant table layout of which OpenSSL repeats each word four times consecutively which is incompatible with table specification in Hacl.

@msprotz
Copy link
Contributor

msprotz commented Dec 15, 2022

@edelsohn, @mamonet is this something that you consider complete following PR#609?

@mamonet
Copy link
Member Author

mamonet commented Dec 15, 2022

There's still patches to follow to complete the roadmap. AES-GCM implementation is set to be released soon, there's a delay due to the GCM algorithm change, intel white paper is being implemented now.

@mamonet
Copy link
Member Author

mamonet commented Feb 16, 2023

AES-GCM implementation has been added here #777 I aplologize for the dealys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants