salsa20: performance optimizations (e.g. SIMD) #50

Open
4 of 7 tasks
tarcieri opened this issue Aug 19, 2019 · 1 comment
tarcieri commented Aug 19, 2019

There are two big optimizations we could do on both the chacha20 and salsa20 crates.

Avoid recomputing initial state

EDIT: both crates now have a new method to compute the initial state, and separate apply_keystream / generate methods to compute a block

  • chacha20 crate
  • salsa20 crate

RFC 8439 Section 3 describes caching the initial block state once computed as a performance optimization:

   Each block of ChaCha20 involves 16 move operations and one increment
   operation for loading the state, 80 each of XOR, addition and roll
   operations for the rounds, 16 more add operations and 16 XOR
   operations for protecting the plaintext.  Section 2.3 describes the
   ChaCha block function as "adding the original input words".  This
   implies that before starting the rounds on the ChaCha state, we copy
   it aside, only to add it in later.  This is correct, but we can save
   a few operations if we instead copy the state and do the work on the
   copy.  This way, for the next block you don't need to recreate the
   state, but only to increment the block counter.  This saves
   approximately 5.5% of the cycles.
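
As a rough illustration of the RFC's advice (not the code in either crate), the sketch below builds the ChaCha20 initial state once, runs the rounds on a copy, and only bumps the block counter between blocks. The `CachedState` type and its `new` / `next_block` methods are hypothetical names chosen for this example; the 32-bit counter and 96-bit nonce layout is assumed from RFC 8439.

```rust
/// "expand 32-byte k" as four little-endian words.
const CONSTANTS: [u32; 4] = [0x6170_7865, 0x3320_646e, 0x7962_2d32, 0x6b20_6574];

/// Scalar ChaCha quarter round on four words of the state.
fn quarter_round(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(16);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(12);
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(8);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(7);
}

/// Hypothetical holder for the initial state, computed once per key/nonce.
pub struct CachedState {
    state: [u32; 16],
}

impl CachedState {
    /// Load constants, key, counter and nonce into the state a single time.
    pub fn new(key: &[u8; 32], nonce: &[u8; 12], counter: u32) -> Self {
        let word = |c: &[u8]| u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        let mut state = [0u32; 16];
        state[..4].copy_from_slice(&CONSTANTS);
        for (i, chunk) in key.chunks_exact(4).enumerate() {
            state[4 + i] = word(chunk);
        }
        state[12] = counter;
        for (i, chunk) in nonce.chunks_exact(4).enumerate() {
            state[13 + i] = word(chunk);
        }
        Self { state }
    }

    /// Produce one 64-byte keystream block, then advance the counter in place.
    pub fn next_block(&mut self) -> [u8; 64] {
        // Work on a copy so the cached state stays available for the final add.
        let mut work = self.state;
        for _ in 0..10 {
            // Column rounds.
            quarter_round(&mut work, 0, 4, 8, 12);
            quarter_round(&mut work, 1, 5, 9, 13);
            quarter_round(&mut work, 2, 6, 10, 14);
            quarter_round(&mut work, 3, 7, 11, 15);
            // Diagonal rounds.
            quarter_round(&mut work, 0, 5, 10, 15);
            quarter_round(&mut work, 1, 6, 11, 12);
            quarter_round(&mut work, 2, 7, 8, 13);
            quarter_round(&mut work, 3, 4, 9, 14);
        }
        let mut out = [0u8; 64];
        for i in 0..16 {
            let w = work[i].wrapping_add(self.state[i]);
            out[4 * i..4 * i + 4].copy_from_slice(&w.to_le_bytes());
        }
        // Only the block counter changes between blocks; nothing is rebuilt.
        self.state[12] = self.state[12].wrapping_add(1);
        out
    }
}
```

Calling `next_block` repeatedly yields consecutive keystream blocks without reloading the key or nonce. A real implementation would also need to handle counter overflow and XOR the keystream into the data.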

SIMD support

Both ChaCha20 and Salsa20 are amenable to SIMD optimizations. We should add SIMD optimizations on x86/x86_64 at the very least.
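
To show why the round function maps well onto SIMD, here is a portable sketch (no intrinsics, and not taken from either crate) that treats the ChaCha state as four rows of four lanes. One "row-wise" quarter round then processes all four columns at once, and the diagonal round is obtained by rotating lanes within rows. The names `Row`, `quarter_round_rows`, `rotate_lanes` and `double_round_rows` are made up for this example; on x86 each `Row` would presumably live in a 128-bit register.

```rust
use std::array::from_fn;

/// One row of the 4x4 state; stands in for a 128-bit SIMD register.
type Row = [u32; 4];

/// Lane-wise wrapping addition.
fn add(a: Row, b: Row) -> Row {
    from_fn(|i| a[i].wrapping_add(b[i]))
}

/// Lane-wise XOR followed by a left rotation of each lane by `n` bits.
fn xor_rotl(a: Row, b: Row, n: u32) -> Row {
    from_fn(|i| (a[i] ^ b[i]).rotate_left(n))
}

/// Rotate the lanes of a row left by `n` positions (a shuffle in SIMD terms).
fn rotate_lanes(r: Row, n: usize) -> Row {
    from_fn(|i| r[(i + n) % 4])
}

/// Four quarter rounds at once: lane i of each row forms one quarter round.
fn quarter_round_rows(rows: &mut [Row; 4]) {
    let [a, b, c, d] = rows;
    *a = add(*a, *b);
    *d = xor_rotl(*d, *a, 16);
    *c = add(*c, *d);
    *b = xor_rotl(*b, *c, 12);
    *a = add(*a, *b);
    *d = xor_rotl(*d, *a, 8);
    *c = add(*c, *d);
    *b = xor_rotl(*b, *c, 7);
}

/// Column round, then diagonal round via lane rotation, then undo the rotation.
pub fn double_round_rows(rows: &mut [Row; 4]) {
    quarter_round_rows(rows);
    rows[1] = rotate_lanes(rows[1], 1);
    rows[2] = rotate_lanes(rows[2], 2);
    rows[3] = rotate_lanes(rows[3], 3);
    quarter_round_rows(rows);
    rows[1] = rotate_lanes(rows[1], 3);
    rows[2] = rotate_lanes(rows[2], 2);
    rows[3] = rotate_lanes(rows[3], 1);
}

/// Scalar reference quarter round, used only to cross-check the row version.
fn quarter_round(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(16);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(12);
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(8);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(7);
}

/// Scalar reference double round over a flat 16-word state.
pub fn double_round_scalar(s: &mut [u32; 16]) {
    quarter_round(s, 0, 4, 8, 12);
    quarter_round(s, 1, 5, 9, 13);
    quarter_round(s, 2, 6, 10, 14);
    quarter_round(s, 3, 7, 11, 15);
    quarter_round(s, 0, 5, 10, 15);
    quarter_round(s, 1, 6, 11, 12);
    quarter_round(s, 2, 7, 8, 13);
    quarter_round(s, 3, 4, 9, 14);
}
```

The row layout is the design choice that matters here: every operation in `quarter_round_rows` is lane-wise, so a compiler or hand-written intrinsics can replace each helper with a single vector instruction. Salsa20 uses a different word arrangement, so it would need its own shuffling scheme.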

x86/x86_64

Other CPU architectures

  • ARM?

@tarcieri changed the title from "chacha20/salsa20: performance optimizations (e.g. SIMD)" to "salsa20: performance optimizations (e.g. SIMD)" on Jan 17, 2020

@tarcieri
Member Author

Changed topic to salsa20 as chacha20 is now optimized on x86.

chacha20 could still use e.g. NEON acceleration.
