Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New f1600x4 #14

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

New f1600x4 #14

wants to merge 1 commit into from

Conversation

bwesterb
Copy link
Contributor

Instead of using intrinsics and full unrolling, this uses a
four-round unrolled version adapted from the one I wrote for
Cloudflare's CIRCL library:

github.com/cloudflare/circl/simd/keccakf1600

This is about 10-20% faster on my notebook. We'd better check whether that's also true for other systems as well before merging.

@thomwiggers
Copy link
Contributor

This also means it no longer works on Windows.

shake256-avx2/f1600x4.s Outdated Show resolved Hide resolved
#endif /* !__APPLE */
movq $6, %rax
17: # loop.begin:
vmovdqa 0(%rdi), %ymm8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't compile with MSVC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a windows machine available to me. What is the error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSVC has completely different syntax; there is not a chance to write assembly code that works for both Windows and gnu-style assemblers without going to a portable assembler like NASM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peachpy (which I used to generate the assembler, see f1600x4.py) does support nasm as output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you not generate microsoft assembly as .asm files? Then those could be used on Windows (with a nmake Makefile (see PQClean) or you could consider contributing an appropriate CMakeList.txt).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately peachpy doesn't support microsoft asm. It can directly generate the windows object file. It can also generate nasm. What would we prefer? @joostrijneveld

Instead of using intrinsics and full unrolling, this uses a
four-round unrolled version adapted from the one I wrote for
Cloudflare's CIRCL library:

    github.com/cloudflare/circl/simd/keccakf1600
keccak4x/KeccakP-1600-times4-SIMD256.o
*.s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the file having .S (capital letter) now this probably also needs to be updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wouldn't want to ignore the file itself, right? The .s is generated form the .S when compiling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah you're not running peachpy in Make of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Try to keep dependencies to compile to a minimum.

@bwesterb
Copy link
Contributor Author

bwesterb commented Aug 2, 2022

@thomwiggers Any suggestions how to advance this? (Given you are facing the same troubles with Kyber & Dilithium.)

@thomwiggers
Copy link
Contributor

The Kyber/Dilithium AVX2 implementations are simply marked as not supported on Windows. It would still be a shame to drop Windows support from this otherwise mostly (bar VLAs) portable plain-C implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants