[avx2] Optimize byte packing during decoding #2

Open
jethrogb opened this issue Apr 22, 2020 · 2 comments

Comments

@jethrogb
Member

During decoding, decode_avx2 returns a 32-byte value along with a 32-bit mask indicating which bytes are valid (i.e. not decoded from whitespace). Currently, these are packed using a simple loop over the bytes. There are likely more efficient ways to do this. (On AVX-512, you'd use the VPCOMPRESSB instruction, but that's not available here.)
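
For reference, the loop-based packing described above might look roughly like the following. This is only a sketch: the function name pack_valid_bytes and its signature are made up for illustration and are not taken from the crate.

```rust
/// Hypothetical baseline: copy every byte whose mask bit is set to the
/// front of `out`, preserving order, and return how many were copied.
fn pack_valid_bytes(bytes: &[u8; 32], mask: u32, out: &mut [u8; 32]) -> usize {
    let mut len = 0;
    for (i, &b) in bytes.iter().enumerate() {
        // Bit i of the mask says whether byte i is valid.
        if (mask >> i) & 1 == 1 {
            out[len] = b;
            len += 1;
        }
    }
    len
}
```

The data-dependent branch and the byte-at-a-time stores are presumably what a vectorized version would try to avoid.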

@TheIronBorn

You can emulate VPCOMPRESS with PSHUFB and a lookup table of shuffle control masks indexed by the bitmask.

Here's a library for that purpose: https://github.com/lemire/simdprune/. It includes methods for various memory budgets (indexing by a 16-bit mask means a 1 MiB table if unoptimized: 65,536 entries of 16 bytes each).
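
A minimal sketch of the lookup-table idea for a single 16-byte lane, assuming the unoptimized table indexed by the full 16-bit mask. All function names here are invented for illustration and this is not how simdprune itself is laid out. On x86_64 with SSSE3 it uses the real _mm_shuffle_epi8 intrinsic; elsewhere it falls back to a scalar model of PSHUFB.

```rust
/// Build the shuffle-control table: entry `m` lists the positions of the set
/// bits of `m` in ascending order, padded with 0x80 (PSHUFB writes zero for
/// any control byte whose high bit is set). 65,536 * 16 bytes = 1 MiB.
fn build_compress_table() -> Vec<[u8; 16]> {
    (0..=u16::MAX)
        .map(|mask| {
            let mut ctrl = [0x80u8; 16];
            let mut n = 0;
            for bit in 0..16u8 {
                if (mask >> bit) & 1 == 1 {
                    ctrl[n] = bit;
                    n += 1;
                }
            }
            ctrl
        })
        .collect()
}

/// Portable model of PSHUFB on one 16-byte lane.
fn pshufb_model(src: [u8; 16], ctrl: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        if ctrl[i] & 0x80 == 0 {
            out[i] = src[(ctrl[i] & 0x0f) as usize];
        }
    }
    out
}

/// Shuffle with the hardware instruction when available, else the model.
fn shuffle16(src: [u8; 16], ctrl: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            use std::arch::x86_64::*;
            // SAFETY: SSSE3 was detected at run time, and the unaligned
            // load/store intrinsics are used on 16-byte arrays.
            unsafe {
                let v = _mm_loadu_si128(src.as_ptr() as *const __m128i);
                let c = _mm_loadu_si128(ctrl.as_ptr() as *const __m128i);
                let r = _mm_shuffle_epi8(v, c);
                let mut out = [0u8; 16];
                _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
                return out;
            }
        }
    }
    pshufb_model(src, ctrl)
}

/// Move the bytes selected by `mask` to the front of the lane.
/// Returns the shuffled lane and the number of valid bytes in it.
fn compress16(table: &[[u8; 16]], src: [u8; 16], mask: u16) -> ([u8; 16], usize) {
    (shuffle16(src, table[mask as usize]), mask.count_ones() as usize)
}
```

The table would be built once and reused; smaller tables trade the single lookup for extra shuffles or arithmetic.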

@jethrogb
Member Author

jethrogb commented Oct 4, 2021

That looks like a good direction to explore. Note that PSHUFB works on 16 bytes, so you'd still need to re-pack the output of 2 PSHUFB calls.
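
One possible way to do that re-packing, sketched under assumptions: compress each 16-byte half separately, then store the high half's result at an offset equal to the number of valid bytes in the low half. The names compress_lane and pack32 are hypothetical, and compress_lane is a scalar stand-in for what one table-driven PSHUFB step would produce.

```rust
/// Hypothetical stand-in for one PSHUFB step on a 16-byte lane: moves the
/// bytes whose mask bit is set to the front and zero-fills the rest.
fn compress_lane(src: &[u8], mask: u16) -> ([u8; 16], usize) {
    let mut out = [0u8; 16];
    let mut n = 0;
    for i in 0..16 {
        if (mask >> i) & 1 == 1 {
            out[n] = src[i];
            n += 1;
        }
    }
    (out, n)
}

/// Pack 32 bytes by compressing each half and joining the two results.
fn pack32(bytes: &[u8; 32], mask: u32) -> ([u8; 32], usize) {
    let (lo, n_lo) = compress_lane(&bytes[..16], mask as u16);
    let (hi, n_hi) = compress_lane(&bytes[16..], (mask >> 16) as u16);

    let mut out = [0u8; 32];
    // First store: the low lane, including its zero padding.
    out[..16].copy_from_slice(&lo);
    // Second store: the full high lane, starting right after the valid low
    // bytes. It overwrites the low lane's padding, and since n_lo <= 16 the
    // 16-byte store always fits in the 32-byte buffer.
    out[n_lo..n_lo + 16].copy_from_slice(&hi);
    (out, n_lo + n_hi)
}
```

With real SIMD the two copies would be two unaligned 16-byte stores, so the join costs one popcount and no per-byte work.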
