Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure XXH_INLINE_ALL fully inlines all XXH32_ and XXH64_ symbols #903

Draft
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

Cyan4973
Copy link
Owner

@Cyan4973 Cyan4973 commented Dec 8, 2023

fix #902 with XXH64_finalize(), detected by @simonhf

adds a corresponding test

even though `XXH_INLINE_ALL` is set
on `gcc-11`.
@Cyan4973 Cyan4973 self-assigned this Dec 8, 2023
@simonhf
Copy link

simonhf commented Dec 9, 2023

@Cyan4973, some comments and suggestions:

  1. There seem to be other functions which are not necessarily inlined too, e.g. XXH64_avalanche()?
  2. Maybe it would be worthwhile having an extra "opposite" nm test where non-inlining is expected for the test to pass?
  3. It might be worthwhile drawing attention to this issue e.g. on the main README, since in theory the tests are still not guaranteed to catch the issue. Unless we can come up with a guaranteed way to catch this, users of xxHash should ideally always code their own inline tests into their own projects?
  4. I have noticed big differences between gcc and clang regarding inlining. gcc will only do so many inlines before switching off inline, e.g. in tests maybe up to 40 inlined hash functions in a function. Whereas, clang seems to to any number without complaining or not inlining. gcc does have a bunch of parameters to control this behaviour but in reality I couldn't get them to help.
  5. Also, it might be worth doing some extra research into workarounds, i.e. there could be non-intuitive sneaky ways to detect non-inlining, e.g. [1]? Note: I haven't tried this out!

[1] https://stackoverflow.com/questions/24589605/gcc-not-inlined-warning

@simonhf
Copy link

simonhf commented Dec 9, 2023

@Cyan4973 I tried a lot of Googling but could not find a good solution via gcc attributes, or pragma, or other non-intuitive / sneaky ways :-(

I also played around a lot with -Winline which in theory does nearly what we want. But in reality I couldn't get it to work. And it also suffers from the issues that if the developer forgets to ask for any inlining, then -Winline would not help anyway.

So I came up with this Perl one-liner which collapses each function definition onto a single line which is easier for grepping:

$ perl -e '$_ = `cat xxHash/xxhash.h`; s~\\\s*[\r\n]\s*~~gs; s~//.*~~gm; s~/\*.*?\*/~~gs; s~,\s*[\r\n]\s*~, ~gs; s~^#.*$~~gm; s~([0-9a-z])\s*[\r\n]\s*~$1 ~gis; print;' | egrep -v "#\s*define" | egrep "XXH(3|32|64)_[^\s]+\s*\(.*\)\s*$"
static xxh_u32 XXH32_round(xxh_u32 acc, xxh_u32 input)
static xxh_u32 XXH32_avalanche(xxh_u32 hash)
static XXH_PUREF xxh_u32 XXH32_finalize(xxh_u32 hash, const xxh_u8* ptr, size_t len, XXH_alignment align)
XXH_FORCE_INLINE XXH_PUREF xxh_u32 XXH32_endian_align(const xxh_u8* input, size_t len, xxh_u32 seed, XXH_alignment align)
XXH_PUBLIC_API XXH32_state_t* XXH32_createState(void)
XXH_PUBLIC_API XXH_errorcode XXH32_freeState(XXH32_state_t* statePtr)
XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dstState, const XXH32_state_t* srcState)
XXH_PUBLIC_API XXH_errorcode XXH32_update(XXH32_state_t* state, const void* input, size_t len)
static xxh_u64 XXH64_round(xxh_u64 acc, xxh_u64 input)
static xxh_u64 XXH64_mergeRound(xxh_u64 acc, xxh_u64 val)
static xxh_u64 XXH64_avalanche(xxh_u64 hash)
static XXH_PUREF xxh_u64 XXH64_finalize(xxh_u64 hash, const xxh_u8* ptr, size_t len, XXH_alignment align)
XXH_FORCE_INLINE XXH_PUREF xxh_u64 XXH64_endian_align(const xxh_u8* input, size_t len, xxh_u64 seed, XXH_alignment align)
XXH_PUBLIC_API XXH64_state_t* XXH64_createState(void)
XXH_PUBLIC_API XXH_errorcode XXH64_freeState(XXH64_state_t* statePtr)
XXH_PUBLIC_API void XXH64_copyState(XXH_NOESCAPE XXH64_state_t* dstState, const XXH64_state_t* srcState)
XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH_NOESCAPE XXH64_state_t* state, XXH_NOESCAPE const void* input, size_t len)
static xxh_u64 XXH3_mul128_fold64(xxh_u64 lhs, xxh_u64 rhs)
static XXH64_hash_t XXH3_avalanche(xxh_u64 h64)
static XXH64_hash_t XXH3_rrmxmx(xxh_u64 h64, xxh_u64 len)
XXH_FORCE_INLINE XXH_PUREF XXH64_hash_t XXH3_len_1to3_64b(const xxh_u8* input, size_t len, const xxh_u8* secret, XXH64_hash_t seed)
XXH_FORCE_INLINE XXH_PUREF XXH64_hash_t XXH3_len_4to8_64b(const xxh_u8* input, size_t len, const xxh_u8* secret, XXH64_hash_t seed)
XXH_FORCE_INLINE XXH_PUREF XXH64_hash_t XXH3_len_9to16_64b(const xxh_u8* input, size_t len, const xxh_u8* secret, XXH64_hash_t seed)
XXH_FORCE_INLINE XXH_PUREF XXH64_hash_t XXH3_len_0to16_64b(const xxh_u8* input, size_t len, const xxh_u8* secret, XXH64_hash_t seed)
XXH_FORCE_INLINE xxh_u64 XXH3_mix16B(const xxh_u8* XXH_RESTRICT input, const xxh_u8* XXH_RESTRICT secret, xxh_u64 seed64)
XXH_FORCE_INLINE XXH_PUREF XXH64_hash_t XXH3_len_17to128_64b(const xxh_u8* XXH_RESTRICT input, size_t len, const xxh_u8* XXH_RESTRICT secret, size_t secretSize, XXH64_hash_t seed)
XXH_NO_INLINE XXH_PUREF XXH64_hash_t XXH3_len_129to240_64b(const xxh_u8* XXH_RESTRICT input, size_t len, const xxh_u8* XXH_RESTRICT secret, size_t secretSize, XXH64_hash_t seed)
XXH_FORCE_INLINE XXH_TARGET_AVX512 void XXH3_accumulate_512_avx512(void* XXH_RESTRICT acc, const void* XXH_RESTRICT input, const void* XXH_RESTRICT secret)
XXH_FORCE_INLINE XXH_TARGET_AVX512 XXH3_ACCUMULATE_TEMPLATE(avx512)
XXH_FORCE_INLINE XXH_TARGET_AVX2 void XXH3_accumulate_512_avx2( void* XXH_RESTRICT acc, const void* XXH_RESTRICT input, const void* XXH_RESTRICT secret)
XXH_FORCE_INLINE XXH_TARGET_AVX2 XXH3_ACCUMULATE_TEMPLATE(avx2)
XXH_FORCE_INLINE XXH_TARGET_SSE2 XXH3_ACCUMULATE_TEMPLATE(sse2)
XXH_FORCE_INLINE void XXH3_accumulate_512_neon( void* XXH_RESTRICT acc, const void* XXH_RESTRICT input, const void* XXH_RESTRICT secret)
XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(neon)
XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(vsx)
XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(scalar)
XXH_PUBLIC_API XXH3_state_t* XXH3_createState(void)
XXH_PUBLIC_API XXH_errorcode XXH3_freeState(XXH3_state_t* statePtr)
XXH_PUBLIC_API void XXH3_copyState(XXH_NOESCAPE XXH3_state_t* dst_state, XXH_NOESCAPE const XXH3_state_t* src_state)
XXH_FORCE_INLINE XXH_errorcode XXH3_update(XXH3_state_t* XXH_RESTRICT const state, const xxh_u8* XXH_RESTRICT input, size_t len, XXH3_f_accumulate f_acc, XXH3_f_scrambleAcc f_scramble)
XXH_FORCE_INLINE XXH_PUREF XXH128_hash_t XXH3_len_1to3_128b(const xxh_u8* input, size_t len, const xxh_u8* secret, XXH64_hash_t seed)
XXH_FORCE_INLINE XXH_PUREF XXH128_hash_t XXH3_len_4to8_128b(const xxh_u8* input, size_t len, const xxh_u8* secret, XXH64_hash_t seed)
XXH_FORCE_INLINE XXH_PUREF XXH128_hash_t XXH3_len_9to16_128b(const xxh_u8* input, size_t len, const xxh_u8* secret, XXH64_hash_t seed)
XXH_FORCE_INLINE XXH_PUREF XXH128_hash_t XXH3_len_0to16_128b(const xxh_u8* input, size_t len, const xxh_u8* secret, XXH64_hash_t seed)
XXH_FORCE_INLINE XXH_PUREF XXH128_hash_t XXH3_len_17to128_128b(const xxh_u8* XXH_RESTRICT input, size_t len, const xxh_u8* XXH_RESTRICT secret, size_t secretSize, XXH64_hash_t seed)
XXH_NO_INLINE XXH_PUREF XXH128_hash_t XXH3_len_129to240_128b(const xxh_u8* XXH_RESTRICT input, size_t len, const xxh_u8* XXH_RESTRICT secret, size_t secretSize, XXH64_hash_t seed)
XXH_FORCE_INLINE void XXH3_combine16(void* dst, XXH128_hash_t h128)
XXH_PUBLIC_API XXH_errorcode XXH3_generateSecret(XXH_NOESCAPE void* secretBuffer, size_t secretSize, XXH_NOESCAPE const void* customSeed, size_t customSeedSize)
XXH_PUBLIC_API void XXH3_generateSecret_fromSeed(XXH_NOESCAPE void* secretBuffer, XXH64_hash_t seed)

We could consider enforcing via tests that all functions MUST have either XXH_FORCE_INLINE or XXH_NO_INLINE in their definition? And then the following line can find any functions which do not have one or the other, e.g.:

$ perl -e '$_ = `cat xxHash/xxhash.h`; s~\\\s*[\r\n]\s*~~gs; s~//.*~~gm; s~/\*.*?\*/~~gs; s~,\s*[\r\n]\s*~, ~gs; s~^#.*$~~gm; s~([0-9a-z])\s*[\r\n]\s*~$1 ~gis; print;' | egrep -v "#\s*define" | egrep "XXH(3|32|64)_[^\s]+\s*\(.*\)\s*$" | egrep -v "XXH_FORCE_INLINE|XXH_NO_INLINE"
static xxh_u32 XXH32_round(xxh_u32 acc, xxh_u32 input)
static xxh_u32 XXH32_avalanche(xxh_u32 hash)
static XXH_PUREF xxh_u32 XXH32_finalize(xxh_u32 hash, const xxh_u8* ptr, size_t len, XXH_alignment align)
XXH_PUBLIC_API XXH32_state_t* XXH32_createState(void)
XXH_PUBLIC_API XXH_errorcode XXH32_freeState(XXH32_state_t* statePtr)
XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dstState, const XXH32_state_t* srcState)
XXH_PUBLIC_API XXH_errorcode XXH32_update(XXH32_state_t* state, const void* input, size_t len)
static xxh_u64 XXH64_round(xxh_u64 acc, xxh_u64 input)
static xxh_u64 XXH64_mergeRound(xxh_u64 acc, xxh_u64 val)
static xxh_u64 XXH64_avalanche(xxh_u64 hash)
static XXH_PUREF xxh_u64 XXH64_finalize(xxh_u64 hash, const xxh_u8* ptr, size_t len, XXH_alignment align)
XXH_PUBLIC_API XXH64_state_t* XXH64_createState(void)
XXH_PUBLIC_API XXH_errorcode XXH64_freeState(XXH64_state_t* statePtr)
XXH_PUBLIC_API void XXH64_copyState(XXH_NOESCAPE XXH64_state_t* dstState, const XXH64_state_t* srcState)
XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH_NOESCAPE XXH64_state_t* state, XXH_NOESCAPE const void* input, size_t len)
static xxh_u64 XXH3_mul128_fold64(xxh_u64 lhs, xxh_u64 rhs)
static XXH64_hash_t XXH3_avalanche(xxh_u64 h64)
static XXH64_hash_t XXH3_rrmxmx(xxh_u64 h64, xxh_u64 len)
XXH_PUBLIC_API XXH3_state_t* XXH3_createState(void)
XXH_PUBLIC_API XXH_errorcode XXH3_freeState(XXH3_state_t* statePtr)
XXH_PUBLIC_API void XXH3_copyState(XXH_NOESCAPE XXH3_state_t* dst_state, XXH_NOESCAPE const XXH3_state_t* src_state)
XXH_PUBLIC_API XXH_errorcode XXH3_generateSecret(XXH_NOESCAPE void* secretBuffer, size_t secretSize, XXH_NOESCAPE const void* customSeed, size_t customSeedSize)
XXH_PUBLIC_API void XXH3_generateSecret_fromSeed(XXH_NOESCAPE void* secretBuffer, XXH64_hash_t seed)

This technique finds both of the functions mentioned so far (please see below), plus a lot more (please see above):

$ perl -e '$_ = `cat xxHash/xxhash.h`; s~\\\s*[\r\n]\s*~~gs; s~//.*~~gm; s~/\*.*?\*/~~gs; s~,\s*[\r\n]\s*~, ~gs; s~^#.*$~~gm; s~([0-9a-z])\s*[\r\n]\s*~$1 ~gis; print;' | egrep -v "#\s*define" | egrep "XXH(3|32|64)_[^\s]+\s*\(.*\)\s*$" | egrep -v "XXH_FORCE_INLINE|XXH_NO_INLINE" | egrep "(_finalize|_avalanche)"
static xxh_u32 XXH32_avalanche(xxh_u32 hash)
static XXH_PUREF xxh_u32 XXH32_finalize(xxh_u32 hash, const xxh_u8* ptr, size_t len, XXH_alignment align)
static xxh_u64 XXH64_avalanche(xxh_u64 hash)
static XXH_PUREF xxh_u64 XXH64_finalize(xxh_u64 hash, const xxh_u8* ptr, size_t len, XXH_alignment align)
static XXH64_hash_t XXH3_avalanche(xxh_u64 h64)

Disclaimer: Yes, I know it's polarizing Perl, but the good news is that Perl is installed by default pretty much everywhere :-)

@Cyan4973
Copy link
Owner Author

Cyan4973 commented Dec 13, 2023

There seem to be other functions which are not necessarily inlined too, e.g. XXH64_avalanche()?

I don't see any XXH64_avalanche() symbol left after this PR.
Specifically, I don't see any symbol prefixed XXH32_ or XXH64_ left.

It might be worthwhile drawing attention to this issue e.g. on the main README, since in theory the tests are still not guaranteed to catch the issue. Unless we can come up with a guaranteed way to catch this, users of xxHash should ideally always code their own inline tests into their own projects?

The new test is run as part of make test, which is run in CI on a wide number of combinations of compilers and targets.
Specifically, all gcc versions from 7 to current are tested, as well as all clang versions from 6 to current.
If they all pass the test, it's pretty good hint that inlining is working correctly.

Granted, some specific compilers or platforms may not work as well, but some compilers may also not support force_inline to begin with. We are in the domain of compiler extensions, so there is no pretense that it will always work. We do the best we can, and if that includes all known versions of gcc and clang, this is a good starting point.

We could consider enforcing via tests that all functions MUST have either XXH_FORCE_INLINE or XXH_NO_INLINE in their definition?

Now that you mention, I don't think the current fix is correct.
What we want is perfect inlining when XXH_INLINE_ALL is set, but not necessarily always,
but the newly set XXH_FORCE_INLINE will essentially enforce inlining in all circumstances.

I don't remember well why static was preferred, but it's likely because inlining tends to increase binary size. As a side effect, it also tends to increase the space taken in the instruction cache, which can be bad for performance. Meaning, inlining is not a sure shot when it comes to performance, and that's why it can be best to leave this decision, which can vary depending on target, to the appreciation of the compiler.
Rather, what we want is a static that becomes XXH_FORCE_INLINE when XXH_INLINE_ALL is set.

We may be lacking this level of refinement in the source code.
So that would be something to add,
and that's actually a more involved change.

@Cyan4973 Cyan4973 marked this pull request as draft December 13, 2023 04:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: XXH_INLINE_ALL does not always force inline? How to force inline?
2 participants