Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requesting clarification regarding the use of __GNUC__ in xxhash code #709

Open
rohanaggarwal7997 opened this issue May 6, 2022 · 5 comments
Labels

Comments

@rohanaggarwal7997
Copy link

rohanaggarwal7997 commented May 6, 2022

I had a small question regarding the xxhash code. There is usage of __GNUC__ in some places in the code(in footnote) to get the performance of the compiled code by GCC similar to clang . My question is that even ICC defines __GNUC__. Was the code benchmarked for the performance considerations with ICC ? or should we be using an extra !defined(__INTEL_COMPILER) .

I am talking about the following two comments in your code

XXH_FORCE_INLINE xxh_u64 XXH3_mix16B(const xxh_u8* XXH_RESTRICT input,
                                     const xxh_u8* XXH_RESTRICT secret, xxh_u64 seed64)
{
#if defined(__GNUC__) && !defined(__clang__) /* GCC, not Clang */ \
  && defined(__i386__) && defined(__SSE2__)  /* x86 + SSE2 */ \
  && !defined(XXH_ENABLE_AUTOVECTORIZE)      /* Define to disable like XXH32 hack */
    /*
     * UGLY HACK:
     * GCC for x86 tends to autovectorize the 128-bit multiply, resulting in
     * slower code.
     *
     * By forcing seed64 into a register, we disrupt the cost model and
     * cause it to scalarize. See `XXH32_round()`
     *
     * FIXME: Clang's output is still _much_ faster -- On an AMD Ryzen 3600,
     * XXH3_64bits @ len=240 runs at 4.6 GB/s with Clang 9, but 3.3 GB/s on
     * GCC 9.2, despite both emitting scalar code.
     *
     * GCC generates much better scalar code than Clang for the rest of XXH3,
     * which is why finding a more optimal codepath is an interest.
     */
    XXH_COMPILER_GUARD(seed64);
/*
 * UGLY HACK:
 * GCC usually generates the best code with -O3 for xxHash.
 *
 * However, when targeting AVX2, it is overzealous in its unrolling resulting
 * in code roughly 3/4 the speed of Clang.
 *
 * There are other issues, such as GCC splitting _mm256_loadu_si256 into
 * _mm_loadu_si128 + _mm256_inserti128_si256. This is an optimization which
 * only applies to Sandy and Ivy Bridge... which don't even support AVX2.
 *
 * That is why when compiling the AVX2 version, it is recommended to use either
 *   -O2 -mavx2 -march=haswell
 * or
 *   -O2 -mavx2 -mno-avx256-split-unaligned-load
 * for decent performance, or to use Clang instead.
 *
 * Fortunately, we can control the first one with a pragma that forces GCC into
 * -O2, but the other one we can't control without "failed to inline always
 * inline function due to target mismatch" warnings.
 */
@Cyan4973
Copy link
Owner

Cyan4973 commented May 6, 2022

I would be surprised if this code performance hack was checked on ICC,
though with godbolt, that might actually be possible.
cc @easyaspi314 , the author of this code.

@easyaspi314
Copy link
Contributor

Yeah those should be for GCC only.

It is really annoying that in order to check for actual GCC you have to check for the imposters.

How about just adding something like this

/* Attempts to detect *actual* GCC. */
#define XXH_IS_GCC (defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER))
/* Detects GCC and Clang (in clang-cl mode, clang does not define __GNUC__) */
#define XXH_IS_GNU (defined(__GNUC__) || defined(__clang__))

Just like how XXH_GCC_VERSION is used.

@Cyan4973
Copy link
Owner

Cyan4973 commented May 6, 2022

I like it

@GitMensch
Copy link

So PR missing?

@Cyan4973
Copy link
Owner

Coming back to this topic,
it seems there are still a few places in the code base where the following preprocessor test is present :
#if defined(__GNUC__) && !defined(__clang__)

I presume it's the kind of test where excluding icc would (presumably) be useful too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants