Test fails with gcc 10.2.0 #94

Open
thrasibule opened this issue Sep 15, 2020 · 3 comments

@thrasibule

The test fails on my machine with:

Mismatch at size 33 for target Portable.

This is with gcc 10.2, see full version:

gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (GCC)

If I compile with clang, the test passes, but it only tries SSE41, not AVX2, even though my CPU supports it.

bin/highwayhash_test 
  Portable: OK
     SSE41: OK
  PortableCat: OK
     SSE41Cat: OK

Any idea how to debug this?

@thrasibule
Author

thrasibule commented Sep 19, 2020

I've made some progress on this.

  • The test passes with clang. It wasn't trying AVX2 because I was running it in VirtualBox: even though VirtualBox passes the AVX2 flag through from the host, it doesn't pass FMA and BMI through, which are also required.
  • The test passes with gcc if I use the -O2 flag instead of -O3, so it must be some overly aggressive optimization somewhere.
  • Unfortunately, I've tried running with -O2 -ftree-loop-vectorize -fversion-loops-for-strides -fipa-cp-clone -fgcse-after-reload -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -ftree-slp-vectorize -funswitch-loops -fvect-cost-model -fvect-cost-model=dynamic, which to my understanding should be equivalent to -O3, but the test still passes, so there is something else that I'm missing.
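Since the flag-by-flag approach did not reproduce the failure, another way to narrow it down is to keep the whole build at -O3 and lower the optimization level one function at a time. The sketch below uses GCC's `optimize` function attribute for that. `BISECT_O2` and `MixLanes` are hypothetical names for illustration, not part of HighwayHash, and the GCC documentation recommends this attribute for debugging only.

```cpp
#include <cstdint>

// GCC-only: override the optimization level for a single function.
// Clang does not implement this attribute, so the macro is empty there.
#if defined(__GNUC__) && !defined(__clang__)
#define BISECT_O2 __attribute__((optimize("O2")))
#else
#define BISECT_O2
#endif

// Hypothetical stand-in for one routine under suspicion. If the test starts
// passing once a given function is tagged, that function is a good candidate
// for a reduced test case.
BISECT_O2 uint64_t MixLanes(const uint64_t* lanes, int count) {
  uint64_t acc = 0;
  for (int i = 0; i < count; ++i) {
    // Rotate the accumulator left by 7 bits, then fold in the next lane.
    acc = ((acc << 7) | (acc >> 57)) ^ lanes[i];
  }
  return acc;
}
```

Tagging half of the suspect functions, rerunning the test, and repeating on the half that changes the outcome turns this into a bisection.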

@jan-wassenberg
Member

Thanks for letting us know and sharing your results. It does sound like a compiler bug, which is unfortunately not uncommon in my experience.

avx2 was because I was running it in VirtualBox, and even though VirtualBox lets the avx2 flag through from the host, it doesn't let fma and bmi through, which are also required.

Yes, this is unfortunate. We also ran into that with JPEG XL. There the FMA is helpful but for HighwayHash we could remove those extra flag requirements.

Interesting that -O3 is not the same as its constituent flags (at least as defined at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html - I verified it's the same list). Would you like to report this as a potential bug to GCC?

@thrasibule
Author

I've checked that it also works with -O3 -fno-strict-aliasing, but I don't get any aliasing warnings, so it must be something quite subtle. Does it ring a bell? It also works with gcc 9.3. I'm a bit wary of reporting it to gcc without narrowing down the issue further.
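For reference, the kind of pattern that -fno-strict-aliasing typically masks is reading a byte buffer through a cast pointer, e.g. `*reinterpret_cast<const uint64_t*>(bytes)`. Whether anything like this is what goes wrong here is not established in this thread. The sketch below only shows the usual well-defined alternative, with `LoadU64` and `StoreU64` as hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Well-defined under strict aliasing: copy the bytes instead of dereferencing
// a cast pointer. Compilers usually turn this memcpy into a single load.
inline uint64_t LoadU64(const unsigned char* bytes) {
  uint64_t value;
  std::memcpy(&value, bytes, sizeof(value));
  return value;
}

// The matching store, again going through memcpy rather than a cast.
inline void StoreU64(uint64_t value, unsigned char* bytes) {
  std::memcpy(bytes, &value, sizeof(value));
}
```

Searching the code paths hit at size 33 for cast-based loads and stores, and swapping them for helpers like these one at a time, would be one way to test the aliasing hypothesis without changing compiler flags.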
