macOS top-half avx / avx2 registers aren't handled correctly #249

Open
milianw opened this issue Mar 16, 2020 · 3 comments

milianw commented Mar 16, 2020

| Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
| --- | --- | --- | --- | --- | --- |
| 1.4 | macOS 10.15.3 | Apple clang version 11.0.0 (clang-1100.0.33.17) | avx / avx2 | Apple clang version 11.0.0 (clang-1100.0.33.17) | 2.6 GHz Dual-Core Intel Core i5 (MacBook Pro Retina, 13-inch, Mid 2014) |

In one of our projects that uses Vc 1.4, a colleague reported a strange issue: the images we produce with Vc are striped, with the first four pixels looking correct and the following four pixels being black. It turns out that code which works on most platforms, and even on another macOS machine with a different CPU, misbehaves badly on this machine. I believe Vc's existing test suite also shows the issue; see below.

Is this a macOS/clang bug, or something within Vc? Is there any way to work around it?
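
For anyone trying to narrow this down, here is a minimal, portable sketch of a check that reports whether mismatches in an 8-float result are confined to lanes 4..7, i.e. the upper 128 bits of a 256-bit AVX register. `HalfMismatch` and `mismatched_halves` are hypothetical names for illustration and are not part of Vc; the sketch assumes the 8 lanes have already been stored to plain arrays.

```cpp
#include <array>
#include <cstddef>

// Hypothetical diagnostic, not part of Vc: records which 128-bit half of an
// 8-lane float vector contains at least one mismatching lane.
struct HalfMismatch {
    bool lower;  // a mismatch in lanes 0..3 (low 128 bits)
    bool upper;  // a mismatch in lanes 4..7 (high 128 bits)
};

inline HalfMismatch mismatched_halves(const std::array<float, 8>& expected,
                                      const std::array<float, 8>& actual) {
    HalfMismatch result{false, false};
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (expected[i] != actual[i]) {
            if (i < 4) {
                result.lower = true;
            } else {
                result.upper = true;
            }
        }
    }
    return result;
}
```

Feeding it a row where the first four pixels match and the next four are zero (black) would report `lower == false` and `upper == true`, which is the striping pattern described above.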

Testcase

arithmetics_avx:
27:  FAIL: ┍ at /Users/kdab/Vc/tests/arithmetics.cpp:231 (0x106d4beb4)):
27:  FAIL: │ test * test - test ([16797700, 16797700, 16797700, 16797700, 16797700, 16797700, 16797700, 16797700]) ≈ Vec(j * j - j) ([16797702, 16797702, 16797702, 16797702, 16797702, 16797702, 16797702, 16797702]) -> m[0000 0000]
27:  FAIL: │ distance: [-1, -1, -1, -1, -3.35544e+07, -3.35544e+07, -3.35544e+07, -3.35544e+07] ulp, allowed distance: ±1 ulp 
27:  FAIL: ┕ testMulSub<simd< float, AVX>>

arithmetics_avx2:
28:  FAIL: ┍ at /Users/kdab/Vc/tests/arithmetics.cpp:231 (0x10ca05724)):
28:  FAIL: │ test * test - test ([16797700, 16797700, 16797700, 16797700, 16797700, 16797700, 16797700, 16797700]) ≈ Vec(j * j - j) ([16797702, 16797702, 16797702, 16797702, 16797702, 16797702, 16797702, 16797702]) -> m[0000 0000]
28:  FAIL: │ distance: [-1, -1, -1, -1, -3.35544e+07, -3.35544e+07, -3.35544e+07, -3.35544e+07] ulp, allowed distance: ±1 ulp 
28:  FAIL: ┕ testMulSub<simd< float, AVX>>
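
To help read the `distance` lines: for positive finite floats, the distance in ulps can be approximated as the difference of the IEEE 754 bit patterns. The sketch below is an assumption about how to reproduce the number by hand, not the routine the Vc test harness uses, and `ulp_distance` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of Vc: signed distance in ulps between two
// positive finite floats, taken as the difference of their bit patterns.
inline std::int64_t ulp_distance(float actual, float expected) {
    std::int32_t a = 0;
    std::int32_t e = 0;
    std::memcpy(&a, &actual, sizeof a);    // bit-cast without undefined behaviour
    std::memcpy(&e, &expected, sizeof e);
    return static_cast<std::int64_t>(a) - static_cast<std::int64_t>(e);
}
```

With the values printed in the log, `ulp_distance(16797700.0f, 16797702.0f)` is -1, because floats between 2^24 and 2^25 are spaced 2 apart. That agrees with the lower four lanes. The upper four lanes report roughly -2^25 ulp even though the printed values look identical, which points at the upper half of the register rather than at the arithmetic itself.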

Actual Results

The following tests FAILED:
	 27 - arithmetics_avx (Failed)
	 28 - arithmetics_avx2 (Failed)
	 59 - ulp_avx (Failed)
	 60 - ulp_avx2 (Failed)
	 71 - math_avx (Failed)
	 72 - math_avx2 (Failed)
	103 - gatherinterleavedmemory_avx (Failed)
	104 - gatherinterleavedmemory_avx2 (Failed)

Expected Results

no failures

0x6e commented Mar 16, 2020

| Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
| --- | --- | --- | --- | --- | --- |
| 1.4 | macOS 10.15.3 | Apple clang version 11.0.0 (clang-1100.0.33.17) | avx / avx2 | Apple clang version 11.0.0 (clang-1100.0.33.17) | 4.01 GHz Quad-Core Intel Core i7 6700K |

Here is the complete test run output from my failing machine: test-mac.log

0x6e commented Mar 16, 2020

My MacBook Pro has no issues; all tests pass. Note that the software stack is the same:

| Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
| --- | --- | --- | --- | --- | --- |
| 1.4 | macOS 10.15.3 | Apple clang version 11.0.0 (clang-1100.0.33.17) | avx / avx2 | Apple clang version 11.0.0 (clang-1100.0.33.17) | MacBook Pro (Retina, 15-inch, Mid 2015) 2.5 GHz Intel Core i7 |

test-mbp.log

0x6e commented Mar 16, 2020

Note that on my failing machine, if I use gcc all tests pass:

| Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
| --- | --- | --- | --- | --- | --- |
| 1.4 | macOS 10.15.3 | g++-9 (Homebrew GCC 9.2.0_3) 9.2.0 | avx / avx2 | g++-9 (Homebrew GCC 9.2.0_3) 9.2.0 | 4.01 GHz Quad-Core Intel Core i7 6700K |
