Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out SSE usage for non-x86_64. #1147

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

terjekv
Copy link

@terjekv terjekv commented Sep 28, 2020

  • Symbols are still maintained as they are used
    in a mnumber of places.
  • SSE should be set to 0 for non-x86_x64 so the
    code paths expecting SSE should never be called.

(Hopefully) fixes #475

  - Symbols are still maintained as they are used
    in a mnumber of places.
  - SSE should be set to 0 for non-x86_x64 so the
    code paths expecting SSE should never be called.
@terjekv
Copy link
Author

terjekv commented Sep 29, 2020

I could not find a test suite for verifying the functionality of this PR. We may also want to error on using the SEE-methods on non-SSE architectures to avoid silent failures.

However, the correct solution is probably to use https://github.com/DLTcollab/sse2neon as a drop-in replacement for emmintrin.h to utilise Arm NEON.

@benjjneb
Copy link
Owner

benjjneb commented Oct 4, 2020

First, thank you for this PR. It is directly addressing a problem I don't have the expertise to address.

I'll also be honest. I am nervous about pulling in code that I cannot really test.

It's not that this code can't be tested, it can, but I don't at this point know how to test whether this fixes the ARM issues, and I definitely don't know how to test whether it makes things work on ARM under different user options (e.g. multithread=T or F).

Thoughts? One possibility would be creating an ARM branch for now, with the expectation that as people use it and user experiences confirm (or not) that it works that it could be pulled into the main branch later.

@terjekv
Copy link
Author

terjekv commented Oct 5, 2020

I think there are two things here that should be looked at.

  1. Making the required code changes for supporting aarch64/arm64. To limit in-house fixing and porting, using https://github.com/DLTcollab/sse2neon seems like a very good path forward. It is a well-supported piece of code that is used for this specific purpose in a number of projects. If using sse2neon works as expected on x86_64, odds are very good that the same code, based on SSE primitives, will work on arm.

  2. Once the required code changes are available, a test suite that verifies that results are what they are supposed to be.

As for testing on aarch64/arm64, that is hard to do without access to the platform. However, usage of arm64 is growing in the HPC world, and as dada2 is shipped as part of https://github.com/easybuilders/easybuild-easyconfigs/tree/develop/easybuild/easyconfigs/r/R-bundle-Bioconductor under the EasyBuild umbrella, finding testers should be quite doable. EasyBuild has a number of users who have HPC clusters on aarch64/arm64.

As for maintaining a secondary branch, that is a development choice that may or may not be a good idea. My main concern is that the branch becomes stale and diverges enough that it in practice will have limited value. Working to merge something like sse2neon should be drop-in, even on x86_64, and thus limit the amount of overhead involved.

@cjfields
Copy link

@benjjneb @terjekv I'd be happy to assist in testing this locally using Apple M1, and I believe we may have some access to arm64 nodes on our HPC. I guess the question is how you want tests run, and whether there are some points like multithreading not covered under any CI-based testing that need checking.

@benjjneb
Copy link
Owner

I've been expecting to replace my laptop this year with a new M1 Macbook, at which point I'll easily be able to test and (probably) pull this change in. Thanks for the offer though @cjfields , and I may take you up on it if it becomes clear that I won't have my hands on an M1 laptop for a while. There are a couple of specific performance things I'd like to inspect specific to the vectorized parts of the code that definitely aren't in the CI testing right now.

@cjfields
Copy link

@benjjneb sure, no worries!

@jianshu93
Copy link

Hello All,

I still have the error for M1 Mac using the sse2 filtered branch (remotes::install_github("terjekv/dada2")):

  • installing source package ‘dada2’ ...
    ** using staged installation
    ** libs
    clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library/RcppParallel/include' -I/opt/R/arm64/include -fPIC -falign-functions=64 -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
    In file included from RcppExports.cpp:4:
    In file included from ./../inst/include/dada2.h:7:
    ./../inst/include/dada2_RcppExports.h:14:14: warning: unused function 'validateSignature' [-Wunused-function]
    void validateSignature(const char* sig) {
    ^
    1 warning generated.
    clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library/RcppParallel/include' -I/opt/R/arm64/include -fPIC -falign-functions=64 -Wall -g -O2 -c Rmain.cpp -o Rmain.o
    In file included from Rmain.cpp:7:
    /Library/Developer/CommandLineTools/usr/lib/clang/13.0.0/include/cpuid.h:14:2: error: this header is for x86 only
    #error this header is for x86 only
    ^
    /Library/Developer/CommandLineTools/usr/lib/clang/13.0.0/include/cpuid.h:286:5: error: invalid output constraint '=a' in asm
    __cpuid(__leaf, __eax, __ebx, __ecx, __edx);
    ^
    /Library/Developer/CommandLineTools/usr/lib/clang/13.0.0/include/cpuid.h:251:11: note: expanded from macro '__cpuid'
    : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx)
    ^
    /Library/Developer/CommandLineTools/usr/lib/clang/13.0.0/include/cpuid.h:301:5: error: invalid output constraint '=a' in asm
    __cpuid(__leaf, *__eax, *__ebx, *__ecx, *__edx);
    ^
    /Library/Developer/CommandLineTools/usr/lib/clang/13.0.0/include/cpuid.h:251:11: note: expanded from macro '__cpuid'
    : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx)
    ^
    /Library/Developer/CommandLineTools/usr/lib/clang/13.0.0/include/cpuid.h:315:5: error: invalid output constraint '=a' in asm
    __cpuid_count(__leaf, __subleaf, *__eax, *__ebx, *__ecx, *__edx);
    ^
    /Library/Developer/CommandLineTools/usr/lib/clang/13.0.0/include/cpuid.h:258:11: note: expanded from macro '__cpuid_count'
    : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx)
    ^
    Rmain.cpp:9:3: error: invalid output constraint '=a' in asm
    __cpuid_count(InfoType, 0, info[0], info[1], info[2], info[3]);
    ^
    /Library/Developer/CommandLineTools/usr/lib/clang/13.0.0/include/cpuid.h:258:11: note: expanded from macro '__cpuid_count'
    : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx)
    ^
    5 errors generated.
    make: *** [Rmain.o] Error 1
    ERROR: compilation failed for package ‘dada2’
  • removing ‘/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library/dada2’
    Warning message:
    In i.p(...) :
    installation of package ‘/var/folders/mr/8qybn91j5fgdbrcdmq7ckftw0000gn/T//RtmpCWnuTf/file9294636259ae/dada2_1.17.3.tar.gz’ had non-zero exit status

It looks like the Rmain.cpp line 7, void cpuid is still the problem. Any idea?

Thanks,

Jianshu

@terjekv
Copy link
Author

terjekv commented Dec 2, 2021

@jianshu93 Based on https://opensource.apple.com/source/WTF/WTF-7601.1.46.42/wtf/Platform.h.auto.html we might need to check for defined(__APPLE__) or its friends to cover Apple-arm. I'll see if I can look at this tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problems building on ARM based servers
4 participants