[build/nccl] failed to build libnccl on Debian unstable #675

Closed
cdluminate opened this issue Feb 2, 2017 · 3 comments

@cdluminate
Contributor

Building the CUDA version of PyTorch (without cuDNN) from the latest source fails.

OS: Debian unstable/experimental
Compiler: gcc-5, g++-5
CUDA: 8.0.44 (package provided by Debian)

buildlog: http://debomatic-amd64.debian.net/distribution#experimental/pytorch-contrib/0.1.7~1/buildlog

-- The C compiler identification is GNU 5.4.1
-- The CXX compiler identification is GNU 5.4.1
-- Check for working C compiler: /usr/bin/gcc-5
-- Check for working C compiler: /usr/bin/gcc-5 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++-5
-- Check for working CXX compiler: /usr/bin/g++-5 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /usr (found suitable version "8.0", minimum required is "7.0") 
-- Configuring done
-- Generating done
-- Build files have been written to: /<<PKGBUILDDIR>>/torch/lib/build/nccl
make[2]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
make[3]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
make[4]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
Scanning dependencies of target nccl
make[4]: Leaving directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
make[4]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
[100%] Generating lib/libnccl.so
make[5]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/nccl'
ls: cannot access '/usr/lib64/libcudart.so.*': No such file or directory
ls: cannot access '/usr/lib64/libcudart.so.*': No such file or directory
Grabbing  src/nccl.h                > /<<PKGBUILDDIR>>/torch/lib/build/nccl/include/nccl.h
Compiling src/libwrap.cu            > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/libwrap.o
Compiling src/core.cu               > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/core.o
Compiling src/all_gather.cu         > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/all_gather.o
Compiling src/all_reduce.cu         > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/all_reduce.o
Compiling src/broadcast.cu          > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/broadcast.o
Compiling src/reduce.cu             > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/reduce.o
Compiling src/reduce_scatter.cu     > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/reduce_scatter.o
src/core.cu(724): error: expected an expression

src/core.cu(724): error: expected an expression

2 errors detected in the compilation of "/tmp/tmpxft_00002c02_00000000-13_core.compute_52.cpp1.ii".
Makefile:98: recipe for target '/<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/core.o' failed
make[5]: *** [/<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/core.o] Error 2

I have no idea about this ...

@apaszke
Contributor

apaszke commented Feb 2, 2017

If you look here you'll see that it fails because ls can't find libcudart.so, which is why CUDA_MAJOR and CUDA_MINOR end up being empty strings. I don't know what a Debian CUDA package looks like, but you probably don't have a libcudart.so with a version extension in the directory the build searches.
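To illustrate the failure mode described above, here is a minimal sketch of version detection driven by the libcudart file name. The helper name `detect_cuda_version` and the exact parsing are assumptions for illustration, not NCCL's actual Makefile logic. When no versioned library exists in the searched directory, the result is an empty string, and an empty value passed on as a preprocessor macro could plausibly produce errors such as "expected an expression".

```shell
#!/bin/sh
# Sketch only: detect_cuda_version is a hypothetical helper, not NCCL's code.
detect_cuda_version() {
    # Take the first versioned libcudart in the given directory and keep
    # the "major.minor" part of its suffix; print nothing if none exists.
    ls "$1"/libcudart.so.* 2>/dev/null | head -n 1 |
        sed -n 's/.*libcudart\.so\.\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

tmp=$(mktemp -d)
empty_result=$(detect_cuda_version "$tmp")   # no library yet: empty string
touch "$tmp/libcudart.so.8.0.44"
found_result=$(detect_cuda_version "$tmp")   # versioned library present
rm -rf "$tmp"

echo "without library: '${empty_result}'"
echo "with library:    '${found_result}'"
```

Pointing the lookup at the directory that really contains libcudart.so.*, or setting the version explicitly, avoids the empty result.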

@apaszke
Contributor

apaszke commented Feb 2, 2017

Maybe CUDA_VERSION=8.0 pp setup.py build helps?

@soumith soumith closed this as completed Feb 3, 2017
@cdluminate
Contributor Author

@apaszke Thanks, the fix is to export these two environment variables:

export CUDA_HOME=/usr
export CUDA_LIB=/usr/lib/$(shell dpkg-architecture -qDEB_HOST_MULTIARCH)
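Note that `$(shell …)` is GNU Make syntax, so the lines above presumably come from a makefile such as debian/rules. A rough equivalent for an interactive shell might look like the following sketch. The fallback to plain /usr/lib when `dpkg-architecture` is unavailable is an assumption added here, not part of the original fix.

```shell
#!/bin/sh
# Sketch, assuming a Debian-style multiarch layout for the CUDA libraries.
CUDA_HOME=/usr

# Ask dpkg for the multiarch triplet (e.g. x86_64-linux-gnu); leave it empty
# if the tool is missing (assumed fallback, not part of the original fix).
multiarch=$(dpkg-architecture -qDEB_HOST_MULTIARCH 2>/dev/null || true)

# Append the triplet only when one was found.
CUDA_LIB="$CUDA_HOME/lib${multiarch:+/$multiarch}"

export CUDA_HOME CUDA_LIB
echo "CUDA_HOME=$CUDA_HOME"
echo "CUDA_LIB=$CUDA_LIB"
```

With both variables exported, the build looks for libcudart.so.* under the multiarch library directory instead of /usr/lib64.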
