
mpich not configuring with clang compilers on Perlmutter #6954

Closed
rgayatri23 opened this issue Mar 29, 2024 · 30 comments

Comments

@rgayatri23

I am trying to build mpich/4.2.0 with clang/18.0.1 on Perlmutter, and configure reports that clang does not work as the C compiler:

checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu
checking for gcc... clang
checking whether the C compiler works... no
configure: error: in `/pscratch/sd/r/rgayatri/prgenv/mpich.gF7iSRC/mpich-4.2.0':
configure: error: C compiler cannot create executables
See `config.log' for more details

Here is my configure line

./configure --prefix=$install_prefix --enable-fast=O2 --with-pm=no --with-pmi=cray --with-xpmem=$path-to-xpmem --with-wrapper-dl-type=rpath --enable-threads=multiple --enable-shared=yes --enable-static=no --with-namepublisher=file --with-libfabric=$path-to-libfabric --with-device=ch4:ofi  CC=clang CXX=clang++ NVCC=clang++ NVCC_FLAGS=-allow-unsupported-compiler --disable-fortran CPPFLAGS=-I$path-to-pmi/include 'LDFLAGS=-L$path-to-pmi/lib -L$path-to-libfabric/lib64 -fPIE' 'LIBS=-lpmi -lpmi2 -lfabric' MPICHLIB_CFLAGS=-fPIC MPICHLIB_CXXFLAGS=-fPIC

I also tested this with nvcc as the CUDA compiler and still get the same error.

Am I missing any particular flag to allow this configuration?

@raffenet
Contributor

Can you share the config.log file from the build directory?

@rgayatri23
Author

Here it is.
config.log

@hzhou
Contributor

hzhou commented Mar 30, 2024

configure:5800: checking for C compiler version
configure:5809: clang --version >&5
./configure: line 5811: clang: command not found
configure:5820: $? = 127

clang not in PATH.
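A quick pre-flight check one could run before configure (a sketch, not part of MPICH) would catch this earlier, since configure only surfaces the problem as "C compiler cannot create executables":

```shell
# Verify every compiler named on the configure line resolves in PATH.
# check_tool is a hypothetical helper, not an MPICH script.
check_tool() {
  # command -v exits non-zero when the tool is not in PATH
  command -v "$1" >/dev/null 2>&1
}

for tool in clang clang++; do
  if check_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: NOT in PATH"
  fi
done
```

Running this before configure makes a stale or unloaded compiler module obvious at a glance.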

@rgayatri23
Author

Oh sorry. The clang install was deleted. I should have checked the log before raising the issue. Thanks for the help.

@rgayatri23 rgayatri23 reopened this Apr 1, 2024
@rgayatri23
Author

Sorry, I closed the issue a bit early. I had a different issue earlier too, where configure did not accept clang as the CUDA compiler:

checking cuda_runtime_api.h usability... yes
checking cuda_runtime_api.h presence... yes
checking for cuda_runtime_api.h... yes
checking for cudaStreamSynchronize in -lcudart... yes
configure: WARNING: Using user-provided nvcc: 'clang++'
checking whether nvcc works... no
configure: error: CUDA was requested but it is not functional
configure: error: YAKSA configure failed

FYI - The configure command worked with llvm/16 but not with llvm/17 and higher.
I am attaching the current config.log. Could you please take a look?
config.log

@raffenet
Contributor

raffenet commented Apr 1, 2024

configure:52522: error: YAKSA configure failed

Could you also send the config.log file from modules/yaksa in your build directory?

@rgayatri23
Author

I did some digging into the modules/yaksa/config.log and found the following issue:

configure:17946: WARNING: Using user-provided nvcc: 'clang++'
configure:17961: checking whether nvcc works
configure:17974: clang++ -c conftest.cu >&5
clang++: warning: CUDA version is newer than the latest partially supported version 12.1 [-Wunknown-cuda-version]
clang++: error: GPU arch sm_35 is supported by CUDA versions between 7.0 and 11.8 (inclusive), but installation at /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 is 12.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'
configure:17974: $? = 1
configure: failed program was:

I tried the other options for passing the CUDA information, but those flags were not accepted.
Not sure if this is the main issue.

Attaching the respective config.log
config_yaksa.log
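For debugging outside configure, one could reproduce yaksa's probe standalone (a sketch; the flags come straight from the clang error message above, and the CUDA path is the one from this thread, so adjust for your site):

```shell
# Compile a trivial CUDA translation unit with clang++ directly,
# pinning the GPU arch and CUDA path so clang neither defaults to an
# old arch like sm_35 nor rejects CUDA 12.2 as unsupported.
cat > conftest.cu <<'EOF'
int main(void) { return 0; }
EOF

if command -v clang++ >/dev/null 2>&1; then
  clang++ --cuda-gpu-arch=sm_80 \
          --cuda-path=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 \
          --no-cuda-version-check -c conftest.cu 2>/dev/null \
    && echo "clang++ CUDA compile: ok" \
    || echo "clang++ CUDA compile: failed"
else
  echo "clang++ not in PATH; skipping"
fi
```

If this standalone compile fails, configure's "nvcc works... no" result is expected regardless of any configure-level flags.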

@hzhou
Contributor

hzhou commented Apr 1, 2024

@rgayatri23 That looks like the main issue. Let us know if you can resolve it on your own.

@raffenet
Contributor

raffenet commented Apr 1, 2024

I tried the other options of passing the cuda information but those flags were not accepted.
Not sure if this is the main issue.

Yaksa tries to detect the NVIDIA GPU at configure time in order to select the right code generation flags. Is there a GPU in your build node? What type is it?

@rgayatri23
Author

@raffenet - Yes, there is a GPU in my build node. It is an NVIDIA A100 (so sm_80) with cudatoolkit/12.2.

@raffenet
Contributor

raffenet commented Apr 2, 2024

@raffenet - Yes there is a GPU in my build node. It is NVIDIA A100 (so sm_80) with cudatoolkit/12.2

Try adding --with-cuda-sm=80 to your configure line. Detection might not be working correctly with the clang compiler.

@rgayatri23
Author

rgayatri23 commented Apr 2, 2024

@raffenet - That did not resolve the issue. It still fails with the same error.

I also tried passing the option --cuda-gpu-arch=80, but configure does not seem to accept it as a valid argument, based on the error message I found in modules/yaksa/config.log.

@raffenet
Contributor

raffenet commented Apr 2, 2024

OK, I just realized that NVCC_FLAGS is ignored when NVCC is set in the configure environment. Instead of putting the suggested settings in NVCC_FLAGS, can you try putting them directly in NVCC? For example:

NVCC="clang++ --cuda-gpu-arch=sm_80"

@rgayatri23
Author

It looks like none of these options are accepted:

configure: WARNING: unrecognized options: --with-craypmi, --with-cuda-sm
configure: error: unrecognized option: `--cuda-gpu-arch=sm_80'

raffenet added a commit to raffenet/yaksa that referenced this issue Apr 2, 2024

These flags were ignored when the user specified a compiler other than
the nvcc included in the CUDA installation. Make sure to include them
for consistency. See pmodels/mpich#6954.

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
@raffenet
Contributor

raffenet commented Apr 2, 2024

It looks like none of these options are accepted:

configure: WARNING: unrecognized options: --with-craypmi, --with-cuda-sm

--with-craypmi is no longer supported in 4.2.0. You should use --with-pmi=pmi2 --with-pmi2=<path/to/cray/pmi> to achieve the same functionality.

configure: error: unrecognized option: `--cuda-gpu-arch=sm_80'

What is your configure line and what version of clang are you using? The config.log you shared suggested --cuda-gpu-arch was the right option, so I'm not sure what's going on.

@rgayatri23
Author

Thanks for the info on pmi options.

I am using clang/18.0.1 with mpich/4.2. Here is my configure line:

./configure --prefix= --enable-fast=O2 --with-pm=no --with-pmi=pmi2 --with-pmi2=<path-to-cray-pmi> --with-xpmem=<path-to-xpmem> --with-wrapper-dl-type=rpath --enable-threads=multiple --enable-shared=yes --enable-static=no --with-namepublisher=file --with-libfabric=<path-to-libfabric> --with-libfabric-include=<path-to-libfabric-include> --with-libfabric-lib=<path-to-libfabric-lib> --with-device=ch4:ofi --with-ch4-shmmods=posix,xpmem --enable-thread-cs=per-vci --with-cuda=<path-to-cuda> CPPFLAGS=-I<path-to-pmi-include> CC=clang CFLAGS= NVCC=clang++ --with-cuda-sm=80 NVCC_FLAGS=-allow-unsupported-compiler CXX=clang++ FC= FCFLAGS= F77= FFLAGS= 'LIBS=-lpmi -lpmi2 -Wl,--as-needed,-lcudart,--no-as-needed -lcuda' 'LDFLAGS=-L<path-to-pmi-lib> -L<path-to-libfabric-lib64> -L<path-to-cuda-lib64>' MPICHLIB_CFLAGS=-fPIC MPICHLIB_CXXFLAGS=-fPIC MPICHLIB_FFLAGS=-fPIC MPICHLIB_FCFLAGS=-fPIC
configure: WARNING: unrecognized options: --with-cuda-sm

I am attaching the config.log for the latest without using the --cuda-gpu-arch option since that fails with the error as shown above.
config_yaksa.log

@raffenet
Contributor

raffenet commented Apr 2, 2024

You need to add quotes around the full NVCC setting or else the part after the space won't be recognized.

NVCC="clang++ --cuda-gpu-arch=sm_80"

@rgayatri23
Author

rgayatri23 commented Apr 2, 2024

Here is the raw configure line. It is passing clang++ --cuda-gpu-arch=sm_80 as a single string, right? It appears as: 'NVCC="clang++' '--cuda-gpu-arch=sm_80"'

./configure --prefix= --enable-fast=O2 --with-pm=no --with-pmi=pmi2 --with-pmi2=/opt/cray/pe/pmi/default --with-xpmem=/opt/cray/xpmem/default --with-wrapper-dl-type=rpath --enable-threads=multiple --enable-shared=yes --enable-static=no --with-namepublisher=file --with-libfabric=/opt/cray/libfabric/1.15.2.0 --with-libfabric-include=/opt/cray/libfabric/1.15.2.0/include --with-libfabric-lib=/opt/cray/libfabric/1.15.2.0/lib64 --with-device=ch4:ofi --with-ch4-shmmods=posix,xpmem --enable-thread-cs=per-vci --with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 CPPFLAGS=-I/opt/cray/pe/pmi/default/include CC=clang CFLAGS= 'NVCC="clang++' '--cuda-gpu-arch=sm_80"' NVCC_FLAGS=-allow-unsupported-compiler CXX=clang++ FC= FCFLAGS= F77= FFLAGS= 'LIBS=-lpmi -lpmi2 -Wl,--as-needed,-lcudart,--no-as-needed -lcuda' 'LDFLAGS=-L/opt/cray/pe/pmi/default/lib -L/opt/cray/libfabric/1.15.2.0/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/lib64' MPICHLIB_CFLAGS=-fPIC MPICHLIB_CXXFLAGS=-fPIC MPICHLIB_FFLAGS=-fPIC MPICHLIB_FCFLAGS=-fPIC
configure: error: unrecognized option: `--cuda-gpu-arch=sm_80"'
Try `./configure --help' for more information

@raffenet
Contributor

raffenet commented Apr 2, 2024

What are these single quotes in the configure line? Can you remove them?

'NVCC="clang++' '--cuda-gpu-arch=sm_80"'
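A minimal reconstruction (assumed, since the actual script was not shared) of the quoting bug: when the configure arguments are built up in a plain string, word splitting breaks NVCC="clang++ --cuda-gpu-arch=sm_80" into two words, and the double quotes become literal characters, producing exactly the 'NVCC="clang++' '--cuda-gpu-arch=sm_80"' pair seen above.

```shell
# Broken: unquoted expansion of a string splits on whitespace, and the
# embedded double quotes are just ordinary characters.
args_str='CC=clang NVCC="clang++ --cuda-gpu-arch=sm_80"'
set -- $args_str
broken_count=$#                  # 3 words, not 2
second_broken=$2                 # the mangled word: NVCC="clang++
echo "broken: $broken_count words, second word is $second_broken"

# Correct: pass each argument as its own quoted word, then forward
# them with "$@" so each stays intact.
set -- CC=clang NVCC="clang++ --cuda-gpu-arch=sm_80"
ok_count=$#                      # 2 words
second_ok=$2                     # NVCC=clang++ --cuda-gpu-arch=sm_80
echo "correct: $ok_count words, second word is $second_ok"
```

With the arguments held this way, `./configure "$@"` receives NVCC as a single value.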

@rgayatri23
Author

OK, thanks. It's strange: I was doing the configuration through a bash script, and it did not do what I thought it did. Thanks for pointing it out.
I tested this manually, and this time the options were accepted and the configuration went ahead, but it failed with the following error:

In file included from /usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/cmath:47:
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:103:7: error: __float128 is not supported on this target
  103 |   abs(__float128 __x)
      |       ^
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:102:3: error: __float128 is not supported on this target
  102 |   __float128
      |   ^
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:103:18: note: '__x' defined here
  103 |   abs(__float128 __x)

@raffenet
Contributor

raffenet commented Apr 2, 2024

This is what I saw on my system as well. This issue suggested adding -D__STRICT_ANSI__ to NVCC/NVCC_FLAGS, which worked for me.
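Combining the arch flag and the macro workaround in a single NVCC value could look like this (a sketch based on the suggestions in this thread, not a verified configuration):

```shell
# Pass both the GPU arch and the __STRICT_ANSI__ workaround through
# NVCC, since NVCC_FLAGS is ignored when NVCC is user-provided.
NVCC="clang++ --cuda-gpu-arch=sm_80 -D__STRICT_ANSI__"
```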

@raffenet
Contributor

raffenet commented Apr 2, 2024

This is what I saw on my system as well. This issue suggested adding -D__STRICT_ANSI__ to NVCC/NVCC_FLAGS, which worked for me.

Even though this gets past configure, it's likely going to fail at the make stage because the build system assumes nvcc format for the device code generation options. Supporting clang++ is going to take some additional changes.

@rgayatri23
Author

  1. Is there any plan to support clang as the CUDA compiler?
  2. Also, I noticed that even nvcc as the CUDA compiler fails with similar errors. Do I need some additional options to make it work?

@raffenet
Contributor

raffenet commented Apr 3, 2024

  1. Is there any plan to support clang as the cuda compiler?

No timeline for supporting this configuration at the moment.

  2. Also, I noticed that even nvcc as the CUDA compiler fails with similar errors. Do I need some additional options to make it work?

On the Polaris system here at ANL, I am able to build using nvcc + the system gcc (7.5.0) and also the NVIDIA HPC compilers (CC=nvc CXX=nvc++ FC=nvfortran) with no additional options.

@rgayatri23
Author

No timeline to supporting this configuration at the moment.

OK. Thanks, though; I see a few feature requests and patches have come out of this issue :-)

On the Polaris system here at ANL, I am able to build using nvcc + the system gcc (7.5.0) and also the NVIDIA HPC compilers (CC=nvc CXX=nvc++ FC=nvfortran) with no additional options.

I was actually thinking in terms of CC=clang CXX=clang++ NVCC=nvcc (maybe not needed) with Fortran disabled.

@raffenet
Contributor

raffenet commented Apr 3, 2024

It doesn't work because we override the host compiler for nvcc during configuration. Removing the override allows me to progress through the yaksa build, so we need to understand why it is there in the first place... stay tuned for now.

@raffenet
Contributor

raffenet commented Apr 3, 2024

You should be able to work around the host compiler issue by doing this at configure time:

--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 NVCC=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc

The configure script will see that you've supplied an NVCC and skip the host compiler override. I have tested it successfully on Polaris.

@rgayatri23
Author

You should be able to work around the host compiler issue by doing this at configure time:

--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 NVCC=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc

The configure script will see that you've supplied an NVCC and skip the host compiler override. I have tested it successfully on Polaris.

Thanks, this worked. I think this is what I will do for now.

@rgayatri23
Author

You can close this issue for now. But it would be great if I could be notified whenever it's possible to use clang++ as the CUDA compiler too. Is there an issue I can track for that update?

@raffenet
Contributor

raffenet commented May 3, 2024

@rgayatri23 see pmodels/yaksa#251 for tracking supporting clang++ as the CUDA compiler.

@raffenet raffenet closed this as completed May 3, 2024