
mpich not configuring with clang compilers on Perlmutter #6954

Closed
rgayatri23 opened this issue Mar 29, 2024 · 30 comments

Comments

@rgayatri23

I am trying to build mpich/4.2.0 with clang/18.0.1 on Perlmutter, and configure reports that clang does not work as the C compiler:

checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu
checking for gcc... clang
checking whether the C compiler works... no
configure: error: in `/pscratch/sd/r/rgayatri/prgenv/mpich.gF7iSRC/mpich-4.2.0':
configure: error: C compiler cannot create executables
See `config.log' for more details

Here is my configure line

./configure --prefix=$install_prefix --enable-fast=O2 --with-pm=no --with-pmi=cray --with-xpmem=$path-to-xpmem --with-wrapper-dl-type=rpath --enable-threads=multiple --enable-shared=yes --enable-static=no --with-namepublisher=file --with-libfabric=$path-to-libfabric --with-device=ch4:ofi  CC=clang CXX=clang++ NVCC=clang++ NVCC_FLAGS=-allow-unsupported-compiler --disable-fortran CPPFLAGS=-I$path-to-pmi/include 'LDFLAGS=-L$path-to-pmi/lib -L$path-to-libfabric/lib64 -fPIE' 'LIBS=-lpmi -lpmi2 -lfabric' MPICHLIB_CFLAGS=-fPIC MPICHLIB_CXXFLAGS=-fPIC

I also tested this with nvcc as the CUDA compiler and still get the same error.

Am I missing any particular flag to allow this configuration?

@raffenet
Contributor

Can you share the config.log file from the build directory?

@rgayatri23
Author

Here it is.
config.log

@hzhou
Contributor

hzhou commented Mar 30, 2024

configure:5800: checking for C compiler version
configure:5809: clang --version >&5
./configure: line 5811: clang: command not found
configure:5820: $? = 127

clang not in PATH.
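A quick pre-flight check one could run before configure (a sketch, not part of MPICH) would catch this earlier, since configure only surfaces the problem as "C compiler cannot create executables":

```shell
# Verify every compiler named on the configure line resolves in PATH.
# check_tool is a hypothetical helper, not an MPICH script.
check_tool() {
  # command -v exits non-zero when the tool is not in PATH
  command -v "$1" >/dev/null 2>&1
}

for tool in clang clang++; do
  if check_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: NOT in PATH"
  fi
done
```

Running this before configure makes a stale or unloaded compiler module obvious at a glance.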

@rgayatri23
Author

Oh sorry. The clang install was deleted. I should have checked the log before raising the issue. Thanks for the help.

@rgayatri23 rgayatri23 reopened this Apr 1, 2024
@rgayatri23
Author

Sorry, I closed the issue a bit early. I had a different issue earlier too, where configure did not accept clang as the CUDA compiler:

checking cuda_runtime_api.h usability... yes
checking cuda_runtime_api.h presence... yes
checking for cuda_runtime_api.h... yes
checking for cudaStreamSynchronize in -lcudart... yes
configure: WARNING: Using user-provided nvcc: 'clang++'
checking whether nvcc works... no
configure: error: CUDA was requested but it is not functional
configure: error: YAKSA configure failed

FYI - The configure command worked with llvm/16 but not with llvm/17 and higher.
I am attaching the current config.log. Could you please take a look?
config.log

@raffenet
Contributor

raffenet commented Apr 1, 2024

configure:52522: error: YAKSA configure failed

Could you also send the config.log file from modules/yaksa in your build directory?

@rgayatri23
Author

I did some digging into the modules/yaksa/config.log and found the following issue:

configure:17946: WARNING: Using user-provided nvcc: 'clang++'
configure:17961: checking whether nvcc works
configure:17974: clang++ -c conftest.cu >&5
clang++: warning: CUDA version is newer than the latest partially supported version 12.1 [-Wunknown-cuda-version]
clang++: error: GPU arch sm_35 is supported by CUDA versions between 7.0 and 11.8 (inclusive), but installation at /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 is 12.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'
configure:17974: $? = 1
configure: failed program was:

I tried the other options for passing the CUDA information, but those flags were not accepted.
Not sure if this is the main issue.

Attaching the respective config.log
config_yaksa.log
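For debugging outside configure, one could reproduce yaksa's probe standalone (a sketch; the flags come straight from the clang error message above, and the CUDA path is the one from this thread, so adjust for your site):

```shell
# Compile a trivial CUDA translation unit with clang++ directly,
# pinning the GPU arch and CUDA path so clang neither defaults to an
# old arch like sm_35 nor rejects CUDA 12.2 as unsupported.
cat > conftest.cu <<'EOF'
int main(void) { return 0; }
EOF

if command -v clang++ >/dev/null 2>&1; then
  clang++ --cuda-gpu-arch=sm_80 \
          --cuda-path=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 \
          --no-cuda-version-check -c conftest.cu 2>/dev/null \
    && echo "clang++ CUDA compile: ok" \
    || echo "clang++ CUDA compile: failed"
else
  echo "clang++ not in PATH; skipping"
fi
```

If this standalone compile fails, configure's "nvcc works... no" result is expected regardless of any configure-level flags.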

@hzhou
Contributor

hzhou commented Apr 1, 2024

@rgayatri23 That looks like the main issue. Let us know if you can resolve it on your own.

@raffenet
Contributor

raffenet commented Apr 1, 2024

I tried the other options of passing the cuda information but those flags were not accepted.
Not sure if this is the main issue.

Yaksa tries to detect the NVIDIA GPU at configure time in order to select the right code generation flags. Is there a GPU in your build node? What type is it?

@rgayatri23
Author

@raffenet - Yes, there is a GPU in my build node. It is an NVIDIA A100 (so sm_80) with cudatoolkit/12.2.

@raffenet
Contributor

raffenet commented Apr 2, 2024

@raffenet - Yes there is a GPU in my build node. It is NVIDIA A100 (so sm_80) with cudatoolkit/12.2

Try adding --with-cuda-sm=80 to your configure line. Detection might not be working correctly with the clang compiler.

@rgayatri23
Author

rgayatri23 commented Apr 2, 2024

@raffenet - That did not resolve the issue. It still fails with the same error.

I also tried passing the option --cuda-gpu-arch=80, but configure does not seem to accept it as a valid argument, based on the error message I found in modules/yaksa/config.log.

@raffenet
Contributor

raffenet commented Apr 2, 2024

OK, I just realized that NVCC_FLAGS is ignored when NVCC is set in the configure environment. Instead of putting the suggested settings in NVCC_FLAGS, can you try putting them directly in NVCC? For example:

NVCC="clang++ --cuda-gpu-arch=sm_80"

@rgayatri23
Author

It looks like none of these options are accepted:

configure: WARNING: unrecognized options: --with-craypmi, --with-cuda-sm
configure: error: unrecognized option: `--cuda-gpu-arch=sm_80'

raffenet added a commit to raffenet/yaksa that referenced this issue Apr 2, 2024

These flags were ignored when the user specified a compiler other than
the nvcc included in the CUDA installation. Make sure to include them
for consistency. See pmodels/mpich#6954.

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
@raffenet
Contributor

raffenet commented Apr 2, 2024

It looks like none of these options are accepted:

configure: WARNING: unrecognized options: --with-craypmi, --with-cuda-sm

--with-craypmi is no longer supported in 4.2.0. You should use --with-pmi=pmi2 --with-pmi2=<path/to/cray/pmi> to achieve the same functionality.

configure: error: unrecognized option: `--cuda-gpu-arch=sm_80'

What is your configure line and what version of clang are you using? The config.log you shared suggested --cuda-gpu-arch was the right option, so I'm not sure what's going on.

@rgayatri23
Author

Thanks for the info on pmi options.

I am using clang/18.0.1 with mpich/4.2. Here is my configure line:

./configure --prefix= --enable-fast=O2 --with-pm=no --with-pmi=pmi2 --with-pmi2=<path-to-cray-pmi> --with-xpmem=<path-to-xpmem> --with-wrapper-dl-type=rpath --enable-threads=multiple --enable-shared=yes --enable-static=no --with-namepublisher=file --with-libfabric=<path-to-libfabric> --with-libfabric-include=<path-to-libfabric-include> --with-libfabric-lib=<path-to-libfabric-lib> --with-device=ch4:ofi --with-ch4-shmmods=posix,xpmem --enable-thread-cs=per-vci --with-cuda=<path-to-cuda> CPPFLAGS=-I<path-to-pmi-include> CC=clang CFLAGS= NVCC=clang++ --with-cuda-sm=80 NVCC_FLAGS=-allow-unsupported-compiler CXX=clang++ FC= FCFLAGS= F77= FFLAGS= 'LIBS=-lpmi -lpmi2 -Wl,--as-needed,-lcudart,--no-as-needed -lcuda' 'LDFLAGS=-L<path-to-pmi-lib> -L<path-to-libfabric-lib64> -L<path-to-cuda-lib64>' MPICHLIB_CFLAGS=-fPIC MPICHLIB_CXXFLAGS=-fPIC MPICHLIB_FFLAGS=-fPIC MPICHLIB_FCFLAGS=-fPIC
configure: WARNING: unrecognized options: --with-cuda-sm

I am attaching the config.log for the latest without using the --cuda-gpu-arch option since that fails with the error as shown above.
config_yaksa.log

@raffenet
Contributor

raffenet commented Apr 2, 2024

You need to add quotes around the full NVCC setting or else the part after the space won't be recognized.

NVCC="clang++ --cuda-gpu-arch=sm_80"

@rgayatri23
Author

rgayatri23 commented Apr 2, 2024

Here is the raw configure line. It is passing clang++ --cuda-gpu-arch=sm_80 as a single string, right? It appears as: 'NVCC="clang++' '--cuda-gpu-arch=sm_80"'

./configure --prefix= --enable-fast=O2 --with-pm=no --with-pmi=pmi2 --with-pmi2=/opt/cray/pe/pmi/default --with-xpmem=/opt/cray/xpmem/default --with-wrapper-dl-type=rpath --enable-threads=multiple --enable-shared=yes --enable-static=no --with-namepublisher=file --with-libfabric=/opt/cray/libfabric/1.15.2.0 --with-libfabric-include=/opt/cray/libfabric/1.15.2.0/include --with-libfabric-lib=/opt/cray/libfabric/1.15.2.0/lib64 --with-device=ch4:ofi --with-ch4-shmmods=posix,xpmem --enable-thread-cs=per-vci --with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 CPPFLAGS=-I/opt/cray/pe/pmi/default/include CC=clang CFLAGS= 'NVCC="clang++' '--cuda-gpu-arch=sm_80"' NVCC_FLAGS=-allow-unsupported-compiler CXX=clang++ FC= FCFLAGS= F77= FFLAGS= 'LIBS=-lpmi -lpmi2 -Wl,--as-needed,-lcudart,--no-as-needed -lcuda' 'LDFLAGS=-L/opt/cray/pe/pmi/default/lib -L/opt/cray/libfabric/1.15.2.0/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/lib64' MPICHLIB_CFLAGS=-fPIC MPICHLIB_CXXFLAGS=-fPIC MPICHLIB_FFLAGS=-fPIC MPICHLIB_FCFLAGS=-fPIC
configure: error: unrecognized option: `--cuda-gpu-arch=sm_80"'
Try `./configure --help' for more information

@raffenet
Contributor

raffenet commented Apr 2, 2024

What are these single quotes in the configure line? Can you remove them?

'NVCC="clang++' '--cuda-gpu-arch=sm_80"'
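A minimal reconstruction (assumed, since the actual script was not shared) of the quoting bug: when the configure arguments are built up in a plain string, word splitting breaks NVCC="clang++ --cuda-gpu-arch=sm_80" into two words, and the double quotes become literal characters, producing exactly the 'NVCC="clang++' '--cuda-gpu-arch=sm_80"' pair seen above.

```shell
# Broken: unquoted expansion of a string splits on whitespace, and the
# embedded double quotes are just ordinary characters.
args_str='CC=clang NVCC="clang++ --cuda-gpu-arch=sm_80"'
set -- $args_str
broken_count=$#                  # 3 words, not 2
second_broken=$2                 # the mangled word: NVCC="clang++
echo "broken: $broken_count words, second word is $second_broken"

# Correct: pass each argument as its own quoted word, then forward
# them with "$@" so each stays intact.
set -- CC=clang NVCC="clang++ --cuda-gpu-arch=sm_80"
ok_count=$#                      # 2 words
second_ok=$2                     # NVCC=clang++ --cuda-gpu-arch=sm_80
echo "correct: $ok_count words, second word is $second_ok"
```

With the arguments held this way, `./configure "$@"` receives NVCC as a single value.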

@rgayatri23
Author

OK, thanks. It's strange: I was doing the configuration through a bash script, and it did not do what I thought it did. Thanks for pointing it out.
I tested this manually, and this time the options were accepted and the configuration went ahead, but it failed with the following error:

In file included from /usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/cmath:47:
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:103:7: error: __float128 is not supported on this target
  103 |   abs(__float128 __x)
      |       ^
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:102:3: error: __float128 is not supported on this target
  102 |   __float128
      |   ^
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:103:18: note: '__x' defined here
  103 |   abs(__float128 __x)

@raffenet
Contributor

raffenet commented Apr 2, 2024

This is what I saw on my system as well. This issue suggested adding -D__STRICT_ANSI__ to NVCC/NVCC_FLAGS, which worked for me.
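Combining the arch flag and the macro workaround in a single NVCC value could look like this (a sketch based on the suggestions in this thread, not a verified configuration):

```shell
# Pass both the GPU arch and the __STRICT_ANSI__ workaround through
# NVCC, since NVCC_FLAGS is ignored when NVCC is user-provided.
NVCC="clang++ --cuda-gpu-arch=sm_80 -D__STRICT_ANSI__"
```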

@raffenet
Contributor

raffenet commented Apr 2, 2024

This is what I saw on my system as well. This issue suggested adding -D__STRICT_ANSI__ to NVCC/NVCC_FLAGS, which worked for me.

Even though this gets past configure, it's likely going to fail at the make stage because the build system assumes nvcc format for the device code generation options. Supporting clang++ is going to take some additional changes.

@rgayatri23
Author

  1. Is there any plan to support clang as the CUDA compiler?
  2. Also, I noticed that even nvcc as the CUDA compiler fails with similar errors. Do I need some additional options to make it work?

@raffenet
Contributor

raffenet commented Apr 3, 2024

  1. Is there any plan to support clang as the cuda compiler?

No timeline for supporting this configuration at the moment.

  2. Also, I noticed that even nvcc as the CUDA compiler fails with similar errors. Do I need some additional options to make it work?

On the Polaris system here at ANL, I am able to build using nvcc + the system gcc (7.5.0) and also the NVIDIA HPC compilers (CC=nvc CXX=nvc++ FC=nvfortran) with no additional options.

@rgayatri23
Author

No timeline to supporting this configuration at the moment.

OK. Thanks, though; I see a few feature requests and patches have come out of this issue :-)

On the Polaris system here at ANL, I am able to build using nvcc + the system gcc (7.5.0) and also the NVIDIA HPC compilers (CC=nvc CXX=nvc++ FC=nvfortran) with no additional options.

I was actually thinking in terms of CC=clang CXX=clang++ NVCC=nvcc (maybe not needed) with Fortran disabled.

@raffenet
Contributor

raffenet commented Apr 3, 2024

It doesn't work because we override the host compiler for nvcc during configuration. Removing the override allows me to progress through the yaksa build, so we need to understand why it is there in the first place... stay tuned for now.

@raffenet
Contributor

raffenet commented Apr 3, 2024

You should be able to work around the host compiler issue by doing this at configure time:

--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 NVCC=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc

The configure script will see that you've supplied an NVCC and skip the host compiler override. I have tested it successfully on Polaris.

@rgayatri23
Author

You should be able to work around the host compiler issue by doing this at configure time:

--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 NVCC=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc

The configure script will see that you've supplied an NVCC and skip the host compiler override. I have tested it successfully on Polaris.

Thanks, this worked. I think this is what I will do for now.

@rgayatri23
Author

You can close this issue for now. But it would be great if I could be notified whenever it's possible to use clang++ as the CUDA compiler too. Is there an issue I can track for that update?

@raffenet
Contributor

raffenet commented May 3, 2024

@rgayatri23 see pmodels/yaksa#251 for tracking supporting clang++ as the CUDA compiler.

@raffenet raffenet closed this as completed May 3, 2024