For release builds I remove the --enable-debug option from the autoconf command and change Debug to Release in the CMake commands.
Building examples with make
make use_cuda=1
Actual behavior
mpirun -np 2 ./ex1
Debug build output
[LOG_CAT_ML] You must specify a valid HCA device by setting:
-x HCOLL_MAIN_IB=<dev_name:port> or -x UCX_NET_DEVICES=<dev_name:port>.
If no device was specified for HCOLL (or the calling library), automatic device detection will be run.
In case of unfounded HCA device please contact your system administrator.
[LOG_CAT_ML] You must specify a valid HCA device by setting:
-x HCOLL_MAIN_IB=<dev_name:port> or -x UCX_NET_DEVICES=<dev_name:port>.
If no device was specified for HCOLL (or the calling library), automatic device detection will be run.
In case of unfounded HCA device please contact your system administrator.
[mtndew:298978] Error: ../../../../../ompi/mca/coll/hcoll/coll_hcoll_module.c:310 - mca_coll_hcoll_comm_query() Hcol library init failed
[mtndew:298979] Error: ../../../../../ompi/mca/coll/hcoll/coll_hcoll_module.c:310 - mca_coll_hcoll_comm_query() Hcol library init failed
<C*b,b>: 1.800000e+01
Iters ||r||_C conv.rate ||r||_C/||b||_C
----- ------------ --------- ------------
1 2.509980e+00 0.591608 5.916080e-01
2 9.888265e-01 0.393958 2.330686e-01
3 4.572262e-01 0.462393 1.077693e-01
4 1.706474e-01 0.373223 4.022197e-02
5 7.473022e-02 0.437922 1.761408e-02
6 3.402624e-02 0.455321 8.020061e-03
7 1.214929e-02 0.357057 2.863616e-03
8 3.533113e-03 0.290808 8.327628e-04
9 1.343893e-03 0.380371 3.167586e-04
10 2.968745e-04 0.220906 6.997400e-05
11 5.329671e-05 0.179526 1.256215e-05
12 7.308483e-06 0.137128 1.722626e-06
13 7.411552e-07 0.101410 1.746920e-07
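The columns of the table above can be cross-checked against the printed `<C*b,b>`. This assumes, as the column headers suggest, that the norm is $\|b\|_C = \sqrt{\langle C b, b\rangle}$ and that "conv.rate" is the ratio of successive residual norms:

```latex
\|b\|_C = \sqrt{\langle C b, b\rangle} = \sqrt{18} \approx 4.242641

\frac{\|r_1\|_C}{\|b\|_C} = \frac{2.509980}{4.242641} \approx 0.591608,
\qquad
\frac{\|r_2\|_C}{\|r_1\|_C} = \frac{0.9888265}{2.509980} \approx 0.393958
```

By the same relation, the Release build's `<C*b,b>: 0.000000e+00` gives $\|b\|_C = 0$, so there is no relative residual to iterate on. If the preconditioner $C$ is symmetric positive definite, $\langle C b, b\rangle = 0$ only when $b = 0$. This fits the interpretation that the solver read the right-hand side before it had actually been populated.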
I don't know why I get the warnings; however, the results are consistent with what is discussed in issue #845.
Release build output
[LOG_CAT_ML] You must specify a valid HCA device by setting:
-x HCOLL_MAIN_IB=<dev_name:port> or -x UCX_NET_DEVICES=<dev_name:port>.
If no device was specified for HCOLL (or the calling library), automatic device detection will be run.
In case of unfounded HCA device please contact your system administrator.
[LOG_CAT_ML] You must specify a valid HCA device by setting:
-x HCOLL_MAIN_IB=<dev_name:port> or -x UCX_NET_DEVICES=<dev_name:port>.
If no device was specified for HCOLL (or the calling library), automatic device detection will be run.
In case of unfounded HCA device please contact your system administrator.
[mtndew:298978] Error: ../../../../../ompi/mca/coll/hcoll/coll_hcoll_module.c:310 - mca_coll_hcoll_comm_query() Hcol library init failed
[mtndew:298979] Error: ../../../../../ompi/mca/coll/hcoll/coll_hcoll_module.c:310 - mca_coll_hcoll_comm_query() Hcol library init failed
<C*b,b>: 0.000000e+00
Expected behavior
Both Debug and Release builds should yield the same results.
It seems that the Release build does not do anything.
Thank you for reporting this issue. The Debug build works but the Release build doesn't because the examples rely on unified memory to transfer data to the GPU, and Debug mode implicitly forces device synchronization. In principle, we should use device memory, where this wouldn't be an issue. However, the goal of the examples is to show the basics of using hypre, so we keep the GPU code as simple as possible. In your own code, you can still follow the example code, but you should either allocate the memory on the device and populate it on the device as well, or add adequate explicit device synchronization.
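The following is a minimal sketch of the synchronization point described in the reply above. It uses only plain CUDA runtime calls and is not hypre's example code. The array size `n` and the `fill` kernel are hypothetical illustration choices. A kernel writes into unified (managed) memory, and the host must wait on `cudaDeviceSynchronize()` before reading that memory. Without that barrier, the host read races with the asynchronous kernel. In a Debug build, extra implicit synchronization can hide this race.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "CUDA error %s at %s:%d\n",                \
                   cudaGetErrorString(err_), __FILE__, __LINE__);     \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Hypothetical kernel: fill x[0..n) with a constant value on the device.
__global__ void fill(double *x, int n, double value) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] = value;
}

int main() {
  const int n = 1000;  // hypothetical problem size
  double *b = nullptr;

  // Unified (managed) memory: the same pointer is valid on host and device.
  CUDA_CHECK(cudaMallocManaged(&b, n * sizeof(double)));

  // Kernel launches return immediately; the kernel runs asynchronously.
  fill<<<(n + 255) / 256, 256>>>(b, n, 1.0);
  CUDA_CHECK(cudaGetLastError());

  // Explicit barrier: wait for the kernel to finish before the host reads b.
  CUDA_CHECK(cudaDeviceSynchronize());

  // Host-side dot product <b,b>; every entry is 1.0, so this equals n.
  double dot = 0.0;
  for (int i = 0; i < n; ++i) dot += b[i] * b[i];
  std::printf("<b,b> = %g\n", dot);

  CUDA_CHECK(cudaFree(b));
  return dot == static_cast<double>(n) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Removing the `cudaDeviceSynchronize()` call turns the host loop into a data race on the managed buffer. The other option mentioned in the reply avoids shared host/device access entirely: allocate with `cudaMalloc` and keep both population and use of the data on the device.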
Issue description
Important note
Steps to reproduce the behavior
Setting CUDA_HOME to your CUDA installation home directory
I have tested this with both cuda-toolkit and nvhpc-bundled CUDA versions, and with both CUDA 11.8 and 12.3.
I am on Pop!OS 22.04. I have the latest nvhpc-cuda-multi package from the NVHPC repos, as well as the two latest CUDA versions from the CUDA repos.
I use:
or
Building HYPRE either via autoconf or via CMake
For autoconf Debug build:
For CMake:
EDIT #1: Fixed hyperlinks