
Errors when compiling and running the hypre GPU example of amrex-tutorials #3854

Open

ruohai0925 opened this issue Mar 27, 2024 · 4 comments

@ruohai0925 (Contributor) commented Mar 27, 2024

Hi, I plan to run the hypre GPU example in amrex-tutorials/ExampleCodes/LinearSolvers/ABecLaplacian_C. I use the latest AMReX, amrex-tutorials, and hypre. Here are my detailed steps, followed by some questions.

  1. I build the hypre library without unified memory, MPI, or OpenMP, using CUDA and single precision:

cmake -DHYPRE_WITH_MPI=OFF -DHYPRE_WITH_OPENMP=OFF -DHYPRE_BUILD_EXAMPLES=OFF -DHYPRE_ENABLE_UNIFIED_MEMORY=OFF -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CXX_STANDARD=17 -DHYPRE_ENABLE_SINGLE=ON -DHYPRE_WITH_CUDA=ON -DCUDA_HOME="/usr/local/cuda" -DHYPRE_CUDA_SM=86 -DHYPRE_BUILD_TYPE=Release -DCMAKE_BUILD_TYPE=Release -DHYPRE_ENABLE_DEVICE_POOL=ON ..

cmake --build . --config Release 

cmake --install . --prefix /home/zengx372/Codes/hypre/src/hypre_gpu_wounifiedmemory --config Release
  2. Then I set:

export HYPRE_DIR=/home/zengx372/Codes/hypre/src/hypre_gpu_wounifiedmemory
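As a sanity check, the precision and CUDA settings hypre was actually built with can be read back from the generated config header (assuming the usual hypre install layout and macro names):

grep -E "HYPRE_SINGLE|HYPRE_USING_CUDA" ${HYPRE_DIR}/include/HYPRE_config.h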

  3. Next I go directly to the third step of this page, https://amrex-codes.github.io/amrex/tutorials_html/Hypre_Install.html#building-with-hypre-via-gnumake, and compile the example with GNUmake using the following script.
DEBUG = FALSE

USE_MPI  = FALSE
USE_OMP  = FALSE
USE_CUDA = TRUE
USE_PRECISION = FLOAT
USE_HYPRE = TRUE

USE_PETSC = FALSE

COMP = gnu

DIM = 3

AMREX_HOME ?= ../../../../amrex

include $(AMREX_HOME)/Tools/GNUMake/Make.defs

include ./Make.package

Pdirs   := Base Boundary LinearSolvers/MLMG

Ppack   += $(foreach dir, $(Pdirs), $(AMREX_HOME)/Src/$(dir)/Make.package)

include $(Ppack)

include $(AMREX_HOME)/Tools/GNUMake/Make.rules

The compile error is:

nvcc -MMD -MP -ccbin=g++ -Xcompiler=' -Werror=return-type -g -O3 -std=c++17  -pthread  -std=c++17' --std=c++17 -Wno-deprecated-gpu-targets -m64 --generate-code arch=compute_80,code=sm_80 -maxrregcount=255 --expt-relaxed-constexpr --expt-extended-lambda --forward-unknown-to-host-compiler -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --diag_suppress=implicit_return_from_non_void_function --Werror cross-execution-space-call -lineinfo --ptxas-options=-O3 --ptxas-options=-v --use_fast_math  --Werror ext-lambda-captures-this --display-error-number --diag-error 20092 -x cu -dc   -DAMREX_USE_CUDA -DAMREX_USE_GPU -DBL_COALESCE_FABS -DAMREX_GPU_MAX_THREADS=256 -DAMREX_USE_GPU_RDC -DBL_SPACEDIM=3 -DAMREX_SPACEDIM=3 -DBL_FORT_USE_UNDERSCORE -DAMREX_FORT_USE_UNDERSCORE -DBL_Linux -DAMREX_Linux -DNDEBUG -DAMREX_USE_HYPRE -Itmp_build_dir/s/3d.gnu.CUDA.EXE -I. -I../../../../amrex/Src/Extern/HYPRE -I../../../../amrex/Src/Base -I../../../../amrex/Src/Base/Parser -I../../../../amrex/Src/Boundary -I../../../../amrex/Src/LinearSolvers/MLMG -isystem /home/zengx372/Codes/hypre/src/hypre_gpu_wounifiedmemory/include -isystem /usr/local/cuda/include  -c MyTest.cpp -o tmp_build_dir/o/3d.gnu.CUDA.EXE/MyTest.o
../../../../amrex/Src/Extern/HYPRE/AMReX_HypreIJIface.cpp(153): error: argument of type "amrex::HypreIJIface::HypreRealType *" is incompatible with parameter of type "amrex::Real *"
      m_solverFinalResidualNormPtr(m_solver, &m_final_res_norm);
                                             ^

../../../../amrex/Src/Extern/HYPRE/AMReX_HypreIJIface.cpp(392): error: a value of type "HYPRE_Int (*)(HYPRE_Solver, HYPRE_Real)" cannot be assigned to an entity of type "amrex::HypreIJIface::HypreIntType (*)(HYPRE_Solver, amrex::Real)"
      m_solverSetTolPtr = &HYPRE_BoomerAMGSetTol;
                        ^

../../../../amrex/Src/Extern/HYPRE/AMReX_HypreIJIface.cpp(396): error: a value of type "HYPRE_Int (*)(HYPRE_Solver, HYPRE_Real *)" cannot be assigned to an entity of type "amrex::HypreIJIface::HypreIntType (*)(HYPRE_Solver, amrex::Real *)"
      m_solverFinalResidualNormPtr = &HYPRE_BoomerAMGGetFinalRelativeResidualNorm;

It seems there are some data-type mismatches in the output above.
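The errors are consistent with amrex::Real being double in this build while hypre's HYPRE_Real is float (as an aside, I believe amrex's GNUMake system spells the precision option PRECISION rather than USE_PRECISION, which would leave AMReX at its default double precision here). A minimal standalone sketch of the incompatibility, using illustrative stand-in types rather than the real headers:

```cpp
// Stand-ins for the two real types: with -DHYPRE_ENABLE_SINGLE=ON hypre's
// HYPRE_Real is float, while a default AMReX build has amrex::Real = double.
using hypre_real = float;
using amrex_real = double;

// hypre-style setter, analogous in shape to HYPRE_BoomerAMGSetTol:
int setTol(void* /*solver*/, hypre_real /*tol*/) { return 0; }

int main() {
    // AMReX-style function-pointer member, parameterized on amrex::Real:
    int (*solverSetTolPtr)(void*, amrex_real) = nullptr;
    // Uncommenting the next line reproduces the error from
    // AMReX_HypreIJIface.cpp(392): float vs double parameters make the
    // function-pointer types incompatible.
    // solverSetTolPtr = &setTol;
    return (solverSetTolPtr == nullptr) ? 0 : 1;
}
```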

  4. I also try to compile this example with CMake, following https://amrex-codes.github.io/amrex/tutorials_html/Hypre_Install.html#building-with-hypre-via-cmake. The commands are:

cmake .. -DAMReX_HYPRE=ON -DHYPRE_LIBRARIES=${HYPRE_DIR}/lib64/libHYPRE.a -DHYPRE_INCLUDE_DIRS=${HYPRE_DIR}/include -DAMReX_LINEAR_SOLVERS=ON -DAMReX_FORTRAN=ON -DAMReX_PRECISION=SINGLE -DAMReX_MPI=NO -DAMReX_OMP=OFF -DAMReX_CUDA_DEBUG=NO -DAMReX_GPU_BACKEND=CUDA -DAMReX_CUDA_ARCH=86 -DAMReX_AMRLEVEL=NO

cmake --build . -j8

The build completes successfully.
[screenshot: build output]

But when I run the example with:

./ABecLaplacian_C inputs.hypre

It shows the error:

Initializing AMReX (24.03-37-g2a9e992fce9b)...
Initializing CUDA...
CUDA initialized with 1 device.
AMReX (24.03-37-g2a9e992fce9b) initialized
MLMG: # of AMR levels: 1
      # of MG levels on the coarsest AMR level: 1
MLMG: Initial rhs               = 2347.400635
MLMG: Initial residual (resid0) = 2347.400635
Erroneous arithmetic operation
See Backtrace.0 file for details

And here is the Backtrace file:

0: ./ABecLaplacian_C() [0x4e50e6]
   amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at ??:?

1: ./ABecLaplacian_C() [0x4e77c2]
   amrex::BLBackTrace::handler(int) at ??:?

2: /lib64/libc.so.6(+0x4ad50) [0x7fe7a0419d50]

3: ./ABecLaplacian_C() [0x94a3bd]
   hypre_BoomerAMGSolve at ??:?

4: ./ABecLaplacian_C() [0x8de441]
   amrex::HypreIJIface::solve(float, float, int) at ??:?

5: ./ABecLaplacian_C() [0x8bfd1e]
   amrex::HypreABecLap3::solve(amrex::MultiFab&, amrex::MultiFab const&, float, float, int, amrex::BndryDataT<amrex::MultiFab> const&, int) at ??:?

6: ./ABecLaplacian_C() [0x753df6]
   void amrex::MLMGT<amrex::MultiFab>::bottomSolveWithHypre<amrex::MultiFab, 0>(amrex::MultiFab&, amrex::MultiFab const&) at ??:?

7: ./ABecLaplacian_C() [0x75467e]
   amrex::MLMGT<amrex::MultiFab>::actualBottomSolve() at ??:?

8: ./ABecLaplacian_C() [0x754c50]
   amrex::MLMGT<amrex::MultiFab>::mgVcycle(int, int) at ??:?

9: ./ABecLaplacian_C() [0x75660c]
   amrex::MLMGT<amrex::MultiFab>::oneIter(int) at ??:?

10: ./ABecLaplacian_C() [0x45a636]
   float amrex::MLMGT<amrex::MultiFab>::solve<amrex::MultiFab>(amrex::Vector<amrex::MultiFab*, std::allocator<amrex::MultiFab*> > const&, amrex::Vector<amrex::MultiFab const*, std::allocator<amrex::MultiFab const*> > const&, float, float, char const*) at ??:?

11: ./ABecLaplacian_C() [0x45b724]
   float amrex::MLMGT<amrex::MultiFab>::solve<amrex::MultiFab>(std::initializer_list<amrex::MultiFab*>, std::initializer_list<amrex::MultiFab const*>, float, float, char const*) at ??:?

12: ./ABecLaplacian_C() [0x44ce3e]
   MyTest::solveABecLaplacian() at ??:?

13: ./ABecLaplacian_C() [0x426f59]
   main at ??:?

14: /lib64/libc.so.6(__libc_start_main+0xef) [0x7fe7a040429d]

15: ./ABecLaplacian_C() [0x44868a]
   _start
../sysdeps/x86_64/start.S:122

Can someone give me some suggestions? I'm happy to provide more info if needed.

@ruohai0925 (Contributor, Author) commented:

The errors above still appear even if I build hypre with -DHYPRE_ENABLE_UNIFIED_MEMORY=ON and add amrex.the_arena_is_managed = 1 to the inputs file.
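For reference, that inputs-file addition is a single ParmParse line appended to the same inputs.hypre used above:

# make AMReX's default arena allocate CUDA managed (unified) memory
amrex.the_arena_is_managed = 1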

@WeiqunZhang (Member) commented:

Does it work if you use double precision?

@WeiqunZhang (Member) commented:

I can reproduce the failure. It might be that this particular problem is simply too hard in single precision. The relative tolerance in the setup is hard-wired to 1.e-10; I changed them all to 1.e-4. Then if I run with prob_type=1 (i.e., Poisson), it seems to work. If I run with prob_type=2 n_cell=32, it runs for a number of iterations and then the residual blows up.
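Single precision carries only about 7 decimal digits, so a relative tolerance of 1.e-10 is below what the solver can resolve. A fragment sketching the kind of change described, with illustrative variable names (the actual code in MyTest.cpp may differ; the solve signature matches the one visible in the backtrace above):

```cpp
// Relax the hard-wired tolerances to something achievable in float.
amrex::Real tol_rel = amrex::Real(1.0e-4);  // was 1.0e-10
amrex::Real tol_abs = amrex::Real(0.0);
mlmg.solve({&solution}, {&rhs}, tol_rel, tol_abs);
```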

@WeiqunZhang (Member) commented:

If you make problem type 2 easier by making it more diagonally dominant,

index 97c928a88d..f3d7bb697d 100644
--- a/Tests/LinearSolvers/ABecLaplacian_C/MyTest.H
+++ b/Tests/LinearSolvers/ABecLaplacian_C/MyTest.H
@@ -75,7 +75,7 @@ private:
     amrex::Vector<amrex::MultiFab> acoef;
     amrex::Vector<amrex::MultiFab> bcoef;
 
-    amrex::Real ascalar = 1.e-3;
+    amrex::Real ascalar = 1.e-1;
     amrex::Real bscalar = 1.0;
 };

it also works. So I don't think there is a real issue here.
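This makes sense given the form of the operator: the example solves the standard AMReX ABecLaplacian problem

```latex
\left(\alpha\, a(\mathbf{x}) - \beta\, \nabla \cdot b(\mathbf{x}) \nabla\right)\phi = f,
\qquad \alpha = \texttt{ascalar},\ \beta = \texttt{bscalar},
```

and the alpha * a(x) * phi term contributes only to the diagonal of the discretized matrix, so raising ascalar from 1.e-3 to 1.e-1 makes every row more diagonally dominant and the system easier to solve in single precision.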
