some deterministic tests fail #4921

Open
correaa opened this issue Feb 12, 2024 · 0 comments


correaa commented Feb 12, 2024

Describe the bug

Deterministic tests fail on this CI machine with QMCPACK develop at commit 65dd666f (built inside the boost-mpi3 CI job at commit 0b1184fe):

https://gitlab.com/correaa/boost-mpi3/-/jobs/6147742115

Running with gitlab-runner 15.11.0 (436955cb)
  on jgpu JJ6dZUaTJ, system ID: s_b6bbb2ad428c
Preparing the "docker" executor
Using Docker executor with image debian:testing ...
Pulling docker image debian:testing ...
Using docker image sha256:13851b673eeab4560ac786972478582543778c409e61150523fef216d5e5561f for debian:testing with digest debian@sha256:34cc898f64db9ba60a83da2c5fa6ffcbbbf6ad296505dccbb733dd3640de5e23 ...
Preparing environment
Running on runner-jj6dzuatj-project-3503809-concurrent-3 via jgpu...
Getting source from Git repository
Fetching changes...
Initialized empty Git repository in /builds/correaa/boost-mpi3/.git/
Created fresh repository.
Checking out 0b1184fe as detached HEAD (ref is master)...

Updating/initializing submodules recursively...
Executing "step_script" stage of the job script
Using docker image sha256:13851b673eeab4560ac786972478582543778c409e61150523fef216d5e5561f for debian:testing with digest debian@sha256:34cc898f64db9ba60a83da2c5fa6ffcbbbf6ad296505dccbb733dd3640de5e23 ...

This is a typical failure:

  85/1151 Test   #29: deterministic-unit_test_wavefunction_trialwf .............................................................***Failed   25.23 sec
QMCPACK printout is suppressed. Use --turn-on-printout to see all the printout.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_wavefunction_trialwf is a Catch v2.13.9 host application.
Run with -? for options

-------------------------------------------------------------------------------
TrialWaveFunction_diamondC_2x1x1
  DiracDeterminantBatched<MatrixUpdateOMPTarget>
-------------------------------------------------------------------------------
/builds/correaa/boost-mpi3/qmcpack/src/QMCWaveFunctions/tests/test_TrialWaveFunction_diamondC_2x1x1.cpp:605
...............................................................................

/builds/correaa/boost-mpi3/qmcpack/src/QMCWaveFunctions/tests/test_TrialWaveFunction_diamondC_2x1x1.cpp:230: FAILED:
  CHECK( r_all_val == Approx(0.1248738460469678) )
with expansion:
  0.12486f == Approx( 0.124873846 )

/builds/correaa/boost-mpi3/qmcpack/src/QMCWaveFunctions/tests/test_TrialWaveFunction_diamondC_2x1x1.cpp:231: FAILED:
  CHECK( r_fermionic_val == ValueApprox(0.1362181543982075) )
with expansion:
  0.1362f == Approx( 0.1362181544 )

/builds/correaa/boost-mpi3/qmcpack/src/QMCWaveFunctions/tests/test_TrialWaveFunction_diamondC_2x1x1.cpp:242: FAILED:
  CHECK( psi.getLogPsi() == Approx(-8.013162503965223) )
with expansion:
  -8.0133f == Approx( -8.013162504 )

To Reproduce

The full instructions are in the CI job log: https://gitlab.com/correaa/boost-mpi3/-/jobs/6147742115

$ apt-get -qq update && apt-get -qq install --no-install-recommends -y libblas-dev liblapack-dev libfftw3-dev libboost-serialization-dev libopenmpi-dev gfortran g++ cmake make git ca-certificates numdiff python3 python3-numpy python3-h5py python3-mpi4py python3-scipy libxml2-dev libhdf5-dev valgrind

$ cd qmcpack
$ cd build
$ cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DBUILD_AFQMC=1 -DBUILD_PPCONVERT=1 -DQMC_MIXED_PRECISION=1 -DCMAKE_BUILD_TYPE=Debug -DMPIEXEC_PREFLAGS="--allow-run-as-root;--bind-to;none" ..
$ make --jobs=2
$ export VALGRIND_EXE="valgrind --leak-check=full --track-origins=yes --show-leak-kinds=all --suppressions=.valgrind_suppressions --gen-suppressions=all --error-exitcode=1 "
$ OMPI_ALLOW_RUN_AS_ROOT=1 OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 ctest -L deterministic -j 2 --output-on-failure

Expected behavior

All tests labeled deterministic should pass under ctest.

System:

  • system name: CI machine (GitLab runner jgpu, Docker image debian:testing, GNU 13.2.0 compilers, Open MPI)

Additional context

This is the cmake command and output:

$ cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DBUILD_AFQMC=1 -DBUILD_PPCONVERT=1 -DQMC_MIXED_PRECISION=1 -DCMAKE_BUILD_TYPE=Debug -DMPIEXEC_PREFLAGS="--allow-run-as-root;--bind-to;none" ..
-- The C compiler identification is GNU 13.2.0
-- The CXX compiler identification is GNU 13.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/mpicc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/mpicxx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- ENABLE_CUDA disabled
-- GPU device architectures: 
-- Defining the float point precision
   Base precision = float
   Full precision = double
-- CMAKE_BUILD_TYPE is DEBUG
-- LMY engine is not compatible with CPU mixed precision build! Disabling LMY engine
-- Enable sanitizer ENABLE_SANITIZER=none
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Trying to figure out compiler options ....
-- C++ Compiler is identified by QMCPACK as : GNU
-- Looking for C++ include cstdio
-- Looking for C++ include cstdio - found
-- libstdc++/C++ compiler version compatibility check pass
-- C++17 standard library supported
-- QMC_SIMD_ALIGNMENT is set to 64
-- OpenMP taskloop functionality check pass
-- ENABLE_OMP_TASKLOOP is set to ON
-- Found MPI_CXX: /usr/bin/mpicxx (found version "3.1") 
-- Found MPI: TRUE (found version "3.1") found components: CXX 
-- MPI runner MPIEXEC_EXECUTABLE : /usr/bin/mpiexec
-- MPIEXEC_NUMPROC_FLAG : -n
-- MPIEXEC_PREFLAGS : --allow-run-as-root;--bind-to;none
-- Tests run as : /usr/bin/mpiexec -n NUM_PROCS --allow-run-as-root --bind-to none EXECUTABLE
-- MPI is enabled
-- Looking for posix_memalign
-- Looking for posix_memalign - found
-- Trying to find LAPACK from Intel MKL
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Could NOT find BLAS (missing: BLAS_LIBRARIES) 
-- Could NOT find LAPACK (missing: LAPACK_LIBRARIES) 
    Reason given by package: LAPACK could not be found because dependency BLAS could not be found.

-- Intel MKL library files not found via FindLAPACK.
-- Trying to find alternative LAPACK libraries
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libblas.so  
-- Looking for cheev_
-- Looking for cheev_ - not found
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/liblapack.so;/usr/lib/x86_64-linux-gnu/libblas.so  
-- LAPACK linker flags: 
-- LAPACK libraries: /usr/lib/x86_64-linux-gnu/liblapack.so;/usr/lib/x86_64-linux-gnu/libblas.so
CMake Warning at CMakeLists.txt:558 (message):
  AFQMC - MKL not found, using simple sparse matrix routines.  Link with MKL
  sparse libraries for better performance.


-- Selected vendor math library GENERIC
-- SINCOS_INCLUDE : cmath
-- Performing Test HAVE_SINCOS
-- Performing Test HAVE_SINCOS - Success
-- Found FFTW: /usr/lib/x86_64-linux-gnu/libfftw3.so  
-- FFTW_INCLUDE_DIR=/usr/include
-- FFTW_LIBRARIES=/usr/lib/x86_64-linux-gnu/libfftw3.so
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.3")  
-- Found LibXml2: /usr/lib/x86_64-linux-gnu/libxml2.so (found version "2.9.14") 
-- Linking dynamic HDF5 library
-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libcrypto.so;/usr/lib/x86_64-linux-gnu/libcurl.so;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/libsz.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so (found version "1.10.10") found components: C 
-- Serial HDF5 library found
-- Using HDF5 non-scalable serial I/O code paths
CMake Warning at CMakeLists.txt:685 (message):
  MPI builds may have performance loss by not using parallel HDF5! (Safe to
  ignore for workstation builds).


-- Found Boost: /usr/include (found suitable version "1.83.0", minimum required is "1.61.0")  
-- Setting Boost_INCLUDE_DIRS=/usr/include
-- CUDA NVTX APIs disabled
-- VTune ittnotify APIs disabled
Project C_FLAGS:  -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -Wno-deprecated -Werror=vla -ffast-math -march=native -g -fno-omit-frame-pointer
Project CXX_FLAGS:  -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -Wno-deprecated -Wvla -Wcomment -Wmisleading-indentation -Wmaybe-uninitialized -Wuninitialized -Wreorder -Wno-unknown-pragmas -Wno-sign-compare -Wsuggest-override -ffast-math -march=native -g -fno-omit-frame-pointer
Project INCLUDE_DIRECTORIES: /builds/correaa/boost-mpi3/qmcpack/build/src;/builds/correaa/boost-mpi3/qmcpack/src
Project EXE_LINKER_FLAGS:  
Project SHARED_LINKER_FLAGS:  
-- Found Git: /usr/bin/git (found version "2.43.0") 
-- QMC_SYMLINK_TEST_FILES = ON.  Using symbolic links for large test files may cause test failures if the build is installed on a separate filesystem from the source.
-- Found Python3: /usr/bin/python3 (found version "3.11.7") found components: Interpreter 
-- Unable to import PySCF python module. PySCF tests will not be run.
-- Unable to import GPAW python module. GPAW converter tests will not be run.
-- Did not find a patched Quantum ESPRESSO (QE) distribution with pw2qmcpack.x. QE tests will not be run.
-- Did not find RMG (rmg-cpu). RMG tests will not be run.
-- Ready to parse QMCPACK source tree
sed supports -E
Git branch: develop
Git commit hash: 65dd666f3dbd9d7808cba5458ade5a4ccd5f294c
Building AFQMC performance executable 
-- Performing Test ISNAN_WORKS
-- Performing Test ISNAN_WORKS - Success
-- ppconvert enabled.
-- Adding integration tests for QMCPACK
Use --log-level=VERBOSE CMake option for details of which tests will be enabled.
Skipping converter tests with HDF output because executable h5diff was not found
Adding estimator tests for QMCPACK
Missing python module pandas, not adding test estimator-latdev
Missing python module pandas, not adding test estimator-sofk
Adding I/O tests for QMCPACK
QMC_DATA is not set. Performance tests will be skipped.
Adding system tests for QMCPACK
Missing python module pyscf, not adding test afqmc_workflow
Adding example tests for QMCPACK
Adding Nexus tests
-- Configuring done (20.3s)
-- Generating done (0.3s)
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_PPCONVERT


-- Build files have been written to: /builds/correaa/boost-mpi3/qmcpack/build
$ make --jobs=2 || make VERBOSE=1