Regtest failures on ppc64le #3077

Open
opoplawski opened this issue Oct 29, 2023 · 5 comments
@opoplawski
Contributor

We're looking at updating cp2k in Fedora Rawhide to the latest git version and using cmake (see https://src.fedoraproject.org/rpms/cp2k/pull-request/5). I've re-enabled the regtests to get a baseline on their status. I'm seeing the following on ppc64le:

+ tests/do_regtest.py --workbasedir /builddir/build/BUILD local ssmp
*************************** Testing started ****************************
----------------------------- Settings ---------------------------------
MPI ranks:      1
OpenMP threads: 2
GPU devices:    0
Workers:        4
Timeout [s]:    400
Work base dir:  /builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23
MPI exec:       ['mpiexec']
Smoke test:     False
Valgrind:       False
Keepalive:      False
Flag slow:      False
Debug:          False
ARCH:           local
VERSION:        ssmp
Flags:          ndebug,omp,fftw3
------------------------------------------------------------------------
...
>>> /builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/Fist/regtest-1-4
    multipole_dip_qu.dbg_f.inp                                                           0.04511253985           OK (   0.61 sec)
    multipole_dip_qu.dbg_f_real.inp                                                      0.04511287382           OK (   3.06 sec)
    multipole_dip_qu.dbg_f_rec.inp                                                        -2.068221929           OK (   1.16 sec)
    multipole_dipole.dbg_f.inp                                                           -0.3863523169           OK (   0.97 sec)
    multipole_dipole.dbg_f_real.inp                                                      -0.3863519829           OK (   1.12 sec)
    multipole_dipole.dbg_f_rec.inp                                                       -0.3875468693           OK (   1.48 sec)
    multipole_quadrupole.dbg_f.inp                                                       -0.9649256212           OK (   1.15 sec)
    multipole_quadrupole.dbg_f_real.inp                                                  -0.9649256212           OK (   3.31 sec)
    multipole_quadrupole.dbg_f_rec.inp                                                    -1.026475286           OK (   1.86 sec)
    deca_ala_avg2.inp                                                                     -1.046403931           OK (   0.55 sec)
    deca_ala_noavg.inp                                                                    -1.036150905           OK (   0.36 sec)
    multipole_ch_dip.dbg_st.inp                                                             1.3965e-08           OK (   1.03 sec)
    multipole_ch_dip_qu.dbg_st.inp                                                          3.6258e-08           OK (   0.78 sec)
    multipole_ch_qu.dbg_st.inp                                                              5.1379e-08           OK (   0.70 sec)
    multipole_charge.dbg_st.inp                                                              1.738e-09           OK (   0.32 sec)
    multipole_dip_qu.dbg_st.inp                                                             3.2584e-08           OK (   0.68 sec)
    multipole_dipole.dbg_st.inp                                                             1.0661e-08           OK (   1.00 sec)
    multipole_quadrupole.dbg_st.inp                                                         5.7913e-08           OK (   1.31 sec)
    water_charge_no_array.inp                                                         -0.0003964367484           OK (   1.01 sec)
    water_charge_array.inp                                                            -0.0003964367484           OK (   1.10 sec)
    water_charge_no_array_ewald.inp                                                   -0.0003964974328           OK (   0.85 sec)
    water_charge_array_ewald.inp                                                      -0.0003964974328           OK (   0.85 sec)
    water_charge_no_array_pme.inp                                                     -0.0003964366851           OK (   1.17 sec)
    water_charge_array_pme.inp                                                        -0.0003964366851           OK (   1.29 sec)
    argon_atprop.inp                                                                     -0.2112299639           OK (   1.40 sec)
    water_atprop_spme.inp                                                               0.003747604587           OK (   1.34 sec)
    water_atprop_pme.inp                                                                0.003755557608           OK (   2.65 sec)
    water_atprop_ewald.inp                                                              0.003756647045 WRONG RESULT (   0.86 sec)
<<< /builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/Fist/regtest-1-4 (69 of 210) done in 33.94 sec
...
>>> /builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/QS/regtest-dm-ls-scf-2
    H2O-32-dftb-ls-4.inp                                                                  -65.15650916           OK (   0.62 sec)
    ace_ala_nme_pm6_01.inp                                                                -67.52715999           OK (   3.74 sec)
    ace_ala_nme_pm6_02.inp                                                                -67.52715999           OK (   3.14 sec)
    ace_ala_nme_pm6_03.inp                                                                -67.52715999           OK (   4.04 sec)
    ace_ala_nme_pm6_04.inp                                                                -67.52716002           OK (   0.94 sec)
    ace_ala_nme_pm6_05.inp                                                                -67.52716002           OK (   0.98 sec)
    ace_ala_nme_pm6_06.inp                                                                -67.52716002           OK (   0.96 sec)
    ace_ala_nme_pm6_07.inp                                                                -67.52715027 WRONG RESULT (   3.83 sec)
    ace_ala_nme_pm6_08.inp                                                                -67.52714411           OK (   2.76 sec)
    ace_ala_nme_pm6_09.inp                                                                -67.52714411           OK (   2.83 sec)
    ace_ala_nme_pm6_10.inp                                                                -67.52716095           OK (   0.83 sec)
    ace_ala_nme_pm6_11.inp                                                                -67.52716052           OK (   0.88 sec)
    ace_ala_nme_pm6_12.inp                                                                -67.52716052           OK (   0.88 sec)
    H2O-32-dftb-trs4.inp                                                                  -32.57418613           OK (   0.35 sec)
    H2O-32-dftb-ls-5.inp                                                                  -65.15650892           OK (   0.49 sec)
    H2O-32-dftb-ls-6.inp                                                                  -65.15650916           OK (   0.60 sec)
<<< /builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/QS/regtest-dm-ls-scf-2 (131 of 210) done in 27.87 sec
...
------------------------------- Errors ---------------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/Fist/regtest-1-4/water_atprop_ewald.inp.out
Difference too large: 2.66e-12 > 1e-14, ref_value: 0.375664704477E-02, value: 0.375664704476E-02.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/QS/regtest-dm-ls-scf-2/ace_ala_nme_pm6_07.inp.out
Difference too large: 1.27e-07 > 1e-07, ref_value: -67.527158834036754, value: -67.527150274892620.
------------------------------- Timings --------------------------------
Plot: name="timings", title="Timing Distribution", ylabel="time [s]"
PlotPoint: name="100th_percentile", plot="timings", label="100th %ile", y=65.80, yerr=0.0
PlotPoint: name="99th_percentile", plot="timings", label="99th %ile", y=15.75, yerr=0.0
PlotPoint: name="98th_percentile", plot="timings", label="98th %ile", y=10.52, yerr=0.0
PlotPoint: name="95th_percentile", plot="timings", label="95th %ile", y=7.46, yerr=0.0
PlotPoint: name="90th_percentile", plot="timings", label="90th %ile", y=5.09, yerr=0.0
PlotPoint: name="80th_percentile", plot="timings", label="80th %ile", y=2.62, yerr=0.0
----------------------------- Slow Tests -------------------------------
Duration threshold (2x 95th %ile): 14.92 sec
Found 13 slow tests (20 suppressed):
    xTB/regtest-stda/water_xTB_2.inp                                                 (  15.05 sec)
    xTB/regtest-5/ice2.inp                                                           (  19.94 sec)
    xTB/regtest-5/Ru_geo.inp                                                         (  15.31 sec)
    QS/regtest-pao-2/H2O_pao_fock.inp                                                (  15.75 sec)
    QS/regtest-pao-2/H2O_pao_gth.inp                                                 (  20.11 sec)
    QS/regtest-pao-2/H2O_pao_exp_cluster_MD.inp                                      (  19.38 sec)
    SE/regtest-2-2/c2h5cl.inp                                                        (  20.01 sec)
    MC/regtest/hmc.inp                                                               (  16.14 sec)
    Fist/regtest-6/JAC_distr.inp                                                     (  22.94 sec)
    Fist/regtest-5/JAC.inp                                                           (  16.99 sec)
    Fist/regtest-5/JAC_us.inp                                                        (  16.24 sec)
    Fist/regtest-5/JAC_gen.inp                                                       (  18.57 sec)
    NNP/regtest-1/H2O-64_C-NNP_MD-NpT-numeric.inp                                    (  19.23 sec)
------------------------------- Summary --------------------------------
Number of FAILED  tests 0
Number of WRONG   tests 2
Number of CORRECT tests 2946
Total number of   tests 2948
Summary: correct: 2946 / 2948; wrong: 2; 23min
Status: FAILED
*************************** Testing ended ******************************
@mtaillefumier
Contributor

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/Fist/regtest-1-4/water_atprop_ewald.inp.out
Difference too large: 2.66e-12 > 1e-14, ref_value: 0.375664704477E-02, value: 0.375664704476E-02.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/QS/regtest-dm-ls-scf-2/ace_ala_nme_pm6_07.inp.out
Difference too large: 1.27e-07 > 1e-07, ref_value: -67.527158834036754, value: -67.527150274892620.

The error is slightly higher than expected, but the results are still sound. Which dependencies are installed (especially openblas)? Do you always get the same failing tests if you run them twice?
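
For readers following along: the figures in the "Difference too large" lines are consistent with a relative difference, |value - ref_value| / |ref_value|, compared against a per-test tolerance. That formula is inferred from the numbers in the log, not taken from `do_regtest.py`, so treat the following as a minimal sketch under that assumption. The helper name `relative_difference` is hypothetical.

```python
def relative_difference(value: float, ref_value: float) -> float:
    """Relative deviation of value from ref_value (assumed formula)."""
    return abs(value - ref_value) / abs(ref_value)


# (ref_value, value, tolerance) copied from the two errors reported above.
reported = {
    "water_atprop_ewald.inp": (0.375664704477e-02, 0.375664704476e-02, 1e-14),
    "ace_ala_nme_pm6_07.inp": (-67.527158834036754, -67.527150274892620, 1e-07),
}

for name, (ref, val, tol) in reported.items():
    diff = relative_difference(val, ref)
    # The ratio shows by how much the tolerance is exceeded.
    print(f"{name}: {diff:.2e} > {tol:.0e} (about {diff / tol:.1f}x the tolerance)")
```

Under this assumption, the Fist test differs only in the last printed digit of the reference value, while the PM6 test overshoots its tolerance by a small factor.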

@mtaillefumier
Contributor

BTW, if you encounter anything breaking in the cmake build system (it is fairly stable, but we keep finding small issues), do not hesitate to open an issue.

@opoplawski
Contributor Author

It seems pretty consistent so far, even with mpich:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/builddir/build/BUILD/TEST-local_mpich-psmp-2023-10-30_02-55-28/Fist/regtest-1-4/water_atprop_ewald.inp.out
Difference too large: 2.66e-12 > 1e-14, ref_value: 0.375664704477E-02, value: 0.375664704476E-02.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/builddir/build/BUILD/TEST-local_mpich-psmp-2023-10-30_02-55-28/QS/regtest-dm-ls-scf-2/ace_ala_nme_pm6_07.inp.out
Difference too large: 1.89e-07 > 1e-07, ref_value: -67.527158834036754, value: -67.527146089401640.

Deps:

DEBUG util.py:448:   dbcsr-devel                  ppc64le  2.6.0-1.fc40                build  129 k
DEBUG util.py:448:   dbcsr-mpich-devel            ppc64le  2.6.0-1.fc40                build  141 k
DEBUG util.py:448:   dbcsr-openmpi-devel          ppc64le  2.6.0-1.fc40                build  136 k
DEBUG util.py:448:   elpa-mpich-devel             ppc64le  2022.05.001-3.fc39          build   51 k
DEBUG util.py:448:   elpa-openmpi-devel           ppc64le  2022.05.001-3.fc39          build   51 k
DEBUG util.py:448:   fftw-devel                   ppc64le  3.3.10-7.fc39               build  128 k
DEBUG util.py:448:   flexiblas-devel              ppc64le  3.3.1-5.fc39                build  108 k
DEBUG util.py:448:   gcc-c++                      ppc64le  13.2.1-4.fc40               build   12 M
DEBUG util.py:448:   gcc-gfortran                 ppc64le  13.2.1-4.fc40               build   11 M
DEBUG util.py:448:   glibc-langpack-en            ppc64le  2.38.9000-16.fc40           build  609 k
DEBUG util.py:448:   hostname                     ppc64le  3.23-10.fc40                build   28 k
DEBUG util.py:448:   libint2-devel                ppc64le  2.6.0-15.fc39               build   16 M
DEBUG util.py:448:   libxc-devel                  ppc64le  6.2.2-3.fc39                build   77 k
DEBUG util.py:448:   make                         ppc64le  1:4.4.1-2.fc39              build  597 k
DEBUG util.py:448:   mpich-devel                  ppc64le  4.1.2-7.fc40                build  1.3 M
DEBUG util.py:448:   openmpi-devel                ppc64le  4.1.5-8.fc40                build  1.2 M
DEBUG util.py:448:   python3-devel                ppc64le  3.12.0-2.fc40               build  273 k
DEBUG util.py:448:   python3-fypp                 noarch   3.2-1.fc40                  build   72 k
DEBUG util.py:448:   scalapack-mpich-devel        ppc64le  2.2.0-6.fc40                build  8.3 k
DEBUG util.py:448:   scalapack-openmpi-devel      ppc64le  2.2.0-6.fc40                build  8.2 k
DEBUG util.py:448:   spglib-devel                 ppc64le  2.0.2-3.fc39                build   14 k
DEBUG util.py:448:   openblas                     ppc64le  0.3.24-1.fc40               build   37 k
DEBUG util.py:448:   openblas-openmp              ppc64le  0.3.24-1.fc40               build  4.5 M
DEBUG util.py:448:   openblas-openmp64            ppc64le  0.3.24-1.fc40               build  4.5 M

@mtaillefumier
Contributor

Thanks. I would not worry too much about such small errors. There can be several reasons (a different order of arithmetic operations, a bug, or something else). Maybe @oschuett can give his opinion about this, but I am not too worried.
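
To illustrate the "different order of arithmetic operations" point: floating-point addition is not associative, so a compiler, BLAS library, or thread count that regroups a sum can change the last bits of the result. The snippet below is a generic Python sketch of that effect and is not specific to cp2k or ppc64le.

```python
# Same three numbers, two groupings.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

# The two results differ in the last bit of the double.
print(left_to_right == regrouped)

rel = abs(left_to_right - regrouped) / abs(regrouped)
print(f"relative difference: {rel:.2e}")
```

A single regrouping like this changes the result at roughly the 1e-16 level. Over many operations, for example in an iterative calculation, such differences can accumulate, which is one plausible way to end up slightly outside a tight tolerance.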

@opoplawski
Contributor Author

Yeah, they are not too concerning, but it would be nice if the thresholds could be increased so they don't fail. Otherwise we have to ignore errors and might miss a real breaking change.
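
Until thresholds are adjusted, a packager could triage the error section instead of ignoring it wholesale. The sketch below parses the "Difference too large" lines in the format shown in this thread and separates small overshoots from large ones. The `triage` helper and its `slack` factor are hypothetical and not part of `do_regtest.py`; the factor of 1000 is an arbitrary illustration, not a recommended value.

```python
import re

# Matches the "Difference too large" lines as they appear in the log above.
ERROR_LINE = re.compile(r"Difference too large: ([-+.\deE]+) > ([-+.\deE]+),")


def triage(log_text, slack=1000.0):
    """Return (output_file, diff, tolerance, verdict) for each reported error.

    `slack` is a hypothetical factor: overshoots within slack * tolerance
    are labelled as likely noise, anything larger as worth investigating.
    """
    results = []
    output_file = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.endswith(".out"):
            output_file = line  # path printed just before the error line
            continue
        match = ERROR_LINE.search(line)
        if match:
            diff, tol = float(match.group(1)), float(match.group(2))
            verdict = "near threshold" if diff <= slack * tol else "investigate"
            results.append((output_file, diff, tol, verdict))
    return results


# The two errors from the ssmp run reported in this issue.
LOG = """\
/builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/Fist/regtest-1-4/water_atprop_ewald.inp.out
Difference too large: 2.66e-12 > 1e-14, ref_value: 0.375664704477E-02, value: 0.375664704476E-02.
/builddir/build/BUILD/TEST-local-ssmp-2023-10-29_04-44-23/QS/regtest-dm-ls-scf-2/ace_ala_nme_pm6_07.inp.out
Difference too large: 1.27e-07 > 1e-07, ref_value: -67.527158834036754, value: -67.527150274892620.
"""

for output_file, diff, tol, verdict in triage(LOG):
    print(f"{verdict:>14}  {diff:.2e} vs {tol:.0e}  {output_file}")
```

With this choice of `slack`, both errors from this issue are labelled "near threshold", while a deviation many orders of magnitude above its tolerance would be flagged as "investigate". This only works around the problem on the packaging side; raising the tolerances of the affected tests upstream remains the cleaner fix.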
