Potential issue with MKL spotted in GAMESS-US builds #19791

sassy-crick · 2024-02-05T10:45:53Z

I am experiencing a problem with MKL (2022.1.0) and various versions of GAMESS-US: the final energy is obtained much faster but is simply wrong. (See also PRs #19452 and #19310.)

Details of the system are below; first, a description of the problem.
From a previous calculation with GAMESS-US on Debian Stretch I got:
Version: 30 JUN 2019 (R1) 64 BIT LINUX VERSION
gfortran: 6.3.0
OpenBLAS: 0.2.20
FINAL RO-PBE0 ENERGY IS -3624.1256976634 AFTER 52 ITERATIONS

These are the new calculations, with various versions of GAMESS-US compiled differently:

Version: 30 SEP 2023 (R2) 64 BIT LINUX VERSION
intel: 2022.1.0
mkl: 2022.1.0
FINAL RO-PBE0 ENERGY IS -172453.5429717981 AFTER 21 ITERATIONS

Version: 30 SEP 2021 (R2) 64 BIT INTEL VERSION
intel: 2022.1.0
mkl: 2022.1.0
FINAL RO-PBE0 ENERGY IS -149756.6943786606 AFTER 23 ITERATIONS

Version: 30 SEP 2021 (R2) 64 BIT LINUX VERSION
gfortran: 11.3.0
mkl: 2022.1.0
Total Energy after 5 cycles: -188948.9963779443
(run terminated here)

Version: 30 JUN 2019 (R1) 64 BIT LINUX VERSION
gfortran: 11.3.0
OpenBLAS: 0.3.20
FINAL RO-PBE0 ENERGY IS -3624.1256976820 AFTER 52 ITERATIONS

Version: 30 SEP 2021 (R2) 64 BIT LINUX VERSION
gfortran: 11.3.0
OpenBLAS: 0.3.20
Total Energy after 41 cycles: -3624.0268377502

So, to me it looks like the problem is within the MKL library. I did not check other MKL versions.
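
The comparison above can also be scripted. The following is a minimal Python sketch, assuming the GAMESS-US log contains a line of the form quoted above (`FINAL ... ENERGY IS ... AFTER N ITERATIONS`). The function names and the 1e-3 Hartree tolerance are illustrative choices, not part of GAMESS-US or EasyBuild.

```python
import re

# Matches e.g. "FINAL RO-PBE0 ENERGY IS -3624.1256976634 AFTER 52 ITERATIONS"
FINAL_ENERGY = re.compile(
    r"FINAL\s+(\S+)\s+ENERGY\s+IS\s+(-?\d+\.\d+)\s+AFTER\s+(\d+)\s+ITERATIONS"
)


def parse_final_energy(log_text):
    """Return (method, energy, iterations) for the last match, or None."""
    matches = FINAL_ENERGY.findall(log_text)
    if not matches:
        return None
    method, energy, iterations = matches[-1]
    return method, float(energy), int(iterations)


def agrees(energy, reference, tol=1e-3):
    """True if two energies (Hartree) differ by at most tol (assumed value)."""
    return abs(energy - reference) <= tol


# Lines copied from the runs listed above.
reference = parse_final_energy(
    "FINAL RO-PBE0 ENERGY IS -3624.1256976634 AFTER 52 ITERATIONS")
mkl_run = parse_final_energy(
    "FINAL RO-PBE0 ENERGY IS -172453.5429717981 AFTER 21 ITERATIONS")
openblas_run = parse_final_energy(
    "FINAL RO-PBE0 ENERGY IS -3624.1256976820 AFTER 52 ITERATIONS")

print(agrees(mkl_run[1], reference[1]))       # False
print(agrees(openblas_run[1], reference[1]))  # True
```

Runs that terminate early or do not converge print no `FINAL ... ENERGY` line, so `parse_final_energy` returns `None` for them and they need to be handled separately.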

This is the top of the input file:
$CONTRL SCFTYP=ROHF RUNTYP=energy MAXIT=100 aimpac=.T.
DFTTYP=PBE0 ICHARG=0 MULT=4 COORD=unique RELWFN=LUT-IOTC ISPHER=1 $END
$BASIS extfil=.T. GBASIS=SPKrDZC $END
$SYSTEM MEMORY=32000000 kdiag=3 $END
$SCF DIRSCF=.T. DAMP=.T. SOSCF=.F. $END
$STATPT NSTEP=200 opttol=1E-06 HSSEND=.T. $end
$STATPT PURIFY=.T. PROJCT=.T. $END
$FORCE PURIFY=.T. PROJCT=.T. METHOD=SEMINUM $END
$dft IDCVER=3 DC=.T. $end
$dft nrad=125 nleb=1202 $end

I am running Debian Linux Bullseye. The compilers and the BLAS libraries were installed with EasyBuild 4.9.0 (production installation; only OpenBLAS, as there is currently an open PR, #19310).

All builds were done using sockets, and all standard test jobs passed in serial mode.
I doubt this is a problem with EasyBuild or how we install the software. What is concerning is that the test jobs passed, yet there is clearly a problem.

I have also contacted the developers of GAMESS-US.

imciner2 · 2024-02-05T11:50:34Z

One thing to try would be to force MKL to use each of its architecture-specific code paths and see whether the error is localized to only one of them. It looks like this can be done with the MKL_CBWR environment variable (the documentation for 2022.1.0 apparently isn't available online anymore, but here is the page for 2023.0 at least: https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-linux/2023-0/specifying-code-branches.html).
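
A rough Python sketch of that experiment: run the same job once per MKL_CBWR value and collect the output for comparison. The helper names are made up for illustration, the list of branches is only a subset of the documented values, and the command here is a stand-in that just echoes the variable; it would have to be replaced by whatever launches GAMESS-US on the system in question.

```python
import os
import subprocess
import sys

# A subset of the code branches documented for MKL_CBWR. Which ones a given
# CPU can actually use depends on the hardware.
MKL_BRANCHES = ["COMPATIBLE", "SSE4_2", "AVX", "AVX2"]


def env_for_branch(branch, base_env=None):
    """Copy the environment and pin MKL to one code branch."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_CBWR"] = branch
    return env


def run_per_branch(command, branches=MKL_BRANCHES):
    """Run `command` once per branch and return {branch: stdout}."""
    outputs = {}
    for branch in branches:
        proc = subprocess.run(
            command,
            env=env_for_branch(branch),
            capture_output=True,
            text=True,
            check=False,
        )
        outputs[branch] = proc.stdout
    return outputs


# Stand-in command that only echoes the variable, so the sketch runs anywhere.
# For a real test, replace it with the site's GAMESS-US launch command.
echo_cmd = [sys.executable, "-c", "import os; print(os.environ['MKL_CBWR'])"]
for branch, out in run_per_branch(echo_cmd).items():
    print(branch, "->", out.strip())
```

If the wrong energy shows up for every branch, the architecture-specific kernels are unlikely to be the cause.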

lexming · 2024-02-05T17:52:38Z

On my side, a simple RHF optimization of CH2 with GAMESS-US v20230930-R2 gives exactly the same energies and molecular structure with both gompi/2022a and intel-compilers/2022.1.0.

This is the input:

 $CONTRL SCFTYP=RHF RUNTYP=OPTIMIZE COORD=ZMT NZVAR=0 $END
 $SYSTEM TIMLIM=1 $END
 $STATPT OPTTOL=1.0E-5  $END
 $BASIS  GBASIS=STO NGAUSS=2 $END
 $GUESS  GUESS=HUCKEL $END
 $DATA
Methylene...1-A-1 state...RHF/STO-2G
Cnv  2

C
H  1 rCH
H  1 rCH  2 aHCH

rCH=1.09
aHCH=110.0
 $END

And this is the relevant output:

         WAVEFUNCTION NORMALIZATION =       1.0000000000

                ONE ELECTRON ENERGY =     -61.9332705667
                TWO ELECTRON ENERGY =      18.7391946778
           NUCLEAR REPULSION ENERGY =       5.9560361192
                                      ------------------
                       TOTAL ENERGY =     -37.2380397698

 ELECTRON-ELECTRON POTENTIAL ENERGY =      18.7391946778
  NUCLEUS-ELECTRON POTENTIAL ENERGY =     -99.1163682737
   NUCLEUS-NUCLEUS POTENTIAL ENERGY =       5.9560361192
                                      ------------------
             TOTAL POTENTIAL ENERGY =     -74.4211374768
               TOTAL KINETIC ENERGY =      37.1830977070
                 VIRIAL RATIO (V/T) =       2.0014776085

All these energy components are identical to the last digit in both gompi/2022a and intel-compilers/2022.1.0. And I ran this test in parallel using 4 cores.

So, a few comments:

- The installation with MKL is not fundamentally flawed; otherwise the tests in GAMESS would not pass (and they do).
- This is probably a bug in some specific algorithm not covered by the tests in GAMESS (which probably means exotic, experimental, or unmaintained code).
- Can you share all input files needed to reproduce your issue?

This looks like a bug in some part of GAMESS-US and should be handled by the devs.

sassy-crick · 2024-02-05T18:34:49Z

@lexming
Thanks for looking into that, much appreciated.

However, there are a few things I think I need to clarify a bit.

For starters, as you can see from my input file, I am using ROHF, which is different from your RHF calculation: it is a different algorithm. So if the problem is only in ROHF and you test with RHF, the test will not show the problem.

Next: I am not using an experimental or otherwise exotic/unmaintained method. ROHF has been around for years, and the same goes for the basis sets used. All of that has been in GAMESS-US for quite a long time; otherwise I would not have raised an issue. So we can rule that out.

Finally: I did not mean to imply that the installation with MKL is flawed; apologies if it came across that way. In my opinion I have found a bug in MKL, and I am pointing to that specific version of MKL. As I mentioned, I have already contacted the developers, including via the GAMESS-US user email list, of which I have been an active member for probably 20 years or so now.
The reason for raising it here as well is to make it a bit more public, given that I cannot raise issues on the GAMESS-US GitHub page myself.

I cannot make that specific calculation publicly available as it is confidential work; apologies for that. Since all the test jobs work in my case as well, yet there is clearly an issue with MKL, I would suggest pausing the merging of the PRs which contain MKL for now, until we get to the bottom of the problem. In the meantime I will see if I can find a test job which reproduces the problem, so others can test too. Please bear with me on that.

sassy-crick · 2024-02-05T23:58:30Z

I have done some more work and can now provide a mock input file. It is a mock because it is not a real molecule, but it does show the problem in less time than the real production one. That is the only thing they have in common: the molecule is pure fiction, and any resemblance to a real molecule is sheer coincidence.

The results:
20190630-R1-GCC-11.3.0-OpenBLAS

 FINAL RO-PBE0 ENERGY IS    -2381.4416489599 AFTER  76 ITERATIONS

20210930-R2-GCC-11.3.0-OpenBLAS

 FINAL RO-PBE0 ENERGY IS    -2381.4416489180 AFTER  76 ITERATIONS

20210930-R2-GCC-11.3.0

 FINAL RO-PBE0 ENERGY IS    -5543.7041388352 AFTER  98 ITERATIONS

20230930-R2-intel-compilers-2022.1.0

SCF did not converge after 100 cycles
Energy at cycle 100: -5640.8250422811

Here is the input file. As you can see, the top part is identical to what I posted before; only the coordinates of this mock molecule are different:

 $CONTRL SCFTYP=ROHF RUNTYP=energy MAXIT=100 aimpac=.T.
  DFTTYP=PBE0 ICHARG=0 MULT=4 COORD=ZMTMPC RELWFN=LUT-IOTC ISPHER=1 $END
 $BASIS extfil=.T. GBASIS=SPKrDZC $END
 $SYSTEM MEMORY=32000000 kdiag=3 $END
 $SCF DIRSCF=.T. DAMP=.T. SOSCF=.F. $END
 $STATPT NSTEP=200 opttol=1E-06 HSSEND=.T. $end
 $STATPT PURIFY=.T. PROJCT=.T. $END
 $FORCE PURIFY=.T.  PROJCT=.T. METHOD=SEMINUM $END
 $dft IDCVER=3 DC=.T. $end
 $dft nrad=125 nleb=1202 $end

 $DATA
testmolecule quartet SP PBE0/SPKrDZC
C1 1
Co   0.0000000  0  0.0000000  0  0.0000000  0     0     0     0
N    2.0296086  1  0.0000000  0  0.0000000  0     1     0     0
H    1.0220000  1  114.78418  1  0.0000000  0     2     1     0
C    1.3051598  1  120.26220  1  169.13796  1     2     3     1
H    1.1010000  1  121.56927  1  12.491068  1     4     2     3
H    1.1010000  1  123.09958  1 -165.77727  1     4     2     3
N    2.0159071  1  122.68529  1 -105.72323  1     1     2     3
H    1.0220000  1  112.59556  1 -91.419653  1     7     1     2
C    1.3077805  1  126.15910  1  97.673940  1     7     1     2
H    1.1010000  1  121.47863  1  178.35228  1     9     7     1
H    1.1010000  1  124.35796  1 -0.2756403  1     9     7     1
S    2.2885077  1  109.50115  1  8.4887035  1     1     2     3
H    1.3400000  1  102.37837  1 -101.97046  1    12     1     2
S    2.2906874  1  95.687540  1  133.45574  1     1     2     3
H    1.3400000  1  98.171847  1  41.893167  1    14     1     2
 $end

And finally:

$ lscpu
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              60
Model name:                         Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

I hope that helps.

lexming · 2024-02-07T09:17:17Z

> Next: I am not using an experimental or otherwise exotic/unmaintained method. ROHF has been around for years, and the same goes for the basis sets used. All of that has been in GAMESS-US for quite a long time; otherwise I would not have raised an issue. So we can rule that out.

You are indeed using ROHF, which is a well-known wavefunction method. However, your calculation is also doing DFT with the PBE0 functional. That one is also pretty standard, but combining the two is not. That is done by things like density-corrected DFT, which is by no means common. Moreover, the SPKrDZC basis set does not look familiar either, so it might be pretty new as well.

IMO, you are in uncharted territory, not covered by the tests and using somewhat novel methods. Therefore, patching this bug would be nice, but that falls to the devs. And I see no reason to block merging the related easyconfig given that all tested features work as expected. At most we can put a warning note about this particular issue in the easyconfig.

sassy-crick · 2024-02-12T18:26:31Z

Some more updates on that. I think I nailed down the problem:

20210930-R2-GCC-11.3.0 with imkl-2022.1.0, libxc 5.2.3, openmp=True:
E = -5733.2260257532, converged

20210930-R2-GCC-11.3.0-nomp with imkl-2022.1.0, libxc 5.2.3, openmp=False
E = -2381.4315494597, converged

20210930-R2-GCC-11.3.0-OpenBLAS with OpenBLAS-0.3.20-int8,  openmp=False
E = -2381.4315492878, converged

20210930-R2-GCC-11.3.0-OpenBLAS-omp with  OpenBLAS-0.3.20-int8,  openmp=True
E = -5699.5102296925, not converged

20210930-R2-intel-compilers with imkl-2022.1.0, libxc 5.2.3, openmp=True:
E = -4804.0826025185, not converged

20210930-R2-intel-compilers-mkl-2024.0.0 with  imkl-2024.0.0, libxc 5.2.3, openmp=True:
E = -4867.4992439979, not converged

20210930-R2-intel-compilers-2022.1.0-nomp with imkl-2022.1.0, libxc 5.2.3, openmp=False:
E = -2381.4315493693, converged

So, to me it looks like the problem is not MKL, as initially suspected, but the global use of OpenMP, which should not be done like this. It might be better to set, within the easyconfig file, something like

GMS_OPENMP = True

and catch that in the EasyBlock. Using a global option causes quite a number of warnings during the compilation.
GAMESS-US is particular in the way it is compiled: some modules are only compiled with -O1 or less, hence the comp script.

I hope that helps to sort out the problem.
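
The pattern in the results above can be made explicit with a few lines of Python. This sketch only re-tabulates the energies quoted in this comment; the shortened build labels and variable names are illustrative.

```python
# (build label, OpenMP enabled, energy in Hartree), as quoted above.
results = [
    ("GCC-11.3.0 + imkl-2022.1.0", True, -5733.2260257532),
    ("GCC-11.3.0-nomp + imkl-2022.1.0", False, -2381.4315494597),
    ("GCC-11.3.0 + OpenBLAS-0.3.20-int8", False, -2381.4315492878),
    ("GCC-11.3.0 + OpenBLAS-0.3.20-int8 (omp)", True, -5699.5102296925),
    ("intel-compilers + imkl-2022.1.0", True, -4804.0826025185),
    ("intel-compilers + imkl-2024.0.0", True, -4867.4992439979),
    ("intel-compilers-nomp + imkl-2022.1.0", False, -2381.4315493693),
]


def spread(energies):
    """Difference between the highest and lowest energy in a group."""
    return max(energies) - min(energies)


with_omp = [energy for _, omp, energy in results if omp]
without_omp = [energy for _, omp, energy in results if not omp]

print(f"openmp=False: {len(without_omp)} builds, spread {spread(without_omp):.2e}")
print(f"openmp=True:  {len(with_omp)} builds, spread {spread(with_omp):.2e}")
```

The three builds without OpenMP agree to better than 1e-6 Hartree regardless of compiler and BLAS library, while the four builds with OpenMP differ from each other by hundreds of Hartree.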

dvmorenor · 2024-02-12T19:24:43Z

I did some work which I hope helps this discussion. I used the mock input file.

The results:
20221130R2 aocc lapack GMS_OPENMP = False
FINAL RO-PBE0 ENERGY IS -2381.4415652043 AFTER 76 ITERATIONS
20221130R2 aocc libflame (from AMD) GMS_OPENMP = False
FINAL RO-PBE0 ENERGY IS -2381.4415650233 AFTER 76 ITERATIONS
20221130R2 intel mkl GMS_OPENMP=False
FINAL RO-PBE0 ENERGY IS -2381.4415651062 AFTER 76 ITERATIONS
20221130R2 gfortran mkl GMS_OPENMP=False
FINAL RO-PBE0 ENERGY IS -2381.4415651378 AFTER 76 ITERATIONS
20221130R2 aocc atlas GMS_OPENMP =True
FINAL RO-PBE0 ENERGY IS -5984.6449046098 AFTER 18 ITERATIONS

gfortran --version
GNU Fortran (Debian 10.2.1-6) 10.2.1 20210110
aocc AMD ( flang --version )
AMD clang version 14.0.6 (CLANG: AOCC_4.0.0-Build#434 2022_10_28) (based on LLVM Mirror.Version.14.0.6)
intel, mkl (oneAPI 2024.0)
lapack lib from GAMESS download option 3.10.1

And finally:

CPU family:                          25
Model:                               33
Model name:                          AMD Ryzen 7 5700X 8-Core Processor

lexming · 2024-05-08T09:30:41Z

As discussed, builds with OpenMP will no longer set compilation flags for openmp across the board but just indicate to the build scripts of GAMESS-US to enable OpenMP. Fix in commit easybuilders/easybuild-easyblocks@1bfd253

lexming mentioned this issue May 8, 2024

refactor GAMESS-US easyblock to directly write install.info (v2) easybuilders/easybuild-easyblocks#3047

Merged

branfosj closed this as completed in easybuilders/easybuild-easyblocks#3047 May 10, 2024
