Reduced basis not found in allowed number of iterations #4975

Closed
kayahans opened this issue May 7, 2024 · 5 comments · Fixed by #5002

Comments

@kayahans
Contributor

kayahans commented May 7, 2024

Describe the bug
QMCPACK terminates abruptly 1-2 seconds after it starts, writing the following error to the error output:
Reduced basis not found in allowed number of iterations. Check unit cell or contact a developer.
Calculations were submitted using Nexus; the XSF structure file produced by Nexus looks fine.

To Reproduce
Using the QMCPACK GitHub version, last git commit date: Mon Apr 29 18:46:51 2024 -0400.
Use the attached input files. The wavefunction is not attached; if you need it, please let me know a suitable location to copy it to.

Expected behavior
QMCPACK should recognize this as a valid structure.

System:

  • CADES, ORNL
  • module purge; source $MODULESHOME/init/bash; module load PE-intel/3.0; module swap intel intel/2021.1; module load intel/2021.1; module swap openmpi openmpi/4.1.0; module load gcc/10.2.0; module load python ;module load fftw/3.3.5; module load boost/1.70.0; module load libxml2/2.9.9; module list; LD_LIBRARY_PATH=/software/tools/compilers/intel_2021/mkl/2021.1.1/lib/intel64:$LD_LIBRARY_PATH
  • other systems where this is reproducible: None

Additional context
files.tar.gz

@prckent
Contributor

prckent commented May 9, 2024

Some comments and background:

I notice your cell is particularly "tall":

   a        b        c       alpha    beta     gamma
 4.26837  4.26837 27.00996  99.0925  99.0925  60.0000

and the error is thrown from find_reduced_basis in src/Particle/Lattice/LatticeAnalyzer.h:

template<typename T>
inline void find_reduced_basis(TinyVector<TinyVector<T, 3>, 3>& rb)
{
  int maxIter = 10000;

  for (int count = 0; count < maxIter; count++)
  {
    TinyVector<TinyVector<T, 3>, 3> saved(rb);
    bool changed = false;
    for (int i = 0; i < 3; ++i)
    {
      rb[i]   = 0.0;
      changed = found_shorter_base(rb);
      rb[i]   = saved[i];
      if (changed)
        break;
    }
    if (!changed && !found_shorter_base(rb))
      return;
  }

  throw std::runtime_error("Reduced basis not found in allowed number of iterations. "
                           "Check unit cell or contact a developer.");
}

The algorithm used here is a bit unusual and needs a closer look. The failure occurs during initialization, well before any Monte Carlo sampling. Presumably found_shorter_base is malfunctioning or inefficient for this cell; its implementation contains several numerical tolerance thresholds.
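For comparison only, here is a minimal sketch of a different and much simpler technique, pairwise (Gauss-style) lattice reduction, applied to the cell quoted above. It is not QMCPACK's found_shorter_base, whose body is not shown in this thread, and in 3D it produces a pairwise-reduced basis that is not guaranteed to be Minkowski-reduced. Assumptions: the lattice parameters are converted to vectors with a1 along x and a2 in the xy-plane (Nexus/QMCPACK may orient the cell differently), angles are in degrees, and all names here (cell_from_parameters, pairwise_reduce, volume, max_sweeps, rel_tol) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <stdexcept>

using Vec3  = std::array<double, 3>;
using Basis = std::array<Vec3, 3>;

double dot(const Vec3& u, const Vec3& v) { return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]; }
double norm(const Vec3& v) { return std::sqrt(dot(v, v)); }

// Signed cell volume a1 . (a2 x a3); unchanged by the integer basis changes below.
double volume(const Basis& b)
{
  const Vec3 cross{b[1][1] * b[2][2] - b[1][2] * b[2][1],
                   b[1][2] * b[2][0] - b[1][0] * b[2][2],
                   b[1][0] * b[2][1] - b[1][1] * b[2][0]};
  return dot(b[0], cross);
}

// Hypothetical helper: lattice vectors from (a, b, c, alpha, beta, gamma), angles in degrees.
// Assumed convention: a1 along x, a2 in the xy-plane, a3 fixed by alpha and beta.
Basis cell_from_parameters(double a, double b, double c, double alpha, double beta, double gamma)
{
  const double deg = std::acos(-1.0) / 180.0;
  const double ca = std::cos(alpha * deg), cb = std::cos(beta * deg);
  const double cg = std::cos(gamma * deg), sg = std::sin(gamma * deg);
  const double cx  = c * cb;
  const double cy  = c * (ca - cb * cg) / sg;
  const double cz2 = c * c - cx * cx - cy * cy;
  assert(cz2 > 0.0 && "angles do not describe a valid cell");
  return Basis{{{a, 0.0, 0.0}, {b * cg, b * sg, 0.0}, {cx, cy, std::sqrt(cz2)}}};
}

// Pairwise reduction (NOT QMCPACK's algorithm): repeatedly replace b[i] by
// b[i] - k*b[j], with k the nearest integer to (b[i].b[j])/(b[j].b[j]).
// Each step is unimodular, so the lattice and the cell volume are unchanged.
// Returns the number of sweeps used; throws if max_sweeps is exceeded.
int pairwise_reduce(Basis& b, int max_sweeps = 1000, double rel_tol = 1e-10)
{
  for (int sweep = 0; sweep < max_sweeps; ++sweep)
  {
    bool changed = false;
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j)
      {
        if (i == j)
          continue;
        const double bjj = dot(b[j], b[j]);
        assert(bjj > 0.0 && "degenerate basis vector");
        // Best integer multiple of b[j] to remove from b[i].
        const double k = std::round(dot(b[i], b[j]) / bjj);
        if (k == 0.0)
          continue;
        Vec3 cand;
        for (int d = 0; d < 3; ++d)
          cand[d] = b[i][d] - k * b[j][d];
        // Accept only a clear improvement, so exact ties (or rounding noise) cannot cycle.
        if (dot(cand, cand) < dot(b[i], b[i]) * (1.0 - rel_tol))
        {
          b[i]    = cand;
          changed = true;
        }
      }
    if (!changed)
      return sweep + 1;
  }
  throw std::runtime_error("pairwise_reduce: no reduced basis within max_sweeps");
}
```

With the parameters quoted above, this sketch leaves a1 and a2 unchanged (at gamma = 60 degrees their combinations are exact ties, which the relative tolerance rejects) and shortens the third vector from about 27.01 to about 26.67 by adding a1, with the volume unchanged. Requiring each accepted step to be shorter by a relative margin is the design choice that guarantees termination in the presence of such ties.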

@prckent
Contributor

prckent commented May 11, 2024

This goes wrong after the first use of the wavefunction. Can you please put the pwscf.pwscf.h5 file in (say) a globally shared area on OLCF?

@kayahans
Contributor Author

@prckent Thank you, Paul, for following up. All the files have been copied to /lustre/orion/mat151/world-shared/ksu/github_4975 on Frontier.

@prckent
Contributor

prckent commented May 17, 2024

Tried a GCC 13.2 CPU build on nitrogen2 (RHEL9.3) and was not able to reproduce the problem. Will try CADES directly. Possibly there is a compiler or numerical tolerance issue.
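To illustrate the kind of numerical-tolerance sensitivity suspected here (an illustration, not a diagnosis of this bug), a minimal sketch: an exact `<` between two nearly equal squared lengths depends on their last bits, which can change with the compiler, optimization flags such as FMA contraction, or summation order. The helper names clearly_shorter, len2_forward, and len2_reverse are hypothetical and are not part of QMCPACK.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: treat len2_a as shorter than len2_b only when it is
// shorter by more than a relative tolerance, so nearly equal lengths compare
// the same way regardless of last-bit rounding differences.
bool clearly_shorter(double len2_a, double len2_b, double rel_tol = 1e-12)
{
  assert(len2_a >= 0.0 && len2_b >= 0.0 && rel_tol >= 0.0);
  return len2_a < len2_b * (1.0 - rel_tol);
}

// The same squared length accumulated in two different orders; in floating
// point the two results may differ in the last bits.
double len2_forward(double x, double y, double z) { return (x * x + y * y) + z * z; }
double len2_reverse(double x, double y, double z) { return (z * z + y * y) + x * x; }
```

With rel_tol = 0 the helper reduces to a plain `<`, and which of two nearly equal lengths "wins" can then differ between builds.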

Also: a 138 GiB wavefunction file!

Edit: Also tried Ubuntu 22.04 with GCC 11.4 and Clang 14.

@prckent
Contributor

prckent commented May 17, 2024

Please send me your build script or upload it here. The one for CADES is clearly well out of date, and no sufficiently new CMake is available system-wide to build the development version.

2 participants