Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of pair_style uf3 and uf3/kk #4110

Open
wants to merge 77 commits into
base: develop
Choose a base branch
from

Conversation

monk-04
Copy link
Collaborator

@monk-04 monk-04 commented Mar 25, 2024

Summary

This pull request implements the ultra-fast forcefield (uf3) machine learned interatomic potential. Both the CPU and KOKKOS versions of uf3 are implemented.

Related Issue(s)

None

Author(s)

  1. Ajinkya C Hire (University of Florida), email- ajinkya.hire@ufl.edu, ajinkyahire62@gmail.com
  2. Hendrik Kraß (University of Constance)
  3. Matthias Rupp (Luxembourg Institute of Science and Technology), email- mrupp@mrupp.info
  4. Richard Hennig (University of Florida), email- rhennig@ufl.edu

Licensing

By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).

Backward Compatibility

Will not break backward compatibility

Implementation Notes

The correctness of the implementation was verified by comparing the lammps computed energy and forces with that calculated from the python version of uf3 found here. The current version computes two-body and three-body interactions, contributions from one-body are ignored.

Post Submission Checklist

  • The feature or features in this pull request is complete
  • Licensing information is complete
  • Corresponding author information is complete
  • The source code follows the LAMMPS formatting guidelines
  • Suitable new documentation files and/or updates to the existing docs are included
  • The added/updated documentation is integrated and tested with the documentation build system
  • The feature has been verified to work with the conventional build system
  • The feature has been verified to work with the CMake based build system
  • Suitable tests have been added to the unittest tree.
  • A package specific README file has been included or updated
  • One or more example input decks are included

@akohlmey akohlmey self-assigned this Mar 26, 2024
Comment on lines 1083 to 1113
jnum = numshort - 1;
for (jj = 0; jj < jnum; jj++) {
fij[0] = fji[0] = 0;
fij[1] = fji[1] = 0;
fij[2] = fji[2] = 0;
j = neighshort[jj];
jtype = type[j];
del_rji[0] = x[j][0] - xtmp;
del_rji[1] = x[j][1] - ytmp;
del_rji[2] = x[j][2] - ztmp;
rij =
sqrt(((del_rji[0] * del_rji[0]) + (del_rji[1] * del_rji[1]) + (del_rji[2] * del_rji[2])));

// kth atom
for (kk = jj + 1; kk < numshort; kk++) {

fik[0] = fki[0] = 0;
fik[1] = fki[1] = 0;
fik[2] = fki[2] = 0;

fjk[0] = fkj[0] = 0;
fjk[1] = fkj[1] = 0;
fjk[2] = fkj[2] = 0;

k = neighshort[kk];
ktype = type[k];
del_rki[0] = x[k][0] - xtmp;
del_rki[1] = x[k][1] - ytmp;
del_rki[2] = x[k][2] - ztmp;
rik = sqrt(
((del_rki[0] * del_rki[0]) + (del_rki[1] * del_rki[1]) + (del_rki[2] * del_rki[2])));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of unnecessary rij recalculations here -- see comment at line 1015

Comment on lines +1015 to +1024
if (pot_3b) {
if (rij <= cut_3b_list[itype][jtype]) {
neighshort[numshort] = j;
if (numshort >= maxshort - 1) {
maxshort += maxshort / 2;
memory->grow(neighshort, maxshort, "pair:neighshort");
}
numshort = numshort + 1;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what I understand, this creates an internal "shorter-neighbor list" within which to evaluate 3body interactions. To me a simpler approach would be:

  1. something like double * rij_list = new double[inum] (but create and manage the memory in a broader context for better predictability and performance)
  2. after calculating each rij, store it: rij_list[jj] = rij
  3. you now have the distances on hand for 3-body interactions later on

An extra inum-worth of doubles isn't a prohibitive amount of memory -- 1000 doubles is 8 KB.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could even afford to store the delx, dely, and delz doubles in their own arrays as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree storing an array of rij will reduce the number of computations later on.
But I also need to know the type of atoms in the 3b interaction evaluation. So I need to keep the current neighshort.

Also, in practice, the 3b cutoffs are always shorter than 2-body cutoffs (usually by a couple of angstroms). Do you still think I need to make the rij_list of the size of inum?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main worry is a bunch of unpredictable memory realllocations happening inside the inner loop of a force calculation. The if(pot3b) ... code is being triggered nlocal * jnum times and all it takes is one processor needing to do a grow to imbalance that timestep's force calculation.

Types are straightforward to track: if jj runs from 0 to jnum, the distance rij_list[jj] belongs to particle jlist[jj] which has particle type type[jlist[jj]].

Copy link
Collaborator Author

@monk-04 monk-04 Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Types are straightforward to track: if jj runs from 0 to jnum, the distance rij_list[jj] belongs to particle jlist[jj] which has particle type type[jlist[jj]].

I am bit confused.
jnum is decided by the cutoff of 3body interaction and, in practice, is always less than 2body.

In neighshort, I am storing j ie jlist[jj] during the 2body loop. Not every j in 2body loop forms a valid 3body triangle.

So, if my understanding of your suggestion is correct, you are suggesting to set jnum of 3body loop equal to jnum of 3body loop(?). If that is done, a lot of iterations in the 3body loop will do nothing because of the distance cutoff checks.

I apologize if I am not understanding your suggestion correctly.

Comment on lines 1284 to 1286
for (int l=0; l < n3b_coeff_matrix[key].size(); l++){
for (int m=0; m < n3b_coeff_matrix[key][l].size(); m++){
bytes += (double)2*n3b_coeff_matrix[key][l][m].size()*sizeof(double);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that n3b_coeff_matrix is used outside the initialization functions -- clearing it after initialization would save some memory (probably not much?).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am passing n3b_coeff_matrix[key] by reference to 3b bspline object (triplet_bspline UFBS3b), which then uses it to calculate energies and forces

Comment on lines 100 to 135
double *uf3_pair_bspline::eval(double r)
{

// Find knot starting position

int start_index=(this->*get_starting_index)(r);
/*if (knot_vect.front() <= r && r < knot_vect.back()) {
//Determine the interval for value_rij
for (int i = 3; i < knot_vect_size - 1; ++i) {
if (knot_vect[i] <= r && r < knot_vect[i + 1]) {
start_index = i;
break;
}
}
}*/

int knot_affect_start = start_index - 3;

double rsq = r * r;
double rth = rsq * r;

// Calculate energy

ret_val[0] = bspline_bases[knot_affect_start + 3].eval0(rth, rsq, r);
ret_val[0] += bspline_bases[knot_affect_start + 2].eval1(rth, rsq, r);
ret_val[0] += bspline_bases[knot_affect_start + 1].eval2(rth, rsq, r);
ret_val[0] += bspline_bases[knot_affect_start].eval3(rth, rsq, r);

// Calculate force

ret_val[1] = dnbspline_bases[knot_affect_start + 2].eval0(rsq, r);
ret_val[1] += dnbspline_bases[knot_affect_start + 1].eval1(rsq, r);
ret_val[1] += dnbspline_bases[knot_affect_start].eval2(rsq, r);

return ret_val;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning a pointer-to-double you can return a force double and modify the energy by reference. So the function signature would be

double uf3_pair_bspline::eval(double r, double &eng)

and your evaluations would proceed as

... eng = 0
    eng += \* first energy component *\
    eng += \* second energy component *\
...
    f = 0
    f += \* first force component *\
    f += \* second force component *\
... return f

I am not sure how much this changes performance but it will make code more readable for future maintainers.

Comment on lines 1026 to 1041
double *pair_eval = UFBS2b[itype][jtype].eval(rij);

fpair = -1 * pair_eval[1] / rij;

fx = delx * fpair;
fy = dely * fpair;
fz = delz * fpair;

f[i][0] += fx;
f[i][1] += fy;
f[i][2] += fz;
f[j][0] -= fx;
f[j][1] -= fy;
f[j][2] -= fz;

if (eflag) evdwl = pair_eval[0];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on the eval function -- it may be more readable to modify energy by reference. So the function signature would be

double CLASSNAME::eval(double r, double &eng)

and it could be called as

fpair = -1 * UFBS2b[itype][jtype].eval(rij, evdwl) / rij;

This has minor readability impact here -- later on in the 3-body functions, it is extremely useful to have the variable names in both the function definition in the spline CPPs and in the function call here, instead of having to separately document what each index in a returned double pointer means.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I also pass eflag to the eval function? Something like UFBS2b[itype][jtype].eval(rij, evdwl, eflag)? This will save some computations as energy evaluation is not needed on every timestep?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the energy calculation is costly enough, then yes, you can pass eflag and let eval save some time. But you would know better whether that's worth it.

As a style thing I'd prefer to keep the reference doubles together after the value doubles:

double CLASSNAME::eval(
  double rij,
  int eflag,
  double& evdwl // return energy by ref
){ ...

Copy link
Member

@akohlmey akohlmey Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I also pass eflag to the eval function? Something like UFBS2b[itype][jtype].eval(rij, evdwl, eflag)? This will save some computations as energy evaluation is not needed on every timestep?

since eflag doesn't change, the most efficient way to handle this would be to make eval a templated function and eflag a template argument. That could hoist the if statement to the outermost level and the if statement will be optimized away since you either call eval<0>() or eval<1>(). The compiler will then optimize the "if" away since the condition is a compile time constant. Check out examples in OPT or OPENMP.

Comment on lines 1126 to 1170
double *triangle_eval = UFBS3b[itype][jtype][ktype].eval(rij, rik, rjk);

fij[0] = *(triangle_eval + 1) * (del_rji[0] / rij);
fji[0] = -fij[0];
fik[0] = *(triangle_eval + 2) * (del_rki[0] / rik);
fki[0] = -fik[0];
fjk[0] = *(triangle_eval + 3) * (del_rkj[0] / rjk);
fkj[0] = -fjk[0];

fij[1] = *(triangle_eval + 1) * (del_rji[1] / rij);
fji[1] = -fij[1];
fik[1] = *(triangle_eval + 2) * (del_rki[1] / rik);
fki[1] = -fik[1];
fjk[1] = *(triangle_eval + 3) * (del_rkj[1] / rjk);
fkj[1] = -fjk[1];

fij[2] = *(triangle_eval + 1) * (del_rji[2] / rij);
fji[2] = -fij[2];
fik[2] = *(triangle_eval + 2) * (del_rki[2] / rik);
fki[2] = -fik[2];
fjk[2] = *(triangle_eval + 3) * (del_rkj[2] / rjk);
fkj[2] = -fjk[2];

Fi[0] = fij[0] + fik[0];
Fi[1] = fij[1] + fik[1];
Fi[2] = fij[2] + fik[2];
f[i][0] += Fi[0];
f[i][1] += Fi[1];
f[i][2] += Fi[2];

Fj[0] = fji[0] + fjk[0];
Fj[1] = fji[1] + fjk[1];
Fj[2] = fji[2] + fjk[2];
f[j][0] += Fj[0];
f[j][1] += Fj[1];
f[j][2] += Fj[2];

Fk[0] = fki[0] + fkj[0];
Fk[1] = fki[1] + fkj[1];
Fk[2] = fki[2] + fkj[2];
f[k][0] += Fk[0];
f[k][1] += Fk[1];
f[k][2] += Fk[2];

if (eflag) evdwl = *triangle_eval;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where a call-by-reference method can be much clearer. For consistency with the 2body call I'd make eval still return evdwl by reference, so this eval has to be void (there isn't a natural particular double to return). But you could call it as:

UFBS3b[itype][jtype][ktype].eval(rij, rik, rjk,
Fmag_ij, Fmag_ik, Fmag_jk, evdwl)

and voila! Each of your return variables are now named, not numbered!

Copy link
Member

@akohlmey akohlmey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is it for a first round of review. I didn't change most of the unprotected utils::logmesg() calls. In my opinion most should just be removed. They look like debugging output.

One more general comment. It is generally considered a bad idea to read the same file from all the MPI ranks. This can create huge problems with jobs using many MPI processes. The more common procedure is to read everything on rank 0 only and then use MPI communication to broadcast the information to all other ranks.

cmake/CMakeLists.txt Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What element does A stand for?

Copy link
Collaborator Author

@monk-04 monk-04 Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The potential files were modeled after Niobium UF3 potential files. I will replace them with the real files.

doc/src/pair_uf3.rst Outdated Show resolved Hide resolved
doc/src/pair_uf3.rst Outdated Show resolved Hide resolved
src/ML-UF3/pair_uf3.cpp Outdated Show resolved Hide resolved
src/ML-UF3/pair_uf3.cpp Outdated Show resolved Hide resolved
src/ML-UF3/pair_uf3.cpp Outdated Show resolved Hide resolved
src/ML-UF3/pair_uf3.cpp Outdated Show resolved Hide resolved
Updates for ml-uf3 pull request
monk-04 and others added 11 commits May 2, 2024 18:59
…ssors was getting some weird errors.

Deleted commented (dead) code.
Updated the memory_usage function.
Reordered some functions to refelect the calling order
Remove std::vector completely
…oeff_array_size, n3b_knots_array_size arrays to 0; fixed memory leaks; removed some dead code
Bug Fixes- uniform knot spacing, memory leaks, array initialization
@monk-04
Copy link
Collaborator Author

monk-04 commented May 4, 2024

@akohlmey Thank you for your suggestion about moving away from std::vectors. It forced me to re-write certain functions. Now arrays are used to store constants of bspline basis set instead of storing uf3_pair_bspline and uf3_triplet_bspline objects (at UFBS2b std::vector<std::vector> and UFBS3b std::vector<std::vector<std::vector>>). Also, I evaluate the energy contribution only when 'eflag' is true instead of every time step.

All these changes led to about 30%-35% speed up for the CPU version.

@stanmoore1
Copy link
Contributor

@monk-04 is this PR ready for review or are more changes coming?

@monk-04
Copy link
Collaborator Author

monk-04 commented May 13, 2024

@stanmoore1 I was waiting for PR #4146 so that the MacOS unit test could pass.
I see that it has been merged and will merge lammps:develop into ml-uf3.

Other than that, I don't have any more changes in mind.

The PR is ready for review.

@stanmoore1 stanmoore1 marked this pull request as ready for review May 13, 2024 19:11
@stanmoore1 stanmoore1 self-requested a review as a code owner May 13, 2024 19:11
@monk-04 monk-04 requested a review from akohlmey May 13, 2024 19:11
@stanmoore1 stanmoore1 requested a review from srtee May 13, 2024 19:12
@stanmoore1 stanmoore1 assigned stanmoore1 and unassigned monk-04 May 13, 2024
@akohlmey
Copy link
Member

@monk-04 I've submitted a pull request with a few minor changes and replaced the example.

@akohlmey akohlmey requested a review from athomps May 14, 2024 00:41
Update example with new syntax
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: After Stable Release
Development

Successfully merging this pull request may close these issues.

None yet

4 participants