New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of pair_style uf3 and uf3/kk #4110
base: develop
Are you sure you want to change the base?
Conversation
…oved some old comments
src/ML-UF3/pair_uf3.cpp
Outdated
jnum = numshort - 1; | ||
for (jj = 0; jj < jnum; jj++) { | ||
fij[0] = fji[0] = 0; | ||
fij[1] = fji[1] = 0; | ||
fij[2] = fji[2] = 0; | ||
j = neighshort[jj]; | ||
jtype = type[j]; | ||
del_rji[0] = x[j][0] - xtmp; | ||
del_rji[1] = x[j][1] - ytmp; | ||
del_rji[2] = x[j][2] - ztmp; | ||
rij = | ||
sqrt(((del_rji[0] * del_rji[0]) + (del_rji[1] * del_rji[1]) + (del_rji[2] * del_rji[2]))); | ||
|
||
// kth atom | ||
for (kk = jj + 1; kk < numshort; kk++) { | ||
|
||
fik[0] = fki[0] = 0; | ||
fik[1] = fki[1] = 0; | ||
fik[2] = fki[2] = 0; | ||
|
||
fjk[0] = fkj[0] = 0; | ||
fjk[1] = fkj[1] = 0; | ||
fjk[2] = fkj[2] = 0; | ||
|
||
k = neighshort[kk]; | ||
ktype = type[k]; | ||
del_rki[0] = x[k][0] - xtmp; | ||
del_rki[1] = x[k][1] - ytmp; | ||
del_rki[2] = x[k][2] - ztmp; | ||
rik = sqrt( | ||
((del_rki[0] * del_rki[0]) + (del_rki[1] * del_rki[1]) + (del_rki[2] * del_rki[2]))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of unnecessary rij recalculations here -- see comment at line 1015
if (pot_3b) { | ||
if (rij <= cut_3b_list[itype][jtype]) { | ||
neighshort[numshort] = j; | ||
if (numshort >= maxshort - 1) { | ||
maxshort += maxshort / 2; | ||
memory->grow(neighshort, maxshort, "pair:neighshort"); | ||
} | ||
numshort = numshort + 1; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from what I understand, this creates an internal "shorter-neighbor list" within which to evaluate 3body interactions. To me a simpler approach would be:
- something like
double * rij_list = new double[inum]
(but create and manage the memory in a broader context for better predictability and performance) - after calculating each
rij
, store it:rij_list[jj] = rij
- you now have the distances on hand for 3-body interactions later on
An extra inum
-worth of doubles isn't a prohibitive amount of memory -- 1000 doubles is 8 KB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could even afford to store the delx, dely, and delz doubles in their own arrays as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree storing an array of rij will reduce the number of computations later on.
But I also need to know the type of atoms in the 3b interaction evaluation. So I need to keep the current neighshort
.
Also, in practice, the 3b cutoffs are always shorter than 2-body cutoffs (usually by a couple of angstroms). Do you still think I need to make the rij_list
of the size of inum
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main worry is a bunch of unpredictable memory realllocations happening inside the inner loop of a force calculation. The if(pot3b) ...
code is being triggered nlocal * jnum
times and all it takes is one processor needing to do a grow
to imbalance that timestep's force calculation.
Types are straightforward to track: if jj
runs from 0 to jnum
, the distance rij_list[jj]
belongs to particle jlist[jj]
which has particle type type[jlist[jj]]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Types are straightforward to track: if
jj
runs from 0 tojnum
, the distancerij_list[jj]
belongs to particlejlist[jj]
which has particletype type[jlist[jj]]
.
I am bit confused.
jnum
is decided by the cutoff of 3body interaction and, in practice, is always less than 2body.
In neighshort
, I am storing j
ie jlist[jj]
during the 2body loop. Not every j
in 2body loop forms a valid 3body triangle.
So, if my understanding of your suggestion is correct, you are suggesting to set jnum
of 3body loop equal to jnum
of 3body loop(?). If that is done, a lot of iterations in the 3body loop will do nothing because of the distance cutoff checks.
I apologize if I am not understanding your suggestion correctly.
src/ML-UF3/pair_uf3.cpp
Outdated
for (int l=0; l < n3b_coeff_matrix[key].size(); l++){ | ||
for (int m=0; m < n3b_coeff_matrix[key][l].size(); m++){ | ||
bytes += (double)2*n3b_coeff_matrix[key][l][m].size()*sizeof(double); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that n3b_coeff_matrix is used outside the initialization functions -- clearing it after initialization would save some memory (probably not much?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am passing n3b_coeff_matrix[key] by reference to 3b bspline object (triplet_bspline UFBS3b), which then uses it to calculate energies and forces
src/ML-UF3/uf3_pair_bspline.cpp
Outdated
double *uf3_pair_bspline::eval(double r) | ||
{ | ||
|
||
// Find knot starting position | ||
|
||
int start_index=(this->*get_starting_index)(r); | ||
/*if (knot_vect.front() <= r && r < knot_vect.back()) { | ||
//Determine the interval for value_rij | ||
for (int i = 3; i < knot_vect_size - 1; ++i) { | ||
if (knot_vect[i] <= r && r < knot_vect[i + 1]) { | ||
start_index = i; | ||
break; | ||
} | ||
} | ||
}*/ | ||
|
||
int knot_affect_start = start_index - 3; | ||
|
||
double rsq = r * r; | ||
double rth = rsq * r; | ||
|
||
// Calculate energy | ||
|
||
ret_val[0] = bspline_bases[knot_affect_start + 3].eval0(rth, rsq, r); | ||
ret_val[0] += bspline_bases[knot_affect_start + 2].eval1(rth, rsq, r); | ||
ret_val[0] += bspline_bases[knot_affect_start + 1].eval2(rth, rsq, r); | ||
ret_val[0] += bspline_bases[knot_affect_start].eval3(rth, rsq, r); | ||
|
||
// Calculate force | ||
|
||
ret_val[1] = dnbspline_bases[knot_affect_start + 2].eval0(rsq, r); | ||
ret_val[1] += dnbspline_bases[knot_affect_start + 1].eval1(rsq, r); | ||
ret_val[1] += dnbspline_bases[knot_affect_start].eval2(rsq, r); | ||
|
||
return ret_val; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of returning a pointer-to-double you can return a force double and modify the energy by reference. So the function signature would be
double uf3_pair_bspline::eval(double r, double &eng)
and your evaluations would proceed as
... eng = 0
eng += \* first energy component *\
eng += \* second energy component *\
...
f = 0
f += \* first force component *\
f += \* second force component *\
... return f
I am not sure how much this changes performance but it will make code more readable for future maintainers.
src/ML-UF3/pair_uf3.cpp
Outdated
double *pair_eval = UFBS2b[itype][jtype].eval(rij); | ||
|
||
fpair = -1 * pair_eval[1] / rij; | ||
|
||
fx = delx * fpair; | ||
fy = dely * fpair; | ||
fz = delz * fpair; | ||
|
||
f[i][0] += fx; | ||
f[i][1] += fy; | ||
f[i][2] += fz; | ||
f[j][0] -= fx; | ||
f[j][1] -= fy; | ||
f[j][2] -= fz; | ||
|
||
if (eflag) evdwl = pair_eval[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment on the eval
function -- it may be more readable to modify energy by reference. So the function signature would be
double CLASSNAME::eval(double r, double &eng)
and it could be called as
fpair = -1 * UFBS2b[itype][jtype].eval(rij, evdwl) / rij;
This has minor readability impact here -- later on in the 3-body functions, it is extremely useful to have the variable names in both the function definition in the spline CPPs and in the function call here, instead of having to separately document what each index in a returned double pointer means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I also pass eflag to the eval function? Something like UFBS2b[itype][jtype].eval(rij, evdwl, eflag)
? This will save some computations as energy evaluation is not needed on every timestep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the energy calculation is costly enough, then yes, you can pass eflag and let eval
save some time. But you would know better whether that's worth it.
As a style thing I'd prefer to keep the reference doubles together after the value doubles:
double CLASSNAME::eval(
double rij,
int eflag,
double& evdwl // return energy by ref
){ ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I also pass eflag to the eval function? Something like
UFBS2b[itype][jtype].eval(rij, evdwl, eflag)
? This will save some computations as energy evaluation is not needed on every timestep?
since eflag doesn't change, the most efficient way to handle this would be to make eval a templated function and eflag a template argument. That could hoist the if statement to the outermost level and the if statement will be optimized away since you either call eval<0>()
or eval<1>()
. The compiler will then optimize the "if" away since the condition is a compile time constant. Check out examples in OPT or OPENMP.
src/ML-UF3/pair_uf3.cpp
Outdated
double *triangle_eval = UFBS3b[itype][jtype][ktype].eval(rij, rik, rjk); | ||
|
||
fij[0] = *(triangle_eval + 1) * (del_rji[0] / rij); | ||
fji[0] = -fij[0]; | ||
fik[0] = *(triangle_eval + 2) * (del_rki[0] / rik); | ||
fki[0] = -fik[0]; | ||
fjk[0] = *(triangle_eval + 3) * (del_rkj[0] / rjk); | ||
fkj[0] = -fjk[0]; | ||
|
||
fij[1] = *(triangle_eval + 1) * (del_rji[1] / rij); | ||
fji[1] = -fij[1]; | ||
fik[1] = *(triangle_eval + 2) * (del_rki[1] / rik); | ||
fki[1] = -fik[1]; | ||
fjk[1] = *(triangle_eval + 3) * (del_rkj[1] / rjk); | ||
fkj[1] = -fjk[1]; | ||
|
||
fij[2] = *(triangle_eval + 1) * (del_rji[2] / rij); | ||
fji[2] = -fij[2]; | ||
fik[2] = *(triangle_eval + 2) * (del_rki[2] / rik); | ||
fki[2] = -fik[2]; | ||
fjk[2] = *(triangle_eval + 3) * (del_rkj[2] / rjk); | ||
fkj[2] = -fjk[2]; | ||
|
||
Fi[0] = fij[0] + fik[0]; | ||
Fi[1] = fij[1] + fik[1]; | ||
Fi[2] = fij[2] + fik[2]; | ||
f[i][0] += Fi[0]; | ||
f[i][1] += Fi[1]; | ||
f[i][2] += Fi[2]; | ||
|
||
Fj[0] = fji[0] + fjk[0]; | ||
Fj[1] = fji[1] + fjk[1]; | ||
Fj[2] = fji[2] + fjk[2]; | ||
f[j][0] += Fj[0]; | ||
f[j][1] += Fj[1]; | ||
f[j][2] += Fj[2]; | ||
|
||
Fk[0] = fki[0] + fkj[0]; | ||
Fk[1] = fki[1] + fkj[1]; | ||
Fk[2] = fki[2] + fkj[2]; | ||
f[k][0] += Fk[0]; | ||
f[k][1] += Fk[1]; | ||
f[k][2] += Fk[2]; | ||
|
||
if (eflag) evdwl = *triangle_eval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where a call-by-reference method can be much clearer. For consistency with the 2body call I'd make eval
still return evdwl
by reference, so this eval
has to be void
(there isn't a natural particular double to return). But you could call it as:
UFBS3b[itype][jtype][ktype].eval(rij, rik, rjk,
Fmag_ij, Fmag_ik, Fmag_jk, evdwl)
and voila! Each of your return variables are now named, not numbered!
…ody/3-body is selected
also remove redundant files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is it for a first round of review. I didn't change most of the unprotected utils::logmesg() calls. In my opinion most should just be removed. They look like debugging output.
One more general comment. It is generally considered a bad idea to read the same file from all the MPI ranks. This can create huge problems with jobs using many MPI processes. The more common procedure is to read everything on rank 0 only and then use MPI communication to broadcast the information to all other ranks.
potentials/A_A_A.uf3_pot
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What element does A stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The potential files were modeled after Niobium UF3 potential files. I will replace them with the real files.
Updates for ml-uf3 pull request
…ssors was getting some weird errors. Deleted commented (dead) code. Updated the memory_usage function. Reordered some functions to refelect the calling order
Remove std::vector completely
…oeff_array_size, n3b_knots_array_size arrays to 0; fixed memory leaks; removed some dead code
Bug Fixes- uniform knot spacing, memory leaks, array initialization
@akohlmey Thank you for your suggestion about moving away from std::vectors. It forced me to re-write certain functions. Now arrays are used to store constants of bspline basis set instead of storing uf3_pair_bspline and uf3_triplet_bspline objects (at UFBS2b std::vector<std::vector> and UFBS3b std::vector<std::vector<std::vector>>). Also, I evaluate the energy contribution only when 'eflag' is true instead of every time step. All these changes led to about 30%-35% speed up for the CPU version. |
@monk-04 is this PR ready for review or are more changes coming? |
@stanmoore1 I was waiting for PR #4146 so that the MacOS unit test could pass. Other than that, I don't have any more changes in mind. The PR is ready for review. |
@monk-04 I've submitted a pull request with a few minor changes and replaced the example. |
Update example with new syntax
Summary
This pull request implements the ultra-fast forcefield (uf3) machine learned interatomic potential. Both the CPU and KOKKOS versions of uf3 are implemented.
Related Issue(s)
None
Author(s)
Licensing
By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).
Backward Compatibility
Will not break backward compatibility
Implementation Notes
The correctness of the implementation was verified by comparing the lammps computed energy and forces with that calculated from the python version of uf3 found here. The current version computes two-body and three-body interactions, contributions from one-body are ignored.
Post Submission Checklist