Add sequential kinetic operator #1655
base: master
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #1655      +/-   ##
===========================================
- Coverage   83.06%   50.51%   -32.55%
===========================================
  Files         289      289
  Lines       17538    17549      +11
  Branches     3404     3407       +3
===========================================
- Hits        14568     8865     -5703
- Misses       2321     8018     +5697
- Partials      649      666      +17
```

View full report in Codecov by Sentry.
Looks good to me. Just a small comment below:
```python
):
    r"""Args:
        hilbert: The underlying Hilbert space on which the operator is defined
        mass: float if all masses are the same, list indicating the mass of each particle otherwise
        dtype: Data type of the matrix elements. Defaults to `np.float64`
        method: Method to compute the kinetic energy: "sequential" or "parallel", where "sequential" uses less memory
```
I would either rename these to "vmap" and "forloop" (or something similar), or say what is actually meant by "sequential" and "parallel". Something like: "sequential" uses a `jax.lax.fori_loop` over all tangent vectors to compute the diagonal of the Hessian, while "parallel" uses `jax.vmap`; "sequential" therefore uses less memory.
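To make the distinction concrete, here is a minimal sketch of the two strategies (the function names `laplacian_vmap`/`laplacian_fori` and the toy `log_psi` are illustrative assumptions, not the PR's actual implementation): both compute the Laplacian of a scalar function via Hessian-vector products, one with `jax.vmap` over all basis vectors at once, one with `jax.lax.fori_loop` processing a single tangent vector at a time.

```python
import jax
import jax.numpy as jnp


def log_psi(x):
    # toy log-wavefunction standing in for the model (assumption)
    return -0.5 * jnp.sum(x**2) + jnp.sum(jnp.sin(x))


def laplacian_vmap(f, x):
    # "parallel": vmap the Hessian-vector products over all basis
    # vectors, materializing the full (n, n) Hessian at once
    hvp = lambda v: jax.jvp(jax.grad(f), (x,), (v,))[1]
    hess = jax.vmap(hvp)(jnp.eye(x.shape[0], dtype=x.dtype))
    return jnp.trace(hess)


def laplacian_fori(f, x):
    # "sequential": fori_loop over tangent vectors, one Hessian-vector
    # product at a time, so the peak memory is O(n) instead of O(n^2)
    n = x.shape[0]

    def body(i, acc):
        v = jnp.zeros(n, dtype=x.dtype).at[i].set(1.0)
        return acc + jax.jvp(jax.grad(f), (x,), (v,))[1][i]

    return jax.lax.fori_loop(0, n, body, jnp.zeros((), dtype=x.dtype))
```

Both return the same value; the trade-off is purely peak memory versus parallelism.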
@PhilipVinc , what do you think?
I would be more inclined to think this should be two different flags: one is the algorithm and the other is the chunk size (which does not necessarily need to be 1).
For example, if you want to use Feenberg you might or might not want to chunk it as well.
Or if you want to use Taylor-mode AD or a forward-mode Laplacian... those are all methods, and they may or may not require chunking.
NOTE: by chunking here I am of course referring to chunking over the degrees of freedom. So to avoid confusion the correct name should not be chunking...
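A sketch of what chunking over the degrees of freedom could look like (the name `laplacian_chunked` and the even-divisibility restriction are assumptions for illustration): the basis vectors are split into chunks, `jax.vmap` computes the Hessian-vector products inside each chunk, and `jax.lax.map` runs sequentially over the chunks. `chunk_size=1` recovers the fully sequential loop and `chunk_size=n` the fully parallel vmap.

```python
import jax
import jax.numpy as jnp


def laplacian_chunked(f, x, chunk_size):
    # Laplacian with a tunable memory/parallelism trade-off:
    # vmap inside each chunk of basis vectors, sequential over chunks.
    n = x.shape[0]
    assert n % chunk_size == 0  # keep the sketch simple (assumption)
    basis = jnp.eye(n, dtype=x.dtype).reshape(n // chunk_size, chunk_size, n)
    hvp = lambda v: jax.jvp(jax.grad(f), (x,), (v,))[1]
    # each chunk yields (chunk_size, n) rows of the Hessian
    rows = jax.lax.map(jax.vmap(hvp), basis).reshape(n, n)
    return jnp.trace(rows)
```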
Regardless, I think this is a great addition. I would just like this point to be settled.
I had toyed with this https://github.com/PhilipVinc/netket/blob/d256c67cae4a44e9e6cfd43f317716cf291f5f40/netket/operator/_laplacian.py#L292 a while ago, and I would assume that this inner chunk size (or whatever we call it) can be combined with those different algorithms.
Taylor-mode differentiation is quite a bit faster than backward-over-forward for deep nets, so it is relatively useful.
Ok, I agree with @PhilipVinc, but I would only provide as algorithms: Feenberg, Taylor mode, Normal, and then chunking = bool (if True: fori_loop, otherwise vmap).
I would not provide any algorithm with forward mode for the first derivative or reverse mode for the second derivative. As far as I understand, you want to use reverse mode when f: R^n -> R^m has m << n. For us we will always have m < n, if not m << n, because m = 1 and n = Nparticles * dim, so the first derivative should always use reverse mode.
Then the gradient df: R^n -> R^n does not have m < n (here m = n), so for the second derivative you always want to use forward mode. Do I understand something wrong?
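That reasoning maps directly onto JAX's transforms; here is a small sketch (the toy `log_psi` is an assumption): `jax.grad` is reverse mode and is cheap because the output is scalar (m = 1), while `jax.jacfwd` is forward mode and is the right choice for differentiating the gradient, whose output dimension equals its input dimension (m = n).

```python
import jax
import jax.numpy as jnp


def log_psi(x):
    # toy scalar model output, m = 1 (assumption)
    return jnp.sum(jnp.tanh(x)) ** 2


# first derivative: log_psi maps R^n -> R (m = 1 << n), so reverse mode
grad_f = jax.grad(log_psi)

# second derivative: grad_f maps R^n -> R^n (m = n), so forward mode
hess_f = jax.jacfwd(grad_f)

x = jnp.array([0.1, 0.2, 0.3])
laplacian = jnp.trace(hess_f(x))
```

Forward-over-reverse (`jacfwd(grad(f))`) is the standard recipe for Hessians of scalar functions in JAX for exactly this reason.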
Yes, indeed. The fwd-over-fwd mode is not useful at all for our use cases.
```python
        make_hess, None, jnp.eye(x.shape[0], dtype=x.dtype)
    )
    dp_dx2 = jnp.diag(dp_dx2.reshape(x.shape[0], x.shape[0]))
else:
```
We could add Feenberg as:

```python
elif self._method == "Feenberg":
    return 0.5 * jnp.sum(inverse_mass * dp_dx**2, axis=-1)
```

although this will only be valid for open systems (the wave function goes to zero at infinity) or periodic systems where the wave function is forced to have zero gradient at the boundaries.
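As a standalone sketch of the estimator being proposed (the helper name `feenberg_kinetic` and the Gaussian `log_psi` are assumptions for illustration): the Feenberg form needs only the first derivative of log psi, and it agrees with the usual local kinetic energy only in expectation, precisely because the equality relies on an integration by parts whose boundary term must vanish.

```python
import jax
import jax.numpy as jnp


def log_psi(x):
    # Gaussian log-wavefunction: decays at infinity, so the
    # integration-by-parts boundary term vanishes (assumption)
    return -0.5 * jnp.sum(x**2)


def feenberg_kinetic(f, x, inverse_mass=1.0):
    # Feenberg estimator: 0.5 * sum_i (1/m_i) * (d log psi / dx_i)**2
    # matches -0.5 * sum_i (1/m_i) * (d^2 log psi + (d log psi)^2)
    # only in expectation over |psi|^2, not configuration by configuration
    dp_dx = jax.grad(f)(x)
    return 0.5 * jnp.sum(inverse_mass * dp_dx**2, axis=-1)
```

Note that it only needs one gradient evaluation, so it sidesteps the whole sequential/parallel Hessian question above.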
Maybe it would be useful to add a check on the boundary conditions then, if Feenberg is used.
@gpescia