
Jacobian chunking in VMC_SRt #1740

Open · wants to merge 1 commit into master

Conversation

attila-i-szabo
Collaborator

Some neural networks require much more memory for backpropagation than for evaluation, which makes it useful to allow chunking (with smaller chunks) when computing the Jacobian (cf. #1347, #1590).

This PR therefore adds Jacobian chunking to VMC_SRt. IIUC, all this takes is adding an extra argument to nkjax.jacobian and copying some logic from MCState to set safe chunk sizes.
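A minimal sketch of the idea, not the PR's actual diff: it assumes nkjax.jacobian accepts a chunk_size argument, and that the keyword names mode/dense/center match the current netket.jax.jacobian signature; the model, sample count and chunk size are illustrative.

```python
import netket as nk
import netket.jax as nkjax

hi = nk.hilbert.Spin(s=1 / 2, N=8)
vstate = nk.vqs.MCState(
    nk.sampler.MetropolisLocal(hi),
    nk.models.RBM(alpha=1),
    n_samples=512,
)

# Flatten the (chains, samples_per_chain, N) sample array into one batch.
samples = vstate.samples.reshape(-1, hi.size)

jac = nkjax.jacobian(
    vstate._apply_fun,    # the network's log-wavefunction
    vstate.parameters,
    samples,
    vstate.model_state,
    mode="real",          # "real", "complex" or "holomorphic"
    chunk_size=128,       # NEW: backpropagate 128 samples at a time
    dense=True,
    center=True,
)
```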

Outstanding questions:

  • if we don't specify a chunk size explicitly, should VMC_SRt take the vstate's chunk size (like the QGT objects) or not chunk at all?
  • what tests are needed?

@attila-i-szabo
Collaborator Author

cc @riccardo-rende @llviteritti

@PhilipVinc
Member

What is wrong with #1590? It works well modulo a typo in vstate.expect_and_grad.
The only reason I have not merged that PR is that tests are missing.

(On a side note, I was thinking just now that maybe a better name would be jac_chunk_size instead of grad_chunk_size, but that is minor)

SRt should eventually get integrated into standard SR if we find a clean way to do it, so I'm a bit against doing this. I'd rather this be a general feature that benefits everyone. Moreover, this is a bit confusing: until now we have kept the chunking in the variational state, while with this we'd have it in two different places.

I would be much happier if this was based off #1590, and vstate.jac_chunk_size was used as the chunk size if necessary.


codecov bot commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 45.45455%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 82.68%. Comparing base (c841005) to head (2115a70).

| Files | Patch % | Lines |
| --- | --- | --- |
| netket/experimental/driver/vmc_srt.py | 45.45% | 11 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1740      +/-   ##
==========================================
- Coverage   82.73%   82.68%   -0.05%     
==========================================
  Files         299      299              
  Lines       18271    18293      +22     
  Branches     3480     3487       +7     
==========================================
+ Hits        15116    15126      +10     
- Misses       2476     2487      +11     
- Partials      679      680       +1     

☔ View full report in Codecov by Sentry.

@attila-i-szabo
Collaborator Author

attila-i-szabo commented Feb 29, 2024

> What is wrong with #1590?

It's not merged. Also, it only implements a different backprop chunk size for expect_and_*, not for all backpropagation generally. I.e., it wouldn't solve this problem (AFAICT VMC_SRt doesn't even use expect_and_grad).

> It works well modulo a typo in vstate.expect_and_grad. The only reason I have not merged the PR is that tests are missing.

> (On a side note, I was thinking just now that maybe a better name would be jac_chunk_size instead of grad_chunk_size, but that is minor)

> I would be much happier if this was based off #1590, and vstate.jac_chunk_size was used as the chunk size if necessary.

You have to discuss with @chrisrothUT

> SRt should eventually get integrated into standard SR if we find a clean way to do it, so I'm a bit against doing this. I'd rather this be a general feature that benefits everyone.

That would be nice, but in the meantime it doesn't hurt to make SRt more usable. I'm also a little skeptical about whether the two workflows can be merged nicely (I guess that's partly why it hasn't been done yet).

> Moreover, this is a bit confusing: until now we have kept the chunking in the variational state, while with this we'd have it in two different places.

No, QGTJacobian also does its own chunking (i.e., it calls nkjax.jacobian with a chunk_size argument, like my proposal here).

@PhilipVinc
Member

> No, QGTJacobian also does its own chunking (i.e., it calls nkjax.jacobian with a chunk_size argument, like my proposal here).

Yes and no: QGTJacobian defaults to the chunk size of the variational state if one is set, even though it is possible to override it.
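The defaulting pattern described here looks roughly like this (a simplified sketch, not the actual QGTJacobian source; resolve_chunk_size is a hypothetical helper):

```python
def resolve_chunk_size(explicit_chunk_size, vstate):
    # An explicitly passed value wins; otherwise fall back to the
    # variational state's chunk_size (which may be None, i.e. no chunking).
    if explicit_chunk_size is not None:
        return explicit_chunk_size
    return getattr(vstate, "chunk_size", None)
```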

> It's not merged. Also, it only implements a different backprop chunk size for expect_and_*, not for all backpropagation generally.

It does not, because it is not finished, but it introduces a standard way, consistent with what is already there, to set the chunk size of backpropagations. It is then trivial to default to jac_chunk_size instead of chunk_size in QGT and similar things.
Moreover, we can easily have jac_chunk_size default to chunk_size if it is unset.
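A minimal sketch of that fallback, assuming a jac_chunk_size attribute as proposed in #1590 (not an existing NetKet attribute; the class below is purely illustrative):

```python
class VariationalStateSketch:
    """Illustrative stand-in for a variational state with both settings."""

    def __init__(self, chunk_size=None, jac_chunk_size=None):
        self.chunk_size = chunk_size
        self._jac_chunk_size = jac_chunk_size

    @property
    def jac_chunk_size(self):
        # Default to the generic chunk_size when no Jacobian-specific
        # value has been set.
        if self._jac_chunk_size is None:
            return self.chunk_size
        return self._jac_chunk_size
```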

VMC_SRt is supposed to be a stopgap until someone has the time to merge it back into VMC itself.
I am not in favour of introducing new arguments to it that do not fit the model of VMC itself.
I am in favour of introducing a jac_chunk_size in the variational state rather than in VMC_SRt, following the idea of #1590, and using that in VMC_SRt.

@attila-i-szabo
Collaborator Author

attila-i-szabo commented Feb 29, 2024

Ah okay, I thought you had some problem with doing chunking in general.

I don't have time to finish #1590, but it would be important to support chunking in VMC_SRt to some extent. How about we just chunk with vstate.chunk_size for now?

BTW, I think grad_chunk_size is the better name, because we wouldn't always use it to compute a Jacobian.
EDIT: in fact, you might want two different values. For example, the problem with GCNNs (the proximate cause of both this PR and #1347) is that while you can backprop through the FFTs at sensible memory cost, it doesn't vmap well for some reason, so the Jacobian takes loads of memory.
But that's beside the point of this PR.

@attila-i-szabo
Collaborator Author

> VMC_SRt is supposed to be a stopgap until someone has the time to merge it back into VMC itself.

I don't have time for this either, but we should think about whether it is actually feasible/worth it.
The strategy of VMC is to (1) compute energy gradients and (2) apply some optional preconditioner to them. SRt just doesn't fit into this model. Of course, you can always shoehorn two very different kinds of logic into one class; the question is whether it's worth it.
One civilised alternative might be some kind of PreconditionedGradient interface, which takes a vstate and spits out updates. But this too would take a lot of reshuffling...
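A hypothetical sketch of what such an interface could look like (the names and signature are illustrative, not existing NetKet API):

```python
import abc


class PreconditionedGradient(abc.ABC):
    """Takes a variational state and produces parameter updates, without
    committing to VMC's gradient-then-preconditioner split."""

    @abc.abstractmethod
    def __call__(self, vstate, step):
        """Return a pytree of parameter updates for `vstate`."""


class SRtUpdates(PreconditionedGradient):
    # SRt would compute its updates directly from the Jacobian here,
    # never materialising a standalone energy-gradient pytree.
    def __call__(self, vstate, step):
        raise NotImplementedError  # sketch only
```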
