
Jacobian chunking in VMC_SRt #1740

Open · wants to merge 1 commit into master

Conversation

attila-i-szabo
Collaborator

Some neural networks require much more memory for backpropagation than for evaluation, which makes it useful to allow chunking (with smaller chunks) when computing the Jacobian (cf. #1347, #1590).

This PR therefore adds Jacobian chunking to VMC_SRt. IIUC, all this takes is adding an extra argument to nkjax.jacobian and copying some logic from MCState to set safe chunk sizes.
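A minimal sketch of the idea, not the PR's actual diff: it assumes nkjax.jacobian accepts a chunk_size argument, and that the keyword names mode/dense/center match the current netket.jax.jacobian signature; the model, sample count and chunk size are illustrative.

```python
import netket as nk
import netket.jax as nkjax

hi = nk.hilbert.Spin(s=1 / 2, N=8)
vstate = nk.vqs.MCState(
    nk.sampler.MetropolisLocal(hi),
    nk.models.RBM(alpha=1),
    n_samples=512,
)

# Flatten the (chains, samples_per_chain, N) sample array into one batch.
samples = vstate.samples.reshape(-1, hi.size)

jac = nkjax.jacobian(
    vstate._apply_fun,    # the network's log-wavefunction
    vstate.parameters,
    samples,
    vstate.model_state,
    mode="real",          # "real", "complex" or "holomorphic"
    chunk_size=128,       # NEW: backpropagate 128 samples at a time
    dense=True,
    center=True,
)
```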

Outstanding questions:

  • if we don't specify a chunk size explicitly, should VMC_SRt take the vstate's chunk size (like the QGT objects) or not chunk at all?
  • what tests are needed?

@attila-i-szabo
Collaborator Author

cc @riccardo-rende @llviteritti

@PhilipVinc
Member

What is wrong with #1590? It works well modulo a typo in vstate.expect_and_grad.
The only reason I have not merged that PR is that tests are missing.

(On a side note, I was thinking just now that maybe a better name would be jac_chunk_size instead of grad_chunk_size, but that is minor)

SRt should eventually get integrated into standard SR if we find a clean way to do it, so I'm a bit against doing this. I'd rather this be a general feature that benefits everyone. Moreover, this is a bit confusing: until now we have kept the chunking in the variational state, while with this we'd have it in two different places.

I would be much happier if this was based off #1590, and vstate.jac_chunk_size was used as the chunk size if necessary.


codecov bot commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 45.45455%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 82.68%. Comparing base (c841005) to head (2115a70).

| Files | Patch % | Lines |
| --- | --- | --- |
| netket/experimental/driver/vmc_srt.py | 45.45% | 11 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1740      +/-   ##
==========================================
- Coverage   82.73%   82.68%   -0.05%     
==========================================
  Files         299      299              
  Lines       18271    18293      +22     
  Branches     3480     3487       +7     
==========================================
+ Hits        15116    15126      +10     
- Misses       2476     2487      +11     
- Partials      679      680       +1     

☔ View full report in Codecov by Sentry.

@attila-i-szabo
Collaborator Author

attila-i-szabo commented Feb 29, 2024

> What is wrong with #1590?

It's not merged. Also, it only implements a different backprop chunk size for expect_and_*, not for all backpropagation generally. I.e., it wouldn't solve this problem (AFAICT VMC_SRt doesn't even use expect_and_grad).

> It works well modulo a typo in vstate.expect_and_grad. The only reason I have not merged the PR is that tests are missing.

> (On a side note, I was thinking just now that maybe a better name would be jac_chunk_size instead of grad_chunk_size, but that is minor)

> I would be much happier if this was based off #1590, and vstate.jac_chunk_size was used as the chunk size if necessary.

You have to discuss with @chrisrothUT

> SRt should eventually get integrated into standard SR if we find a clean way to do it, so I'm a bit against doing this. I'd rather this be a general feature that benefits everyone.

That would be nice, but in the meantime it doesn't hurt to make SRt more usable. I'm also a little skeptical about whether the two workflows can be merged nicely (I guess that's partly why it hasn't been done yet).

> Moreover, this is a bit confusing: until now we have kept the chunking in the variational state, while with this we'd have it in two different places.

No, QGTJacobian also does its own chunking (i.e., it calls nkjax.jacobian with a chunk_size argument, like my proposal here).

@PhilipVinc
Member

> No, QGTJacobian also does its own chunking (i.e., it calls nkjax.jacobian with a chunk_size argument, like my proposal here).

Yes and no: QGTJacobian defaults to the chunk size of the variational state if one is set, even though it is possible to override it.
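The defaulting pattern described here looks roughly like this (a simplified sketch, not the actual QGTJacobian source; resolve_chunk_size is a hypothetical helper):

```python
def resolve_chunk_size(explicit_chunk_size, vstate):
    # An explicitly passed value wins; otherwise fall back to the
    # variational state's chunk_size (which may be None, i.e. no chunking).
    if explicit_chunk_size is not None:
        return explicit_chunk_size
    return getattr(vstate, "chunk_size", None)
```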

> It's not merged. Also, it only implements a different backprop chunk size for expect_and_*, not for all backpropagation generally.

It does not, because it is not finished, but it introduces a standard way, consistent with what is already there, to set the chunk size of backpropagations. It is then trivial to default to jac_chunk_size instead of chunk_size in QGT and similar things.
Moreover, we can easily have jac_chunk_size default to chunk_size if it is unset.
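A minimal sketch of that fallback, assuming a jac_chunk_size attribute as proposed in #1590 (not an existing NetKet attribute; the class below is purely illustrative):

```python
class VariationalStateSketch:
    """Illustrative stand-in for a variational state with both settings."""

    def __init__(self, chunk_size=None, jac_chunk_size=None):
        self.chunk_size = chunk_size
        self._jac_chunk_size = jac_chunk_size

    @property
    def jac_chunk_size(self):
        # Default to the generic chunk_size when no Jacobian-specific
        # value has been set.
        if self._jac_chunk_size is None:
            return self.chunk_size
        return self._jac_chunk_size
```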

VMC_SRt is supposed to be a stopgap until someone has the time to merge it back into VMC itself.
I am not in favour of introducing new arguments to it that do not fit the model of VMC itself.
I am in favour of introducing a jac_chunk_size in the variational state rather than in VMC_SRt, following the idea of #1590, and using that in VMC_SRt.

@attila-i-szabo
Collaborator Author

attila-i-szabo commented Feb 29, 2024

Ah okay, I thought you had some problem with doing chunking in general.

I don't have time to finish #1590, but it would be important to support chunking in VMC_SRt to some extent. How about we just chunk with vstate.chunk_size for now?

BTW, I think grad_chunk_size is the better name, because we wouldn't always use it to compute a Jacobian.
EDIT: in fact, you might want two different values. For example, the problem with GCNNs (the proximate cause of both this PR and #1347) is that while you can backprop through the FFTs at sensible memory cost, it doesn't vmap well for some reason, so the Jacobian takes loads of memory.
But that's beside the point of this PR.

@attila-i-szabo
Collaborator Author

> VMC_SRt is supposed to be a stopgap until someone has the time to merge it back into VMC itself.

I don't have time for this either, but we should think about whether it is actually feasible/worth it.
The strategy of VMC is to (1) compute energy gradients and (2) apply some optional preconditioner to them. SRt just doesn't fit into this model. Of course, you can always shoehorn two very different kinds of logic into one class; the question is whether it's worth it.
One civilised alternative might be some kind of PreconditionedGradient interface, which takes a vstate and spits out updates. But this too would take a lot of reshuffling...
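A hypothetical sketch of what such an interface could look like (the names and signature are illustrative, not existing NetKet API):

```python
import abc


class PreconditionedGradient(abc.ABC):
    """Takes a variational state and produces parameter updates, without
    committing to VMC's gradient-then-preconditioner split."""

    @abc.abstractmethod
    def __call__(self, vstate, step):
        """Return a pytree of parameter updates for `vstate`."""


class SRtUpdates(PreconditionedGradient):
    # SRt would compute its updates directly from the Jacobian here,
    # never materialising a standalone energy-gradient pytree.
    def __call__(self, vstate, step):
        raise NotImplementedError  # sketch only
```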
