
Add a burn in method to vqs #1533

Open · wants to merge 1 commit into master
Conversation

@jwnys (Collaborator) commented Jul 20, 2023

Very small PR, but having something like this allows us to lower n_discard a lot. Relevant for @inailuig @gpescia

codecov bot commented Jul 20, 2023

Codecov Report

Merging #1533 (c7a0665) into master (8dbd9e3) will increase coverage by 1.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1533      +/-   ##
==========================================
+ Coverage   83.47%   84.61%   +1.13%     
==========================================
  Files         253      253              
  Lines       14342    14848     +506     
  Branches     2173     2183      +10     
==========================================
+ Hits        11972    12563     +591     
+ Misses       1833     1761      -72     
+ Partials      537      524      -13     
Impacted Files Coverage Δ
netket/vqs/mc/mc_state/state.py 94.86% <100.00%> (+0.27%) ⬆️

... and 69 files with indirect coverage changes

@PhilipVinc (Member)

But isn't this already achieved by running vs.sample() a few times? Why do we need a method that does it n times explicitly?

@jwnys (Collaborator, Author) commented Jul 20, 2023

It's exactly the same, yes; I'm adding it because it's one line. By adding this function explicitly, we could set the default n_discard_per_chain much lower (say 1 or 2). Too much compute is often spent on the discard phase, especially since n_sweeps typically has to be increased and the burn-in then discards n_discard*n_sweeps sweeps.
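For reference, a minimal sketch of what the added method could look like (the body is only partially visible in the diff excerpt below, so the exact lines, in particular the self.sample() call, are an assumption on my part):

    def burn_in(self, n: int = 1):
        """
        Thermalise the Markov chains by sampling and discarding n batches.

        Typically useful when a model is loaded from a parameter file.
        """
        for _ in range(n):
            self.reset()   # reset the sampler/chain state (a review below suggests dropping this)
            self.sample()  # draw, and effectively discard, a fresh batch of samples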

@gpescia (Collaborator) commented Jul 20, 2023

I would even add a keyword argument to the variational state to configure the burn-in. The point is that n_discard should be relatively large for the first iteration but low for all others. With a burn-in keyword nothing has to be done manually, and the two things are kept separate.

        Typically useful when a model is loaded from a parameter file.
        """
        for _ in range(n):
            self.reset()
A Member left an inline review comment on the diff excerpt above:

As documented

Suggested change:
            self.reset()

@PhilipVinc (Member)

I don't see how calling vs.burn_in(n=1) is any different from calling vs.sample().
And vs.burn_in(n=N) is just vs.sample(chain_length=vs.chain_length*N).

The only advantage I see is that, naming-wise, it is more discoverable/readable. You are adding a function that can be written in one line, and you are just saving ~10 characters.

Is it really worth it?

(As for the constructor keyword argument, it's not a bad idea.)
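For concreteness, the equivalence claimed above is roughly the following (a sketch: vs is an existing MCState, N an integer, and burn_in is the method proposed in this PR, not an existing NetKet API):

    vs.burn_in(n=N)                              # proposed in this PR
    vs.sample(chain_length=vs.chain_length * N)  # same total number of sweeps with the existing API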

@PhilipVinc (Member)

Like, what is the benefit of this new function beyond saving the time to type about 10 characters?

@gpescia (Collaborator) commented Jul 21, 2023

For me the priority is to have burn-in functionality distinct from n_discard that is only applied (automatically) in the first step of the VMC optimization. I don't have a strong opinion on whether this is implemented via a function exposed to the user, though I think it would be useful, as you say, for readability. Of course you can just call vs.sample() enough times, but exposing a burn_in method would mirror how thermalization is referred to in the literature.

@PhilipVinc (Member) left a comment

I understand, but I do not agree with this PR: in my opinion it does not really attempt to thermalise the chains, does not explain what it does, is not properly documented, and is not MPI-invariant.

In fact, in the limit of many ranks, if a user specified n_samples, this function might not really be thermalising.

I'd encourage you guys to mock up n>1 examples of how it would be used in different use cases, compare against the existing APIs, and make a proper case for why this PR is genuinely useful and outweighs the added cost of an extra method that weighs down the API.

My point of view is that a 10-line function that is relatively opaque does not benefit the project and simply adds entropy. A function that properly thermalises the chains would benefit the project.
I've never had anybody complain about this before now; rather, I've seen several people sample multiple times, sometimes with more advanced logic, to warm up the sampler, which is exactly what this function does.
I'm open to being convinced of the necessity of this addition, but I'd kindly ask you to make your case.

@gpescia (Collaborator) commented Jul 24, 2023

> I've never had anybody complain about this before now; rather, I've seen several people sample multiple times, sometimes with more advanced logic, to warm up the sampler, which is exactly what this function does.
> I'm open to being convinced of the necessity of this addition, but I'd kindly ask you to make your case.

I think I already pointed out what problem this PR should tackle: n_discard might need to be large in the first step and small afterwards. So it needs to be reset after the first iteration, which in my view can/should be automated.
If people already do their burn-in by hand instead of it being automated, I would say this shows it would be a good addition to have a default for this. You can still use your own logic by turning off the default.

If a default functionality like this is wanted at all, I can try to extend what Jannes started, implementing your review. For example, we could add a keyword n_burn_in to the initialization of the variational state and a boolean burn_in flag in vs.sample() deciding whether n_discard or n_burn_in is used. The VMC driver could then start with vs.sample(burn_in=True) and set the flag to False afterwards; the same mechanism could also be used when manually overwriting parameters, reproducing the naming in the literature more closely. A rough mock is sketched below.
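A rough, hypothetical mock of that proposal (neither n_burn_in nor the burn_in flag exist in NetKet; the names only illustrate the idea):

    vs = nk.vqs.MCState(sampler, model, n_samples=1000,
                        n_discard_per_chain=2, n_burn_in=100)  # hypothetical n_burn_in keyword

    # first VMC step: discard the larger burn-in budget once
    vs.sample(burn_in=True)   # hypothetical flag selecting n_burn_in instead of n_discard

    # all subsequent steps: only n_discard_per_chain is discarded
    vs.sample()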

If you think the current implementation with n_discard is already good enough (and @jwnys has nothing else to add), you can close the PR again.

@jwnys (Collaborator, Author) commented Jul 24, 2023

Sounds good to me as well (though I'd explicitly keep the function I added, too). I would default the burn-in to 0 in the init, and I would make the init argument burn_in: int rather than burn_in: bool. I should add: this function is something you use all the time in continuous space, not necessarily on the lattice.

@PhilipVinc (Member)

> I think I already pointed out what problem this PR should tackle: n_discard might need to be large in the first step and small afterwards. So it needs to be reset after the first iteration, which in my view can/should be automated.

I perfectly agree with what you are saying here.
I am arguing that this PR does not automate this; instead it simply saves the user from typing 15 characters.
I think there is space for improvement, and I'm asking whether you can mock what you would like to have, thinking in particular about how it would compose with MPI and how it would behave with other existing APIs, for example...

The current proposal is this:

vs = nk.vqs.MCState(sampler, model, n_samples=1000, n_discard_per_chain=10)
vs.burn_in(n=3)

...

# change the sampler
vs.sampler = new_sampler
# must burn in again
vs.burn_in(n=3)

# assume we are running under MPI with 10 nodes
vs = nk.vqs.MCState(sampler, model, n_samples=1000, n_discard_per_chain=10)
vs.burn_in(n=3 * n_nodes)

# unless someone is declaring n_samples per rank, in which case
vs = nk.vqs.MCState(sampler, model, n_samples_per_rank=1000, n_discard_per_chain=10)
vs.burn_in(n=3)

Right? The issues I have with this functionality are that (i) it does not compose with MPI (maybe that's impossible, but I would like it to be considered), and (ii) it's just shorthand for writing

vs = nk.vqs.MCState(sampler, model, n_samples=1000)
for _ in range(3):
    vs.sample()

> If people already do their burn-in by hand instead of it being automated, I would say this shows it would be a good addition to have a default for this. You can still use your own logic by turning off the default.

Again, I agree with you. I'm simply arguing that this PR just saves them from typing characters, and does not automate anything. To pick a value of n consciously they would have to know what the function is doing, at which point they could write it themselves.

> If a default functionality like this is wanted at all

It is. I agree with you that n_discard is too much of a shotgun, and one wants to play with a burn-in as you are doing here. I simply think that this PR is not useful for that purpose.

> For example, we could add a keyword n_burn_in to the initialization of the variational state and a boolean burn_in flag in vs.sample() deciding whether n_discard or n_burn_in is used. The VMC driver could then start with vs.sample(burn_in=True)

Yes, this is what I mean by mocking the functionality you would like: write a block of high-level user code (like the ones I wrote above) to show how you would like the API to behave and what you would like it to do, trying to think about edge cases and how it composes with, for example, training the same variational state multiple times, changing the sampler, MPI...
Once we agree on that, implementing the internals is easier.

@Z-Denis (Collaborator) commented Aug 5, 2023

Adding a burn-in method to vqs does not seem to be met with unanimous approval. However, I think this issue should still be tackled in some way. I can imagine how many users must have been relying on the ridiculous default n_discard, thereby wasting a lot of compute time (I know at least one...).

I propose to at least:

  • Add the simple burn-in process we all use (for _ in range(10): vs.sample()) to all examples in NetKet's documentation, with a comment about thermalisation of the Markov chains (see the sketch after this list).
  • Choose a more sensible value of n_discard_per_chain. Of course, this depends on the rate of change of the Born distribution after one iteration (and thus on the learning rate, the number of sweeps, whether we are already converged, etc.). It seems improbable to find a non-arbitrary heuristic, so I propose anything of order 5 to 10.
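A minimal version of such a documented example (assuming vs is an already-constructed MCState):

    # thermalise (burn in) the Markov chains before the first optimisation step,
    # so that the first gradient is estimated from equilibrated samples
    for _ in range(10):
        vs.sample()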

@PhilipVinc (Member)

Do note that I'm not against a burn_in method (or equilibration, as DeepQMC calls it).

I'm against this particular implementation which in my opinion is not adding anything meaningful and might not work correctly under MPI.

I would be very happy if we could have some automatic burn in method that manages to detect whether the chains are thermalised or not, and in that case stop.
However, I'm not sure if there is an easy or generic way to achieve that (@gcarleo ?)

DeepQMC does it by checking that some scalar criterion goes below a certain threshold; by default their criterion is the pairwise self-distance between electrons.

For us, we already have the split Rhat, which should be able to detect some sort of thermalisation, but we can only compute it given some local observable...
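A minimal sketch of what such a check could look like, assuming vstate is an MCState and operator a local observable; the threshold value and loop structure are illustrative, not an agreed design:

    def equilibrate(vstate, operator, threshold=1.1, max_steps=100):
        # Keep sampling until the split-Rhat of the operator's local estimates
        # drops below the threshold, or max_steps is reached.
        for _ in range(max_steps):
            vstate.sample()
            stats = vstate.expect(operator)   # NetKet Stats object, exposes R_hat
            if stats.R_hat < threshold:
                break
        return stats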

@PhilipVinc (Member)

As for adding details to the documentation, I'm always in favour.

As for changing the default of n_discard_per_chain, I agree that the current value is overkill in many situations, but there was some rationale two years ago. I'll link to this past PR #1034, where there was some discussion about it.

@gcarleo has opinions on that as well.

@gcarleo (Member) commented Aug 5, 2023 via email

@gcarleo (Member) commented Aug 5, 2023 via email

@PhilipVinc (Member)

A few years ago you linked to this page of ALPS, which is now offline; with the Wayback Machine I can access it but not see the figures. It suggests the function was pyalps.checkSteadyState.

The source of this function is here, and it seems it does something like what I proposed above: given an observable, it checks whether some indicator goes below a threshold (it is not based on Rhat but on something different).

Maybe we could have it as

class MCState:
    def equilibrate(self, operator, *, max_steps=100):
        ...

and this would equilibrate the chains according to that operator.
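Hypothetical usage of such a method (it does not exist in NetKet; hamiltonian stands for whatever operator the user cares about):

    vs = nk.vqs.MCState(sampler, model, n_samples=1000)
    vs.equilibrate(hamiltonian, max_steps=50)   # hypothetical API sketched above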

@gcarleo (Member) commented Aug 5, 2023 via email

@gcarleo (Member) commented Aug 5, 2023 via email

@PhilipVinc (Member)

Our default is n_discard = 10% of n_samples (which you proposed yourself, to have a decent discard number even if one has as many chains as samples).

I think moving to n_discard = 20 or so is a good idea (I'd like to keep the default a bit more conservative) provided we document in several places that one needs to 'equilibrate' at the beginning.
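For illustration, the pattern discussed here would look roughly like this (n_discard_per_chain=20 is the value proposed above, not a released default):

    vs = nk.vqs.MCState(sampler, model, n_samples=1000, n_discard_per_chain=20)

    # ...followed by an explicit, documented equilibration at the very beginning,
    # e.g. the sampling loop sketched earlier in this thread
    for _ in range(10):
        vs.sample()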

@gcarleo (Member) commented Aug 5, 2023 via email

@Z-Denis (Collaborator) commented Aug 5, 2023

> For us, we already have the split Rhat, which should be able to detect some sort of thermalisation, but we can only compute it given some local observable...

Indeed, one could check that, whatever initial random state each chain was initialised from, the intra-chain samples yield somewhat similar statistics. The issue is that one sometimes has very few samples per chain (for instance with many chains on a GPU), which makes this metric unreliable (for the same burn-in process, Rhat depends on the number of chains).

Something that could work is to run some recursive binning across chains to see how many steps are needed to decorrelate them.
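As a rough illustration of the binning idea (a single-chain sketch using only NumPy; the across-chain variant suggested above would need adaptation):

    import numpy as np

    def binning_errors(samples, max_levels=10):
        # Error of the mean at successive binning levels: it stops growing once
        # the bin size exceeds the correlation time of the chain.
        x = np.asarray(samples, dtype=float)
        errors = []
        for _ in range(max_levels):
            errors.append(x.std(ddof=1) / np.sqrt(len(x)))
            if len(x) < 4:
                break
            n = 2 * (len(x) // 2)
            x = 0.5 * (x[0:n:2] + x[1:n:2])  # average adjacent pairs -> halve the series
        return errors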
