Add a burn in method to vqs #1533
base: master
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #1533      +/-   ##
==========================================
+ Coverage   83.47%   84.61%    +1.13%
==========================================
  Files         253      253
  Lines       14342    14848     +506
  Branches     2173     2183      +10
==========================================
+ Hits        11972    12563     +591
+ Misses       1833     1761      -72
+ Partials      537      524      -13
```
but isn't this already achieved by running `vs.sample()` a few times?
It's exactly the same, yes; I'm just adding it because it's 1 line. By adding this function explicitly, it would allow us to set the default `n_discard_per_chain` much lower (say 1 or 2). Too much compute is often spent on the discard part, especially since `n_sweeps` typically has to be increased, and the discard phase throws away `n_discard * n_sweeps` sweeps.
I would even add a keyword argument to the variational state to configure the burn-in. The point is that `n_discard` should be relatively large for the first iteration but low for all others. With a burn-in keyword nothing has to be done manually, and the two things are separated.
```python
        Typically useful when a model is loaded from a parameter file.
        """
        for _ in range(n):
            self.reset()
```
I don't see how calling `burn_in` is different from just sampling. The only advantage I see is that, naming-wise, it is more discoverable/readable. You are adding a function that can be written in 1 line, just saving the user a few keystrokes. Is it really worth it? (For the construction keyword argument, it's not a bad idea.)
Like, what is the benefit of this new function beyond saving the time to type about 10 characters? |
For me the priority is to have a burn-in functionality different from `n_discard`, one that is applied automatically only in the first step of the VMC optimization. I don't have a strong opinion on whether this is implemented via a function exposed to the user, though I think it would be useful, as you say, for readability. Of course you can just call `vs.sample()` enough times, but exposing a `burn_in` method would reproduce how people refer to thermalization in the literature.
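To make the equivalence being debated concrete, here is a minimal mock sketch (plain Python; `MockMCState`, and the behaviour of `sample()`/`reset()`, imitate NetKet's names but are illustrative assumptions, not the real API) of what the proposed `burn_in` amounts to: resetting and re-sampling `n` times while discarding the results.

```python
# Minimal mock of the proposed burn_in method. This is an illustrative
# sketch, not NetKet's actual MCState implementation.
class MockMCState:
    def __init__(self):
        self.sample_calls = 0

    def reset(self):
        # In the real state this would invalidate cached samples so the
        # next sample() call advances the Markov chain again.
        pass

    def sample(self):
        # Advance the chain and return a batch of samples (omitted here).
        self.sample_calls += 1
        return []

    def burn_in(self, n=1):
        """Thermalise the chains by sampling (and discarding) n batches."""
        for _ in range(n):
            self.reset()
            self.sample()


vs = MockMCState()
vs.burn_in(n=3)  # equivalent to: for _ in range(3): vs.sample()
```

As the discussion notes, the method body is a one-line loop over the existing sampling call; the argument for it is purely discoverability.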
I understand, though I do not agree with this PR: in my opinion it does not really attempt to thermalise the chains, does not explain what it does, is not properly documented, and is not MPI-invariant. In fact, in the limit of many ranks, if a user specifies `n_samples`, this function might not really be thermalising.

I'd encourage you guys to mock up n>1 examples of how it will be used in different use-cases, compare against existing APIs, and make a proper case for why this PR is genuinely useful and outweighs the cost of an extra method that weighs down the API. My point of view is that a 10-line function that is relatively opaque does not benefit the project and simply adds entropy. A function that properly thermalises the chains would benefit the project.

I've never had anybody complain about this before now; rather, I've seen several people sample multiple times, sometimes with more advanced logic, to warm up the sampler, which is exactly what this function does. I'm open to being convinced of the necessity of this addition, but I'd kindly ask you to make your case.
I think I already pointed out what problem this PR should tackle. If a default functionality like this is wanted at all, I can try to extend what Jannes started, implementing your review; for example, we could just add a `burn_in` keyword. If you think the current implementation with `n_discard` is already good enough (and @jwnys has nothing else to say), you can close the PR again.
Sounds good to me as well (though I'd explicitly keep the function I added too). I would default `burn_in` to 0 in the init. Also, in the init I would not take `burn_in` as a bool but as an int. I should add: this function is something you use all the time in continuous space, not necessarily on the lattice.
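A sketch of the constructor variant being proposed here (hypothetical names and signature, assuming an integer `burn_in` defaulting to 0 so existing code is unaffected; this is not NetKet's actual constructor):

```python
# Hypothetical sketch of an integer burn_in keyword in the constructor;
# names imitate NetKet's MCState but the class is a mock.
class MockMCState:
    def __init__(self, n_samples=1000, n_discard_per_chain=5, burn_in=0):
        self.n_samples = n_samples
        self.n_discard_per_chain = n_discard_per_chain
        self.sample_calls = 0
        # Thermalise once at construction; afterwards only the (small)
        # per-step n_discard_per_chain cost is paid.
        for _ in range(burn_in):
            self.sample()

    def sample(self):
        # Advance the chain and return a batch of samples (omitted here).
        self.sample_calls += 1
        return []


cold = MockMCState()           # burn_in defaults to 0: no extra work
warm = MockMCState(burn_in=3)  # thermalised at construction
```

Defaulting to 0 keeps the keyword opt-in, which matches the point above that the burn-in should be large only for the first iteration.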
I perfectly agree with what you are saying here. The current proposal is this:

```python
vs = nk.vmc.MCState(sampler, model, n_samples=1000, n_discard_per_chain=10)
vs.burn_in(n=3)
...
# change the sampler
vs.sampler = new_sampler
# must burn in again
vs.burn_in(n=3)
```

```python
# assume we are running under MPI with 10 nodes
vs = nk.vmc.MCState(sampler, model, n_samples=1000, n_discard_per_chain=10)
vs.burn_in(n=3*n_nodes)
# unless someone is declaring n_samples per rank, in which case
vs = nk.vmc.MCState(sampler, model, n_samples_per_rank=1000, n_discard_per_chain=10)
vs.burn_in(n=3)
```

Right? The issues I have with this functionality are that (i) it does not compose with MPI (maybe it's impossible, but I would like it to be considered), and (ii) it's just a shorthand for writing

```python
vs = nk.vmc.MCState(sampler, model, n_samples=1000)
for _ in range(3):
    vs.sample()
```

Again, I agree with you. I'm simply arguing that this PR is just saving the user from typing characters, and not automating something. To pick a value of `n`...

It is. I agree with you that...

Yes, this is what I mean by mocking the functionality you would like. Write a block of...
Adding a burn-in method to vqs does not seem to be met with unanimous approval. However, I think this issue should still be tackled in some way: I can imagine how many users must have been relying on the ridiculous default of `n_discard` = 10% of `n_samples`. I propose to at least: ...
Do note that I'm not against a `burn_in` method (or equilibration, as deepQMC calls it). I'm against this particular implementation, which in my opinion is not adding anything meaningful and might not work correctly under MPI.

I would be very happy if we could have some automatic burn-in method that manages to detect whether the chains are thermalised or not, and in that case stop. However, I'm not sure if there is an easy or generic way to achieve that (@gcarleo?). [DeepQMC does it](https://github.com/deepqmc/deepqmc/blob/964321e3ac9e1e38ee2322164cbedfcd304d990f/src/deepqmc/sampling.py#L455) by checking that [some scalar criterion goes below a certain threshold](https://github.com/deepqmc/deepqmc/blob/964321e3ac9e1e38ee2322164cbedfcd304d990f/src/deepqmc/sampling.py#L483); by default their criterion is the [pairwise self-distance](https://github.com/deepqmc/deepqmc/blob/964321e3ac9e1e38ee2322164cbedfcd304d990f/src/deepqmc/train.py#L247) between electrons. For us, we already have the split Rhat, which should be able to detect some sort of thermalisation, but we can only compute it given some local observable...
As for adding details to the documentation, I'm always in favour. As for changing the default of `n_discard_per_chain`, I agree that the current value is overkill in many situations, but there was some rationale two years ago; I'll link to this past PR #1034 where there was some discussion about it. @gcarleo has opinions on that as well.
Ad hoc criteria based on some specific observable can be used, but cannot be the default in my opinion. One could add something based on the rhat of the energy, but it's completely arbitrary... We could maybe check what ALPS did to detect equilibration (yes, it's a standard term, nothing new); if I recall correctly there was an automatic way to detect it.
*(reply by email to Filippo Vicentini's comment above, Sat, Aug 5, 2023, 11:30)*
The discard time cannot depend on how many chains you have; it's an intrinsic property of the MCMC at hand, so anything that involves discarding a number of samples related to the total number of chains is just wrong!
*(reply by email to Filippo Vicentini's comment above, Sat, Aug 5, 2023, 11:34)*
A few years ago you had linked to [this page of ALPS](http://alps.comp-phys.org/mediawiki/index.php/Documentation:Monte_Carlo_Equilibration), which is now offline; with the Wayback Machine I can [access it, but not see the figures](https://web.archive.org/web/20211019194445/https://alps.comp-phys.org/mediawiki/index.php?title=Documentation:Monte_Carlo_Equilibration&oldid=7448). It suggests the function was `pyalps.checkSteadyState`. The source of this function is [here](https://github.com/qiyang-ustc/ALPS_SIF/blob/4686db8de0eeed462144d7ef30731e91b0cc52fa/alps-2.3.0-src/alps/lib/pyalps/tools.py#L527), and it seems it does something like what I proposed above: given an observable, it checks if some indicator goes below a threshold (it's not based on rhat but on something different). Maybe we could have it as

```python
class MCState:
    def equilibrate(self, operator, *, max_steps=100):
        ...
```

and this will equilibrate according to this operator.
Yes, if it's opt-in and you can decide what observable to use, I think it'd be a good addition indeed!
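As a rough illustration of such an opt-in stopping criterion (a hypothetical sketch; `equilibrate`, its signature, and the tolerance logic are assumptions for illustration, not the pyalps or NetKet implementation): keep drawing estimates of an observable until two consecutive ones agree to a relative tolerance, with a hard cap on the number of draws.

```python
def equilibrate(sample_observable, *, max_steps=100, rtol=0.05):
    """Sketch: draw estimates of an observable until two consecutive
    values agree to relative tolerance rtol, or give up after max_steps
    draws. Returns the number of draws used."""
    prev = sample_observable()
    for step in range(2, max_steps + 1):
        cur = sample_observable()
        # stop once consecutive estimates agree within rtol
        if abs(cur - prev) <= rtol * max(abs(cur), 1e-12):
            return step
        prev = cur
    return max_steps


# Toy usage: an "energy" estimate that settles down after a few draws.
energies = iter([10.0, 5.0, 4.0, 4.01, 4.0])
steps = equilibrate(lambda: next(energies), max_steps=5)
# → 4: the fourth draw (4.01) agrees with the third (4.0) to within 5%
```

A real criterion would of course act on batched sampler output rather than scalar draws; the point is only the opt-in observable and the `max_steps` cap discussed above.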
*(reply by email to Filippo Vicentini's comment above, Sat, Aug 5, 2023, 11:46)*
If you want, the thing that shouldn't happen (and that maybe is still happening in our default?) is to have a number of discarded samples proportional to the total number of samples. In realistic scenarios it's enough to discard a few sweeps (not proportional to the total number of samples, then).
*(reply by email to Giuseppe Carleo's comment above, Sat, Aug 5, 2023, 11:47)*
Our default is `n_discard` = 10% of `n_samples` (which you proposed yourself, to have a decent discard number even if one has as many chains as samples). I think moving to `n_discard = 20` or so is a good idea (I'd like to keep the default a bit more conservative), provided we document in several places that one needs to 'equilibrate' at the beginning.
Yes, agreed: the ten percent is too conservative for most use cases, and I think we should change it at this point. 20 seems reasonable to me... In the VMC driver one could add a flag (defaulted to false) to avoid discarding samples after the first optimization step, maybe. It's wrong, but it's fast, and asymptotically it shouldn't matter too much when you get to the ground state.
*(reply by email to Filippo Vicentini's comment above, Sat, Aug 5, 2023, 13:06)*
Indeed, one could check that, whatever the initial random state each chain was initialised from, the intra-chain samples correspond to somewhat similar statistics. The issue with this is that sometimes one has very few samples per chain (for instance, many chains on a GPU), which makes this metric unreliable (for the same burn-in process, Rhat depends on the number of chains). Something that could work is to run some recursive binning across chains to see how many steps are needed to decorrelate them.
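For reference, the chain-comparison idea can be sketched with the textbook Gelman–Rubin R̂ over chains of a scalar observable (a plain-Python sketch of the standard formula, not NetKet's split-R̂ implementation): well-mixed chains give values near 1, while chains stuck around different means give values well above 1.

```python
def rhat(chains):
    """Gelman-Rubin potential scale reduction for a list of equal-length
    chains of scalar samples. Values close to 1 suggest the chains have
    thermalised to the same distribution."""
    m = len(chains)     # number of chains
    n = len(chains[0])  # samples per chain
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # between-chain variance B and mean within-chain variance W
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n  # pooled variance estimate
    return (var_plus / W) ** 0.5


mixed = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]      # same statistics
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]  # different means
# rhat(mixed) stays near 1, rhat(stuck) is far above 1
```

This also exposes the caveat raised above: with very few samples per chain, `n` is tiny and the within-chain variance `W` is poorly estimated, so the statistic becomes unreliable.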
Force-pushed 562101f to 5d8bad1.
Very small PR, but having something like this allows us to lower `n_discard` a lot. Relevant for @inailuig @gpescia.