
Elaborate NS run using a loop #644

Open · wants to merge 4 commits into master
Conversation

fnovak42 (Contributor) commented on Jun 7, 2023

Use a loop to run the nested sampler internally, as outlined in the dynesty documentation: https://dynesty.readthedocs.io/en/stable/dynamic.html
This addresses one or several of the points raised in issue #573.

-    sampler.run_nested(dlogz_init=dlogz, maxiter=maxiter, print_progress=print_progress)
+    #sampler.run_nested(dlogz_init=dlogz, maxiter=maxiter)
+    for results in sampler.sample_initial(dlogz=dlogz, maxiter=maxiter):
Reviewer (Member):
I would enumerate this and run a log message every few seconds/every few samples. @mreboud ?

Reviewer (Contributor):
Every few samples is probably simpler to implement.
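A sketch of what "every few samples" could look like, assuming a generator-style sampler loop; the stand-in `sample` generator and the `LOG_EVERY` constant are illustrative (EOS would log through `eos.info`, here replaced by the stdlib `logging` module so the snippet is self-contained):

```python
import logging

logging.basicConfig(level=logging.INFO, format='%(message)s')
log = logging.getLogger('sampler')

def sample(n):
    # stand-in for the dynesty generator, e.g. sampler.sample_initial(...)
    for i in range(n):
        yield {'it': i, 'dlogz': 1.0 / (i + 1)}

LOG_EVERY = 100  # emit a progress message every 100 samples

last = None
for i, results in enumerate(sample(350)):
    if i % LOG_EVERY == 0:
        log.info('iteration %d: dlogz = %.4f', i, results['dlogz'])
    last = results

print(last['it'])  # -> 349
```

`enumerate` gives the sample counter for free, so no extra state is needed beyond the modulus check.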

@@ -607,7 +607,7 @@ def _prior_transform(self, u):
         return self._u_to_par(u)


-    def sample_nested(self, bound='multi', nlive=250, dlogz=1.0, maxiter=None, seed=10, print_progress=True):
+    def sample_nested(self, bound='multi', nlive=250, dlogz=1.0, maxiter=None, seed=10, print_progress=True, save_intermediate=False, base_directory='./', posterior=None):
Reviewer (Member):
That's a reasonable idea to begin with, but I do not like it very much. So far, the design has been to separate the tasks from the lower-level operations. I think we should, and can, keep doing that.

I would suggest using a yield_intermediate keyword argument. Instead of returning sampler.results once at the end, we would yield sampler.results periodically. The task can then run a for loop over this function and save the results itself.
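The suggested separation could look roughly like this pure-Python sketch; `yield_intermediate`, the list-of-iterations stand-in for sampler results, and the checkpoint cadence are assumptions for illustration, not the actual EOS API:

```python
def sample_nested(yield_intermediate=False, maxiter=30, checkpoint=10):
    """Lower-level operation: runs the sampling loop and, optionally,
    yields intermediate results instead of only the final ones."""
    results = []
    for it in range(maxiter):
        results.append(it)  # stand-in for one sampler iteration
        if yield_intermediate and (it + 1) % checkpoint == 0:
            yield list(results)  # periodic snapshot
    yield list(results)  # final results

# Task level: iterate over the generator and save each snapshot itself.
snapshots = []
for intermediate in sample_nested(yield_intermediate=True):
    snapshots.append(intermediate)  # stand-in for writing to disk

print(len(snapshots), len(snapshots[-1]))  # -> 4 30
```

The lower-level function never touches the filesystem; the task decides where and how often snapshots are persisted, which preserves the task/operation split.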

fnovak42 (Contributor, Author) commented:

The main change in the implementation right now is that the intermediate results are stored every few loop iterations, as specified by checkpoint_interval. I don't think that storing them every few samples would be practical, since the results of several batches (with differing numbers of samples) are combined within an iteration. It would be easier, and perhaps more useful, to save the results every few seconds using dynesty.utils.DelayTimer -- dynesty itself uses this for the checkpoint_every parameter of run_nested.
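dynesty.utils.DelayTimer is essentially a monotonic-clock gate. A stdlib sketch of the time-based checkpointing idea is below; the class name and `is_time` method mirror dynesty's, but the implementation and the 0.05 s / 0.03 s timings are illustrative:

```python
import time

class DelayTimer:
    """Reports True at most once per `delay` seconds
    (cf. dynesty.utils.DelayTimer)."""
    def __init__(self, delay):
        self.delay = delay
        self.last = time.monotonic()

    def is_time(self):
        now = time.monotonic()
        if now - self.last >= self.delay:
            self.last = now
            return True
        return False

timer = DelayTimer(delay=0.05)  # checkpoint at most every 0.05 s
checkpoints = 0
for _ in range(4):
    time.sleep(0.03)  # stand-in for one sampler loop iteration
    if timer.is_time():
        checkpoints += 1  # stand-in for saving intermediate results

print(checkpoints)
```

Because the gate is wall-clock based, slow iterations checkpoint on nearly every pass while fast ones skip most checkpoints, which keeps I/O overhead bounded regardless of per-batch sample counts.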

fnovak42 (Contributor, Author) commented on Sep 5, 2023

To tick off some of the boxes in issue #573:

  • the log file now contains the messages generated by eos.info and is placed in the nested directory, as discussed; I would tick off the third box
  • the fourth point is also done; we've replaced the loop
  • the log file now tells you whether sampling was stopped because the maximum number of iterations or the dlogz limit was reached

Regarding the last point: currently we use the same variables maxiter and dlogz for the initial sampling (line 641 in analysis.py), the batches (line 667), and the overall loop (lines 656 and 661). Is it feasible to use different variables for those, or should they adhere to the same limit?

I also thought about adding something like this if maxiter is not given:

import sys
if maxiter is None:
    maxiter = sys.maxsize

dvandyk (Member) left a review comment:

Looks very reasonable! Some comments inline.

 for p in self.varied_parameters:
-    eos.error(' - {n}: {v}'.format(n=p.name(), v=p.evaluate()))
+    eos.error(f' - {p.name()}: {p.evaluate()}')
Reviewer (Member):

All of the changes above are due to pyupgrade and should be moved to a separate commit.

@@ -653,14 +653,55 @@ def sample_nested(self, bound='multi', nlive=250, dlogz=1.0, maxiter=None, seed=
:type maxiter: int, optional
:param seed: The seed used to initialize the Mersenne Twister pseudo-random number generator.
:type seed: {None, int, array_like[ints], SeedSequence}, optional
:param save_intermediate: If set to True, the intermediate dynesty sampler results are stored in each loop iteration.
:type save_intermediate: bool, optional
:param checkpoint_every: The number of seconds between checkpoints at which the intermediate dynesty sampler results are stored.
Reviewer (Member):
I like the checkpointing. Not so sure that I like the naming.

Reviewer (Contributor):
What about backup and backup_frequency?

Reviewer (Contributor):
How about just a checkpoint_interval (a float, larger than zero)? If None, no intermediate results are saved.

sampler = dynesty.DynamicNestedSampler(self.log_likelihood, self._prior_transform, len(self.varied_parameters), bound=bound, nlive=nlive, rstate = np.random.Generator(np.random.MT19937(seed)))
sampler.run_nested(dlogz_init=dlogz, maxiter=maxiter, print_progress=print_progress)
return sampler.results
#sampler.run_nested(dlogz_init=dlogz, maxiter=maxiter)
Reviewer (Member):
This comment needs to be removed.

posterior_values = results.logwt - results.logz[-1]
weights = _np.exp(posterior_values)
eos.data.DynestyResults.create(os.path.join(base_directory, posterior, f'dynesty_results-{iter:04}'), analysis.varied_parameters, results)
eos.data.ImportanceSamples.create(os.path.join(base_directory, posterior, f'samples-{iter:04}'), analysis.varied_parameters,
Reviewer (Member):
I like the labelling of the dynesty result by the iteration number. However, doing so for the samples will break backward compatibility with e.g. the plotting framework. Please undo this labelling for samples.

fnovak42 (Contributor, Author) commented on Nov 6, 2023

For comparison I calculated the chi**2 values from test_multivariate_priors_1 in python/eos/analysis_TEST.py. The values using the code from the main branch (unrolled loop) are:

  • chi 1: 0.002682592060968833
  • chi 2: 5.803923701460194e-07
  • chi 3: 2.52241831039844e-05
  • chi 4: 0.2764884467520001

and using this branch:

  • chi 1: 0.017353753382656843
  • chi 2: 0.00031026173335609383
  • chi 3: 0.0009632358001228368
  • chi 4: 0.5324848524622118

I haven't yet spotted a significant difference from the dynesty implementation that would tell me where the discrepancy comes from: https://github.com/joshspeagle/dynesty/blob/f59d963fad80301e5d28bc5a6e3718c467c1bf94/py/dynesty/dynamicsampler.py#L1824

mreboud (Contributor) left a review comment:
Can you clean up line 176 of analysis_TEST.py? There is a missing sqrt, but sigma is not used anyway ...
