Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing seeds and reproducibility #87

Open
nchopin opened this issue May 5, 2024 Discussed in #86 · 0 comments
Open

Fixing seeds and reproducibility #87

nchopin opened this issue May 5, 2024 Discussed in #86 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@nchopin
Copy link
Owner

nchopin commented May 5, 2024

Discussed in #86

Originally posted by MauroCE May 4, 2024
I really like this package and it's very intuitive to use :) One thing that would make me love it even more would be to allow the user to provide/fix the random state so that results can be exactly reproduced. I am still getting familiar with the abstractions and inner workings of the package, but I was thinking something like this could work:

  1. Defining a function in utils.py that generates an instance of np.random.Generator given either a seed or another rng
def setup_rng(seed, rng):
    assert (seed is None) or (rng is None), "At least one of `seed` or `rng` must be None."
    if rng is None:
        if seed is None:
            seed = np.random.randint(low=1000, high=9999)
        rng = np.random.default_rng(seed=seed)
    return rng
  1. Add reproducibility for resampling schemes by modyfing rs_doc, resampling_scheme, resampling. Mostly this requires adding an extra argument rng=None to the various resampling functions (as well as uniform_spacings and the queue). E.g.
from particles.utils import setup_rng 
rs_doc = """\

    Parameters
    ----------
    W : (N,) ndarray
        normalized weights (>=0, sum to one)
    M : int, optional (set to N if missing)
        number of resampled points.
    rng : np.random.Generator, optional (instantiated at random if missing)
        random number generator for reproducibility

    Returns
    -------
    (M,) ndarray
     M ancestor variables, drawn from range 0, ..., N-1
"""

def resampling_scheme(func):
    """Decorator for resampling schemes."""

    @functools.wraps(func)
    def modif_func(W, M=None, rng=None):
        M = W.shape[0] if M is None else M
        return func(W, M, rng=rng)

    rs_funcs[func.__name__] = modif_func
    modif_func.__doc__ = func.__doc__ + rs_doc
    return modif_func

def resampling(scheme, W, M=None, rng=None):
    rng = setup_rng(seed=None, rng=rng)
    try:
        return rs_funcs[scheme](W, M=M, rng=rng)
    except KeyError:
        raise ValueError(f"{scheme} is not a valid resampling scheme")
        
@resampling_scheme
def systematic(W, M, rng=None):
    """Systematic resampling."""
    su = (rng.uniform(low=0.0, high=1, size=1) + np.arange(M)) / M
    return inverse_cdf(su, W)
  1. Add rng=None to the arguments of ArrayMCMC and store it as an attribute. Then ArrayMetropolis, ArrayRandomWalk etc can use it within the step() and proposal methods respectively. When one instantiates a MCMCSequence one would pass an optional rng. In the same way, rng is passed down from the FKSMCsampler, which gets handed it down from e.g. Tempering or AdaptiveTempering.

The same thing can be done for FeynmanKac and SMC. I tried it on my machine and it seems to work smoothly. One could probably work directly with rng and the argument seed in setup_rng would be redundant. I find that, depending on what I am trying to do and compare results with, I sometimes need to pass a seed or an rng, so could be an option to leave both.

Let me know what you think :)

@nchopin nchopin self-assigned this May 5, 2024
@nchopin nchopin added the enhancement New feature or request label May 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant