Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probability function returned NaN #415

Open
omaarattia opened this issue Dec 13, 2021 · 2 comments
Open

Probability function returned NaN #415

omaarattia opened this issue Dec 13, 2021 · 2 comments

Comments

@omaarattia
Copy link

omaarattia commented Dec 13, 2021

Good morning,

I have an issue on emcee. The EnsembleSampler crashes after a few iterations and raises a ValueError: Probability function returned NaN. The thing is my user-defined probability function cannot return NaNs, as it defined something like this:

def my_probability_function(x):
    prob = logP(*x)
    if prob > 0 or np.isnan(prob): return -np.inf
    return prob

where logP is basically an evaluation of a beforehand-constructed RBF interpolator.

I call the EnsembleSampler like this:

import emcee
import numpy as np
from pathos.pools import ProcessPool

NW = 64 #Number of workers
ND = 7 #Number of dimensions
MCMC_PATH = 'path/to/backend.h5'
backend = emcee.backends.HDFBackend(MCMC_PATH)
backend.reset(NW, ND)
with ProcessPool(nodes=8) as pool:
    pool.restart()
    sampler = emcee.EnsembleSampler(NW, ND, my_probability_function, pool=pool, backend=backend)
    sampler.run_mcmc(p0, 5000, progress=True)

Changing the logP function to e.g. a nearest interpolator solves the problem, so I guess the issue comes from a bad interaction between emcee and scipy's RBF interpolator?

I tried advancing the MCMC iteration by iteration and comparing at each step the manual and emcee-stored evaluation of the probability function:

sampler.run_mcmc(pp, 1, skip_initial_state_check=True)
pp = sampler.get_chain(flat=True)[-NW:]

logP_manual = [my_probability_function(p) for p in pp]
logP_emcee, _ = sampler.compute_log_prob(pp)

diff = np.abs(1. - logP_manual/logP_emcee) 

and at some point, diff was not identically zero, meaning that my_probability_function sometimes returns different results depending on if one evaluates it outside or inside emcee!

Can you help me please? Unfortunately, I was not able to replicate the problem in a minimalist example, as it only emerges when the RBF interpolator is constructed on my personal 2M data points in 7 dimensions.

Many thanks in advance.

@dfm
Copy link
Owner

dfm commented Dec 13, 2021

I'm not too sure what to say here. There isn't any real reason why emcee would cause different behavior with your interpolator - it just directly calls the function you provide:

results = list(map_func(self.log_prob_fn, p))

Some thoughts:

  1. There isn't any strong reason why a probability density needs to be less than 1, so you probably don't want the prob > 0 check in your function.
  2. The only processing that emcee does with the results of your function call is to pass it though a Python float, so perhaps there's some numerical instability there, but if there is that suggests that there's probably something else wrong with your function. You might want to debug what's happening there?
  3. Do you find the same behavior if you don't use multiprocessing? Perhaps there's an issue with multiprocessing and your interpolator.

@omaarattia
Copy link
Author

Thank you very much for your answer. I'll try your suggestions and let you know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants