Trying to use Blobs with the vectorize=True keyword returns a ValueError("incompatible input dimensions") #516

Open
Orient-Dorey opened this issue Apr 17, 2024 · 0 comments

General information:

  • emcee version: 3.1.5.dev33+gd4c2de9
  • platform: Ubuntu 22.04.4 LTS
  • installation method (pip/conda/source/other?): source

Problem description:

Hello!
I am using emcee to fit a complex TensorFlow model to observed fluxes. Since TensorFlow is highly parallel by nature, I pass vectorize=True to the sampler so that the model evaluates all walkers in a single call, rather than being evaluated separately for each walker. This works fine on its own. However, a problem arises when I try to include blobs in the output of my lnprob function, as shown in the first example of the Blobs documentation: the sampler immediately raises ValueError("incompatible input dimensions"), which comes from these lines in the ensemble.py source code (lines 345 & 346):

if np.shape(state.log_prob) != (self.nwalkers,):
    raise ValueError("incompatible input dimensions")

Upon further investigation, I found that log_prob and blob do not have the expected shapes inside the compute_log_prob method. My lnprob function returns a tuple of two arrays, each of shape (32,) (with 32 walkers), so the return value has overall shape (2, 32). I expected the log_prob and blob produced by compute_log_prob to each have shape (32,). However, when printed, their shapes are (2,) and (2, 31), respectively.
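
A minimal NumPy-only sketch of how these shapes can arise, assuming compute_log_prob applies the per-walker parsing quoted further down in this issue to the vectorized return value. It does not call emcee: float() stands in for emcee's internal _scalar helper, and the values 0.7 and 0.021 are the observation and its error from the minimal example.

```python
import numpy as np

nwalkers, ndim = 32, 5
theta = np.random.default_rng(1337).random((nwalkers, ndim))

# Same computation as lnprob in the minimal example, for all walkers at once.
model = np.sum(theta, axis=1)               # shape (32,)
like = -0.5 * ((0.7 - model) / 0.021) ** 2  # shape (32,)
results = (like, model)                     # what a vectorized lnprob returns

# Per-walker parsing: each element of `results` is treated as ONE walker's
# output, so the loop runs twice (once over `like`, once over `model`)
# instead of 32 times.
# float() is a stand-in for emcee's _scalar helper (assumption).
blob = np.array([l[1:] for l in results if len(l) > 1])
log_prob = np.array([float(l[0]) for l in results])

print(log_prob.shape)  # (2,)
print(blob.shape)      # (2, 31)
```

Under that assumption, the two "walkers" seen by the parser are really the like and model arrays, which would explain both the (2,) log_prob and the (2, nwalkers - 1) blob.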

Expected behavior:

I expected the log_prob and blob objects returned by the compute_log_prob method to each have shape (nwalkers,), and the code to run as normal.

Running production...
100%|████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 127.01it/s]
Samples shape (10, 32, 5)
Blobs shape (10, 32)

Actual behavior:

The code raises ValueError("incompatible input dimensions"), and the log_prob and blob objects have shapes (2,) and (2, nwalkers - 1), respectively.

Running production...
Traceback (most recent call last):
  File "/media/orient/DATA/Code/issue_mcmc.py", line 44, in <module>
    sampler.run_mcmc(initial, niter, progress=True)
  File "/home/orient/.local/lib/python3.10/site-packages/emcee/src/emcee/ensemble.py", line 443, in run_mcmc
    for results in self.sample(initial_state, iterations=nsteps, **kwargs):
  File "/home/orient/.local/lib/python3.10/site-packages/emcee/src/emcee/ensemble.py", line 346, in sample
    raise ValueError("incompatible input dimensions")
ValueError: incompatible input dimensions

What have you tried so far?:

The only solution I have found so far is to slightly edit the ensemble.py source code, adding a simple check for whether the vectorize keyword is set. It is a bit inelegant, though, and it does not handle multiple blobs (for instance, returning a tuple of 3 values instead of the usual 2, lnprob and blob). Here are the modified lines, from line 491 onwards:

        try:
            # perhaps log_prob_fn returns blobs?

            # deal with the blobs first
            # if l does not have a len attribute (i.e. not a sequence, no blob)
            # then a TypeError is raised. However, no error will be raised if
            # l is a length-1 array, np.array([1.234]). In that case blob
            # will become an empty list.
            if self.vectorize: # ADDED LINE HERE
                log_prob, blob = results # ADDED LINE HERE
            else: # ADDED LINE HERE
                blob = [l[1:] for l in results if len(l) > 1]
                if not len(blob):
                    raise IndexError
                log_prob = np.array([_scalar(l[0]) for l in results])
        except (IndexError, TypeError, ValueError): # ValueError ADDED HERE
            log_prob = np.array([_scalar(l) for l in results])
            blob = None

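Catching ValueError as well allows the code to run as expected when no blobs are returned: in that case the tuple unpacking fails, and the fallback branch handles the bare log-probability array.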
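
One possible way to extend this workaround to several blobs is sketched below. This is only an illustration, not emcee code: split_vectorized_output is a hypothetical helper, and it assumes that a vectorized lnprob returns either a bare array of shape (nwalkers,) or a tuple (log_prob, blob_1, ..., blob_k) in which every entry has leading dimension nwalkers and all blobs share the same shape.

```python
import numpy as np


def split_vectorized_output(results, nwalkers):
    """Hypothetical helper: split a vectorized return into (log_prob, blobs).

    Assumes `results` is a bare array of shape (nwalkers,) or a tuple
    (log_prob, blob_1, ..., blob_k) with nwalkers as the leading dimension.
    """
    if isinstance(results, tuple):
        log_prob = np.asarray(results[0], dtype=float)
        extras = [np.asarray(b) for b in results[1:]]
    else:
        log_prob = np.asarray(results, dtype=float)
        extras = []

    if log_prob.shape != (nwalkers,):
        raise ValueError("incompatible input dimensions")
    for b in extras:
        if b.shape[:1] != (nwalkers,):
            raise ValueError("each blob needs leading dimension nwalkers")

    if not extras:
        return log_prob, None       # no blobs returned
    if len(extras) == 1:
        return log_prob, extras[0]  # single blob, kept as is
    # Several blobs: stack along a new trailing axis, e.g. (nwalkers, k).
    return log_prob, np.stack(extras, axis=-1)


# Usage with the quantities from the minimal example.
theta = np.random.default_rng(1337).random((32, 5))
model = np.sum(theta, axis=1)
like = -0.5 * ((0.7 - model) / 0.021) ** 2

log_prob, blobs = split_vectorized_output((like, model, model ** 2), 32)
print(log_prob.shape, blobs.shape)  # (32,) (32, 2)
```

Checking for a tuple explicitly, rather than relying on a failed unpacking, keeps the no-blob case from depending on a caught ValueError.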

Minimal example:

import emcee
import numpy as np
np.random.seed(1337)

def lnprob(theta, y, yerr):
    # theta has shape (nwalkers, ndims)
    model = np.sum(theta, axis=1)  # shape nwalkers
    like = -0.5 * ((y - model) / yerr) ** 2
    return like, model  # like is the result log prob, model goes into the blobs

# Basic 'observed' data
true_obs = 0.7
obs_err = 0.03 * true_obs
data = (true_obs, obs_err)

initial = np.random.rand(32, 5)
nwalkers, ndim = initial.shape
niter = 10

sampler = emcee.EnsembleSampler(
    nwalkers,
    ndim,
    lnprob,
    vectorize=True,
    moves=emcee.moves.WalkMove(),
    args=data,
)
print("Running production...")
sampler.run_mcmc(initial, niter, progress=True)


samples = sampler.get_chain()  
properties = sampler.get_blobs()

print("Samples shape", samples.shape)
print("Blobs shape", properties.shape)

I'm new to GitHub, so sorry if the issue isn't concise or detailed enough.
Thank you for your time!
