Trying to use Blobs with the vectorize=True keyword returns a ValueError("incompatible input dimensions") #516

Orient-Dorey · 2024-04-17T18:24:05Z

General information:

emcee version: 3.1.5.dev33+gd4c2de9
platform: Ubuntu 22.04.4 LTS
installation method (pip/conda/source/other?): source

Problem description:

Hello!
I am using emcee to fit a complex TensorFlow model to observed fluxes. Since TensorFlow is highly parallel by nature, I pass vectorize=True to the sampler so that the model evaluates all walkers in a single call instead of creating one instance per walker. This works fine on its own. However, an issue arises when I try to include blobs in the output of my lnprob function, as shown in the first example of the Blobs documentation: the sampler immediately raises ValueError("incompatible input dimensions"), which comes from these lines in the ensemble.py source code (lines 345 & 346):

if np.shape(state.log_prob) != (self.nwalkers,):
    raise ValueError("incompatible input dimensions")

Upon further investigation, I found that the shapes of log_prob and blob inside the compute_log_prob method were not as expected. My lnprob function returns a tuple of two arrays, each of shape (32,) (one entry per walker, with 32 walkers), so the returned object has overall shape (2, 32). I expected the outputs of compute_log_prob (log_prob and blob) to each have shape (32,). However, when I print these shapes, they are (2,) and (2, 31) respectively.
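The shapes can be reproduced outside emcee. The sketch below is a minimal illustration and not emcee code: it applies the same list comprehensions as the non-vectorized branch quoted further down to a tuple of two per-walker arrays. The random arrays are stand-ins for a real lnprob output, and `float()` is used in place of emcee's internal `_scalar` helper (an assumption made for illustration).

```python
import numpy as np

nwalkers = 32
rng = np.random.default_rng(0)

# Stand-in for what a vectorized lnprob returns: a tuple (log_prob, blob),
# each entry holding one value per walker.
results = (rng.normal(size=nwalkers), rng.normal(size=nwalkers))

# Same unpacking as the per-walker branch: it iterates over the 2 tuple
# entries, not over the 32 walkers.
blob = np.array([l[1:] for l in results if len(l) > 1])
log_prob = np.array([float(l[0]) for l in results])

print(log_prob.shape)  # (2,)
print(blob.shape)      # (2, 31)
```

The loop treats each of the two returned arrays as if it were one walker's `(log_prob, blob...)` record, which would explain both the leading dimension of 2 and the `nwalkers - 1` trailing dimension.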

Expected behavior:

I expected the log_prob and blob objects returned by the compute_log_prob method to each have shape (nwalkers,), and the code to run as normal.

Running production...
100%|████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 127.01it/s]
Samples shape (10, 32, 5)
Blobs shape (10, 32)

Actual behavior:

The code returns a ValueError("incompatible input dimensions"), and the log_prob and blob objects have respective shapes (2,) and (2, nwalkers - 1).

Running production...
Traceback (most recent call last):
  File "/media/orient/DATA/Code/issue_mcmc.py", line 44, in <module>
    sampler.run_mcmc(initial, niter, progress=True)
  File "/home/orient/.local/lib/python3.10/site-packages/emcee/src/emcee/ensemble.py", line 443, in run_mcmc
    for results in self.sample(initial_state, iterations=nsteps, **kwargs):
  File "/home/orient/.local/lib/python3.10/site-packages/emcee/src/emcee/ensemble.py", line 346, in sample
    raise ValueError("incompatible input dimensions")
ValueError: incompatible input dimensions

What have you tried so far?:

The only solution I have found so far is to slightly edit the ensemble.py source code, adding a basic condition that checks whether the vectorize keyword is used. It is a bit inelegant, though, and doesn't account for multiple blobs (for instance, returning a tuple of 3 values instead of the typical 2, lnprob and blob). Here are the added lines, from line 491 onwards:

        try:
            # perhaps log_prob_fn returns blobs?

            # deal with the blobs first
            # if l does not have a len attribute (i.e. not a sequence, no blob)
            # then a TypeError is raised. However, no error will be raised if
            # l is a length-1 array, np.array([1.234]). In that case blob
            # will become an empty list.
            if self.vectorize: # ADDED LINE HERE
                log_prob, blob = results # ADDED LINE HERE
            else: # ADDED LINE HERE
                blob = [l[1:] for l in results if len(l) > 1]
                if not len(blob):
                    raise IndexError
                log_prob = np.array([_scalar(l[0]) for l in results])
        except (IndexError, TypeError, ValueError): # ValueError ADDED HERE
            log_prob = np.array([_scalar(l) for l in results])
            blob = None

The added ValueError allows the code to run as expected when not using any blobs.
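A possible alternative that avoids patching ensemble.py might be to have the vectorized function return one row per walker, with the log probability in column 0 and the blob(s) in the remaining columns. The sketch below only checks that the per-row unpacking quoted above yields the expected shapes for such an array. `lnprob_stacked` is a hypothetical name, `float()` again stands in for `_scalar`, and I have not verified this against the sampler itself.

```python
import numpy as np

def lnprob_stacked(theta, y, yerr):
    # theta has shape (nwalkers, ndim)
    model = np.sum(theta, axis=1)
    like = -0.5 * ((y - model) / yerr) ** 2
    # One row per walker: column 0 is the log prob, column 1 is the blob.
    return np.column_stack([like, model])

theta = np.random.default_rng(1337).random((32, 5))
results = lnprob_stacked(theta, 0.7, 0.021)  # shape (32, 2)

# Same per-row unpacking as the quoted ensemble.py lines: iterating over
# a 2-D array yields one row per walker.
blob = np.array([l[1:] for l in results if len(l) > 1])
log_prob = np.array([float(l[0]) for l in results])

print(log_prob.shape, blob.shape)  # (32,) (32, 1)
```

Because everything is stacked into a single array, all blobs are forced to share one dtype with the log probability, so this would only suit numeric blobs. Extra blobs would simply be additional columns.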

Minimal example:

import emcee
import numpy as np
np.random.seed(1337)

def lnprob(theta, y, yerr):
    # theta has shape (nwalkers, ndims)
    model = np.sum(theta, axis=1)  # shape nwalkers
    like = -0.5 * ((y - model) / yerr) ** 2
    return like, model  # like is the result log prob, model goes into the blobs

# Basic 'observed' data
true_obs = 0.7
obs_err = 0.03 * true_obs
data = (true_obs, obs_err)

initial = np.random.rand(32, 5)
nwalkers, ndim = initial.shape
niter = 10

sampler = emcee.EnsembleSampler(
    nwalkers,
    ndim,
    lnprob,
    vectorize=True,
    moves=emcee.moves.WalkMove(),
    args=data,
)
print("Running production...")
sampler.run_mcmc(initial, niter, progress=True)


samples = sampler.get_chain()  
properties = sampler.get_blobs()

print("Samples shape", samples.shape)
print("Blobs shape", properties.shape)

I'm new to GitHub, sorry if the issue isn't short enough / detailed enough.
Thank you for your time!
