
Distributions are not independent for Empirical #1776

Open

sfo opened this issue Dec 11, 2023 · 1 comment

Comments

sfo commented Dec 11, 2023

While the documentation states that

The first k dimensions index into a batch of independent distributions

this is not the case in the implementation. As can be seen from the following MWE, the samples drawn for each distribution in the batch are always identical:

import numpy as np
import tensorflow_probability as tfp

# Build a batch of two Empirical distributions from the same sample array.
a = np.random.randint(0, 100, size=1000)
dist = tfp.distributions.Empirical([a, a])

# Draw n samples; each column corresponds to one distribution in the batch.
n = 3
dist.sample(n)

# output
# <tf.Tensor: shape=(3, 2), dtype=int32, numpy=
# array([[66, 66],
#        [33, 33],
#        [98, 98]], dtype=int32)>

The root cause seems to be at this location, where indices are sampled only once and then reused for every distribution in the batch:

indices = samplers.uniform([n], maxval=self._compute_num_samples(samples),
                           dtype=tf.int32, seed=seed)
draws = tf.gather(samples, indices, axis=self._samples_axis)
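
To see why this couples the draws, consider a minimal standalone demonstration (the tensors here are illustrative, not TFP's internal state): because a single index vector is shared across the whole batch, the same positions are picked from every batch member.

import tensorflow as tf

# Batch of two "distributions", each backed by three stored samples.
samples = tf.constant([[10, 20, 30],
                       [40, 50, 60]])
# One index vector, drawn once and reused for both batch members.
indices = tf.constant([2, 0, 2])
tf.gather(samples, indices, axis=1)
# -> [[30, 10, 30],
#     [60, 40, 60]]  # identical positions picked from every row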

I now fear that there might be more issues in the implementation of Empirical, rendering it unusable in contexts where I require independence between distributions.

Also, the shape of the output seems strange: it is (n, k), while I would expect it to be (k, n).

Maybe I also completely misunderstood the documentation.

csuter (Member) commented Dec 11, 2023

Wow, yeah, that's a pretty egregious bug. Thank you for flagging. There's pretty good test coverage of this distribution, including, e.g., statistical tests of mean and variance, even in the presence of batch shapes. But nothing checks independence of the samples across batches! Should be a straightforward-ish fix (we'll need to sample indices for each batch dimension and change the gather to a gather_nd, which is always fun :)).

Are you interested in trying to submit a patch? No pressure, just don't want to duplicate work.
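
For reference, a minimal sketch of the fix described above, assuming a standalone (batch_size, num_samples) layout rather than TFP's actual internals (the helper name and signature are hypothetical): draw a separate index vector per batch member and use tf.gather_nd to pair each index with its batch coordinate.

import tensorflow as tf

def sample_empirical_batch(samples, n, seed=None):
    # Hypothetical helper, not TFP's implementation. `samples` is assumed
    # to have shape (batch_size, num_samples).
    batch_size, num_samples = samples.shape
    # Independent indices for every batch member: shape (n, batch_size).
    indices = tf.random.uniform([n, batch_size], maxval=num_samples,
                                dtype=tf.int32, seed=seed)
    # Pair each index with its batch coordinate, so gather_nd picks
    # element (batch_idx, index) for every draw.
    batch_idx = tf.broadcast_to(tf.range(batch_size), [n, batch_size])
    gather_idx = tf.stack([batch_idx, indices], axis=-1)  # (n, batch_size, 2)
    return tf.gather_nd(samples, gather_idx)              # (n, batch_size)

Applied to the MWE above (samples = tf.constant([a, a])), each column of the result is drawn with its own indices, so the columns no longer match element-for-element.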
