Histplot errorbars #429

meisonlikesicecream · 2023-05-16T12:54:41Z

I find the documentation of the histplot function [0] slightly confusing/contradictive. I stumbled upon this when I wanted to plot a simple weighted 1D histogram with sqrt(w2) errorbars (which I assumed was the default, but apparently not). I have not checked if the following issues are true for histtypes other than histtype=‘errorbar’.

This line about the yerr parameter "Following modes are supported: - True, sqrt(N) errors or poissonian interval when w2 is specified" [1] makes it sound like if yerr == True, then sqrt(N) is used for the errors if w2 is NOT specified, and that the poissonian interval is used if w2 IS specified. However, this is not the case: it always does the poissonian interval even if w2 is None because in the Plottable class definition it always initialises with the method “poisson” [2], which therefore, will always run the poissonian interval [3]. If w2 is specified, and yerr!=None, then it crashes because of [4].

This leads me to another confusion: which error calculation do we want for weighted histograms? The following line about the w2 parameter "Sum of the histogram weights squared for poissonian interval error calculation" [5] makes it sound like we always want to use the poissonian error for weighted histograms, while this line "If w2 has integer values (likely to be data) poisson interval is calculated, otherwise the resulting error is symmetric sqrt(w2)" [6] makes it sound like the poisson interval should only be used for integer values. Again, "sqrt" is never used if w2method is None because of [3], even if w2 has integer values.

Also, if you specify to use the “sqrt” method, it never uses the variances (i.e. w2) for calculating the errors [7]...

I'm happy submit a PR if someone can explain what the intended usage is :-)

The text was updated successfully, but these errors were encountered:

nsmith- · 2023-08-02T13:43:47Z

For sure the sqrt method should be using variances (sumw2), so that's a bug.
As for the rest of the plumbing, there is some attempt to do scaled Poisson in

mplhep/src/mplhep/error_estimation.py

Lines 11 to 31 in 9b47e92

    
           def poisson_interval(sumw, sumw2, coverage=_coverage1sd): 
        
               """Frequentist coverage interval for Poisson-distributed observations 
        
               Parameters 
        
               ---------- 
        
                   sumw : numpy.ndarray 
        
                       Sum of weights vector 
        
                   sumw2 : numpy.ndarray 
        
                       Sum weights squared vector 
        
                   coverage : float, optional 
        
                       Central coverage interval, defaults to 68% 
        
               Calculates the so-called 'Garwood' interval, 
        
               c.f. https://www.ine.pt/revstat/pdf/rs120203.pdf or 
        
               http://ms.mcmaster.ca/peter/s743/poissonalpha.html 
        
               For weighted data, this approximates the observed count by ``sumw**2/sumw2``, which 
        
               effectively scales the unweighted poisson interval by the average weight. 
        
               This may not be the optimal solution: see https://arxiv.org/pdf/1309.1287.pdf for a 
        
               proper treatment. When a bin is zero, the scale of the nearest nonzero bin is 
        
               substituted to scale the nominal upper bound. 
        
               If all bins zero, a warning is generated and interval is set to ``sumw``. 
        
               # Taken from Coffea 
        
               """

for the case where sumw2 is available. So in practice, it will be very close to sqrt(N) for reasonably large N. Nevertheless, one may prefer to be using exactly sqrt(sumw2) the documentation and the code should be synchronized, and I'm sure the maintainers would be happy to receive a PR to correct this.

In the meantime, you can pass a custom error function to mplhep.histplot(..., w2method=mymethod) where

def mymethod(sumw, sumw2):
    return np.array([sumw - np.sqrt(sumw2), sumw + np.sqrt(sumw2)])

tomeichlersmith · 2023-11-28T17:27:57Z

I came across this issue while trying to do another task with the histplot errorbars. Basically, my goal is to not draw the error bars for the bins without any content. Usually, this isn't a huge issue since the bins with content dwarf the bins without content and therefore the error bars on the bins without content do not rise above the x-axis into the figure; however, in the case where I am plotting on a log scale and the bin content traverses many orders of magnitude, the bins without content do show up in the plot.

This problem can be replicated with a short hist creation.

h = hist.Hist.new.Reg(10,0,1).Weight()
h.fill(
    [0., 0.5, 0.6, 0.7, 0.8, 0.9],
    weight = [3., 1e3, 1e4, 1e5, 1e6, 1e7]
)

Now with h defined, we can look at various methods of plotting the error bars.

h.plot()
plt.yscale('log')

We can mask out the error bars by setting the bins with a nan lo poisson estimate to 0.

def poisson_interval_ignore_empty(sumw, sumw2):
    interval = mplhep.error_estimation.poisson_interval(sumw, sumw2)
    lo, hi = interval[0,...], interval[1,...]
    to_ignore = np.isnan(lo)
    lo[to_ignore] = 0.0
    hi[to_ignore] = 0.0
    return np.array([lo,hi])
mplhep.histplot(h.values(), bins=h.axes[0].edges, w2method = poisson_interval_ignore_empty, w2 = h.variances())
plt.yscale('log')

The process of developing this solution actually showed me that the current function does not actually set nan to zero. You should use np.isnan(interval) rather than interval == np.nan in the line below. Maybe this tip is just an addition to the docs or maybe it can included as a named method?

https://github.com/scikit-hep/mplhep/blob/566f8bfa6ae129675a1861613d6955d8a520af5b/src/mplhep/error_estimation.py#L53C1-L53C77

andrzejnovak · 2023-11-29T13:05:54Z

It might make sense as an additional method, but it's maybe not trivial if there is 0 uncertainty for 0 events yield in poison stats cf. https://stats.stackexchange.com/questions/427019/confidence-interval-for-mean-of-poisson-with-only-zero-counts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Histplot errorbars #429

Histplot errorbars #429

meisonlikesicecream commented May 16, 2023

nsmith- commented Aug 2, 2023

tomeichlersmith commented Nov 28, 2023

andrzejnovak commented Nov 29, 2023

Histplot errorbars #429

Histplot errorbars #429

Comments

meisonlikesicecream commented May 16, 2023

nsmith- commented Aug 2, 2023

tomeichlersmith commented Nov 28, 2023

andrzejnovak commented Nov 29, 2023