Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sometimes inconsistent values in hist conversion from TProfile read via uproot #1175

Open
raymondEhlers opened this issue Mar 19, 2024 · 2 comments
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged

Comments

@raymondEhlers
Copy link
Contributor

raymondEhlers commented Mar 19, 2024

When I convert TProfile objects to hist that I've read from a ROOT file with uproot, I sometimes observe inconsistent values. For some input files, they're consistent, but for others, they're inconsistent - It depends on the input file. Unfortunately, I haven't figured out how to reproduce this in a simple way, so I've just attached two files which are produced in the same way and exhibit this behavior: (failing.root.txt (11MB), passing.root.txt (< 1 MB)). Both appear to be valid output files beyond this issue. The code below illustrates the behavior:

def test_uproot_profile_consistency_with_hist() -> None:
    import hist
    import numpy as np
    import uproot

    input_file = Path("failing.root")

    # Open and extract with uproot
    with uproot.open(input_file) as f:
        # Cast to a base class. Actual class is here: https://github.com/alisw/AliPhysics/blob/master/PWG/EMCAL/EMCALbase/AliEmcalList.h
        output_list = f["AliAnalysisTaskTrackSkim_pythia"].bases[0]
        x_sec_uproot = next(h for h in output_list if h.name == "fHistXsection")
        x_sec_uproot_hist = x_sec_uproot.to_hist()

    # Cross check with ROOT
    ROOT = pytest.importorskip("ROOT")
    with ROOT.TFile.Open(str(input_file), "READ") as f_ROOT:
        output_list = f_ROOT.Get("AliAnalysisTaskTrackSkim_pythia")
        # Cast to a base class. Actual class is here: https://github.com/alisw/AliPhysics/blob/master/PWG/EMCAL/EMCALbase/AliEmcalList.h
        output_list = ROOT.bind_object(ROOT.addressof(output_list), "TList")

        x_sec_temp = output_list.FindObject("fHistXsection")
        x_sec_temp.SetDirectory(0)
    x_sec_ROOT_values = np.array([x_sec_temp.GetBinContent(i) for i in range(1, x_sec_temp.GetNbinsX() + 1)], dtype=np.float64)

    print(f"Uproot: {uproot.__version__}, hist: {hist.__version__}")

    # The standard uproot values before conversion appear correct
    np.testing.assert_allclose(x_sec_uproot.values(), x_sec_ROOT_values)
    # Fails here for `failing.root`, but fine for `passing.root`
    np.testing.assert_allclose(x_sec_uproot.values(), x_sec_uproot_hist.values())

It strikes as odd that it depends on the particular file when they were generated the same way (with different inputs). I wonder if it's due to some issue with accessing the base class/the cast that I do. It's required because the profile is stored in a TList derived class that's part of our experimental software stack (note that I don't care about any of the additional info stored in the derived class. It's available here if helpful). I'm concerned this is perhaps corrupting an expected memory layout or otherwise causing an issue. Since this class is unfortunately driven by an experiment software stack constraint, it's difficult for me to avoid. If there's a better way for me to workaround this, I'd be happy to use that instead.

Bottom line: I suspect there might be an uproot issue here, but I'm concerned that it's instead due to an experimental software stack quirk. Help here is greatly appreciated - thanks!

>>> import uproot, hist
... print(f"Uproot: {uproot.__version__}, hist: {hist.__version__}")
Uproot: 5.3.1, hist: 2.7.2
@raymondEhlers raymondEhlers added the bug (unverified) The problem described would be a bug, but needs to be triaged label Mar 19, 2024
@jpivarski
Copy link
Member

You were involved in #908, so you know about #1000 (and #1148 is someone else running into it): I'm not convinced that ROOT's TProfile and boost-histogram's WeightedMean Storage are the same thing.

It might be more helpful—you might see a difference between failing.root and passing.root—if you look at the raw quantities on that are stored in the file, the members of the Model_TProfile_v*.

The TH1D base class, which has values (direct bin contents):

self._bases.append(
file.class_named("TH1D", 3).read(
chunk,
cursor,
context,
file,
self._file,
self._parent,
concrete=self.concrete,
)
)

fBinEntries, which is another per-bin quantity:

self._members["fBinEntries"] = file.class_named("TArrayD").read(
chunk, cursor, context, file, self._file, self.concrete
)

and fBinSumw2:

self._members["fBinSumw2"] = file.class_named("TArrayD").read(
chunk, cursor, context, file, self._file, self.concrete
)

The way that the values and errors of the profile are computed from these raw quantities is not at all trivial:

# closely follows the ROOT function, using the same names (with 'root_' prepended)
# https://github.com/root-project/root/blob/ffc7c588ac91aca30e75d356ea971129ee6a836a/hist/hist/src/TProfileHelper.h#L660-L721
def _values_errors_1d(error_mode, fBinEntries, root_cont, fSumw2, fNcells, fBinSumw2):
if error_mode is None or error_mode == _kERRORMEAN or error_mode == "":
error_mode = _kERRORMEAN
elif error_mode == _kERRORSPREAD or error_mode == "s":
error_mode = _kERRORSPREAD
elif error_mode == _kERRORSPREADI or error_mode == "i":
error_mode = _kERRORSPREADI
elif error_mode == _kERRORSPREADG or error_mode == "g":
error_mode = _kERRORSPREADG
else:
raise ValueError(
"error_mode must be None/0/'' for error-on-mean,\n"
" 1/'s' for spread (variance),\n"
" 2/'i' for integer spread (using sqrt(12)),\n"
" or 3/'g' for Gaussian spread\n"
" not "
+ repr(error_mode)
+ "see https://root.cern.ch/doc/master/classTProfile.html"
)
root_sum = fBinEntries
root_sum = numpy.asarray(root_sum, dtype=numpy.float64)
nonzero = root_sum != 0
root_contsum = numpy.zeros(len(root_cont), dtype=numpy.float64)
root_contsum[nonzero] = root_cont[nonzero] / root_sum[nonzero]
if error_mode == _kERRORSPREADG:
out = numpy.zeros(len(root_cont), dtype=numpy.float64)
out[nonzero] = 1.0 / numpy.sqrt(root_sum[nonzero])
return root_contsum, out
root_err2 = fSumw2
if root_err2 is None or len(root_err2) != fNcells:
root_err2 = numpy.zeros(len(root_cont), dtype=numpy.float64)
else:
root_err2 = numpy.asarray(root_err2, dtype=numpy.float64)
root_neff = _effective_counts_1d(fBinEntries, fBinSumw2, fNcells)
root_eprim2 = numpy.zeros(len(root_cont), dtype=numpy.float64)
root_eprim2[nonzero] = abs(
root_err2[nonzero] / root_sum[nonzero] - root_contsum[nonzero] ** 2
)
root_eprim = numpy.sqrt(root_eprim2)
if error_mode == _kERRORSPREADI:
numer = numpy.ones(len(root_cont), dtype=numpy.float64)
denom = numpy.full(len(root_cont), numpy.sqrt(12 * root_neff))
eprim_nonzero = root_eprim != 0
numer[eprim_nonzero] = root_eprim[eprim_nonzero]
denom[eprim_nonzero] = numpy.sqrt(root_neff[eprim_nonzero])
out = numpy.zeros(len(root_cont), dtype=numpy.float64)
out[nonzero] = numer[nonzero] / denom[nonzero]
return root_contsum, out
if error_mode == _kERRORSPREAD:
root_eprim[~nonzero] = 0.0
return root_contsum, root_eprim
out = numpy.zeros(len(root_cont), dtype=numpy.float64)
out[nonzero] = root_eprim[nonzero] / numpy.sqrt(root_neff[nonzero])
return root_contsum, out

@raymondEhlers
Copy link
Contributor Author

raymondEhlers commented Mar 19, 2024

Thanks for the pointers Jim! I had missed #1148 - apologies.

After #1000, I did some validation with multiple files and it seemed to be working on individual TProfiles, so I was a bit surprised that some files work while others don't. However, I now see your point that the calculation is quite involved, which may explain the differences. I'll plan to dig into this further when I have some free moments

@raymondEhlers raymondEhlers changed the title Sometime inconsistent values in hist conversion from TProfile read via uproot Sometimes inconsistent values in hist conversion from TProfile read via uproot Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged
Projects
None yet
Development

No branches or pull requests

2 participants