New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fBits is not always 4 bytes #878
Comments
We do want to make it as correct as it can be, but I've been bewildered by all of the variation in files that I've come across. It sounds like you have a firm handle on all of the "if/then" logic. Do you see how the code in this part of identify.py, https://github.com/scikit-hep/uproot5/pull/570/files can be modified to get it right? A backward-incompatible change that converts a raises-an-exception case into a working-properly case is not a backward-incompatibility that we're concerned about. Oh, when you say that it depends on raise NotNumerical will get out of the part of the code that is trying to make it uproot5/src/uproot/models/TObject.py Lines 74 to 87 in b53b16f
would have to raise CannotBeStrided to get out of the attempt to interpret it as a fixed-bytesize (strided) object. And then the final fallback of stepping a different number of bytes depending on whether uproot5/src/uproot/models/TObject.py Lines 46 to 51 in b53b16f
So maybe as a first question, before digging too deeply into this: if you construct an A lot of classes descend from |
Yes, that is exactly what I meant.
Actually, I have never used
Sounds great. To be honest, I think that effectively it doesn't matter even in pure python. The complexity changes from O(1) to O(N) though.
I completely agree with this, unfortunately. |
Hi, is there a quick fix or workaround that I could apply to my code (either to tell Thank you very much in advance! Cheers, |
I don't know how to change what Delphes does, but re-reading what I wrote above, it looks like the AsObjects Interpretation does the correct check of For the >>> import uproot, skhep_testdata
>>> tree = uproot.open(skhep_testdata.data_path("uproot-HZZ-objects.root"))["events"] the default interpretations of the >>> tree["jetp4"].interpretation
AsJagged(AsStridedObjects(Model_TLorentzVector_v4), header_bytes=10)
>>> tree["MET"].interpretation
AsStridedObjects(Model_TVector2_v3) We can make AsObjects versions of them by calling uproot.interpretation.identify.interpretation_of with the non-default >>> uproot.interpretation.identify.interpretation_of(tree["jetp4"], {}, simplify=False)
AsObjects(AsVector(True, Model_TLorentzVector))
>>> uproot.interpretation.identify.interpretation_of(tree["MET"], {}, simplify=False)
AsObjects(Model_TVector2) and then pass them to uproot.TBranch.array: >>> branch = tree["jetp4"]
>>> interp = uproot.interpretation.identify.interpretation_of(branch, {}, simplify=False)
>>> branch.array(interp)
<Array [[], [{fP: {...}, ...}], ..., [...], []] type='2421 * var * TLorentz...'>
>>>
>>> branch = tree["MET"]
>>> interp = uproot.interpretation.identify.interpretation_of(branch, {}, simplify=False)
>>> branch.array(interp)
<Array [{fX: 5.91, fY: 2.56}, ..., {...}] type='2421 * TVector2[fX: float64...'> Maybe also set >>> interp._forth = False before using it. The distinction is that AsObjects makes a class that performs a Python for loop over all of your data. Some data structures are complex enough that we have to suffer that factor-of-hundreds slower time. AwkwardForth mitigates that by replacing the Python for loop with a Forth loop, which is faster because it is a more streamlined virtual machine than Python's. AsStridedObjects can be applied in situations in which the data have fixed sizes, so we don't do any loop over the data—Python, Forth, or anything—we just pass it to NumPy's structured array with all of the fixed-size fields encoded in a header. That doesn't work if the size of a field like The reason that this issue has stalled is because I can't think of a good way to check the "If TObject, then uproot5/src/uproot/models/TObject.py Lines 53 to 54 in b53b16f
so maybe it is ready. Since you have files that need the 4-byte, 1-byte roll-over, can you check that they work with both |
Hi Jim, thank you very much for the detailed explanation and suggestions! I tried to apply it, with the following outcome:
It does not mention Concrete steps:
this prints
this prints `
tree=tree.arrays(library="ak")
|
If the interpretation is
Oh, Somehow, you have a TBasket with 58938 bytes in it that Uproot is trying to interpret as either an unsigned 4-byte integer ( You could provisionally set interp = uproot.AsDtype(">u2") # or ">i2" to see what the numbers look like with 2 bytes. If you know something about what the values are supposed to be, just applying an interpretation and checking it against your expectations is a fast way to narrow in on what the actual problem is. Also, since you know this is happening in TBasket number 5, you can look at its number of entries and uncompressed bytes with branch.basket(5).num_entries
branch.basket(5).uncompressed_bytes (uproot.TBranch.basket, uproot.models.TBasket.Model_TBasket.num_entries, and uproot.models.TBasket.Model_TBasket.uncompressed_bytes). For fixed-width data types, like the integer types, you can determine the number of bytes per entry by dividing the number of bytes by the number of entries, and so that provides a cross-check. (If you're ever dealing with an array that has a jagged or an object data type, keep in mind that this doesn't apply!) Hopefully, these will let you narrow in on whether this is a wrong interpretation (i.e. should be 2 or maybe even 6 bytes) or a wrong TBasket (doesn't fit the declared number of entries). Also, since this is a TBranch consisting of nothing but |
Well, it's the usual structure, where the overall branch name is "a int", but it is actually a composite object with various subbranches: see screenshot of when doing `Delphes->Print()" from plain ROOT: I assume (assumed?) uproot would load the whole object as Then maybe we fall back on the last part of your answer:
OK so if uproot doesn't load the whole object as Cheers, |
"Subbranches" are arranged as a nested tree in the metadata, but in the data (what you're having trouble reading), every branch is independent and can be independently read. In uproot.TTree.arrays, you can use tree.arrays(filter_name=lambda x: x != "fBits") |
In the issue, #438, it is reported that in root files Delphes created,
fBits
fields may be 6 bytes and desalinization processes may fail.As an ad-hoc workaround,
fBits
field was decoded as a 1 byte field, but after the issue, #569, the change has reverted, andkBits
type is decoded as the 4 bytes fields with a guess that Delphes does something wrong.However, this is not consistent with how the original root framework works.
In the original framework, in https://github.com/root-project/root/blob/61ea5a002198cd756bdce67e8499feded9dedc0b/io/io/src/TStreamerInfoReadBuffer.cxx#L1021, fields with type
kBits
is decoded as 4 bytes fields only ifkIsReferenced
is not set. If it is set, 2 additional bytes are read and used to getTProcessID
. Then,fUniqueID
on the decoded object is rewritten so thatfUniqueID
is really unique among all processes, I guess. In other words,kBits
type is encoded to 4 or 6 bytes, depending onkIsReferenced
.Although I could not find any document about this behavior, I now believe the original issue, #438, is NOT the bug of Delphes, but the bug of uproot and is to be fixed.
As long as
fUniqueID
is not used across the processes, just skipping 2 extra bytes whenkIsReferenced
should be fine, I guess.For instance, the official JS implementation does in that way:
https://github.com/root-project/root/blob/901c7cc0a986931bd771c0a873247f5708e54111/js/modules/io.mjs#L61
I understand most people including me have never used
fBits
and this simpler way is rather time-consuming, but in principle, uproot should implement this. The best way is probably to adopt some form of lazy evaluation like an iterator, but it would break backward compatibility.The text was updated successfully, but these errors were encountered: