ElementLink branch EventInfoAuxDyn.hardScatterVertexLink in PHYSLITE can't be read #1177

Open
nikoladze opened this issue Mar 19, 2024 · 3 comments
Labels
bug (unverified): the problem described would be a bug, but needs to be triaged

Comments

@nikoladze
Contributor

nikoladze commented Mar 19, 2024

One issue remains from #951: a particular ElementLink branch in PHYSLITE files, one that is not inside a vector, currently can't be read, even though the data layout looks like it should be within the scope of what uproot supports:

>>> import uproot
>>> uproot.__version__
'5.3.2.dev10+g21735bf.d20240319'

The issue occurs in the EventInfoAuxDyn.hardScatterVertexLink branch (it has a subbranch with the same name):

>>> import uproot
>>> from skhep_testdata import data_path
>>> filename = data_path("uproot-issue-951.root")
>>> tree = uproot.open({filename: "CollectionTree"})
>>> tree["EventInfoAuxDyn.hardScatterVertexLink/EventInfoAuxDyn.hardScatterVertexLink"].interpretation
<UnknownInterpretation 'none of the rules matched'>

Reading it then fails with UnknownInterpretation. The data itself doesn't look too strange:

>>> tree["EventInfoAuxDyn.hardScatterVertexLink/EventInfoAuxDyn.hardScatterVertexLink"].debug(0, skip_bytes=10, dtype=">i4")
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 55 209  69 151   0   0   0   0
  7 ---   E --- --- --- --- ---
      936461719               0

We can see the key and index fields, matching what we have in all the other ElementLinks that inherit from ElementLinkBase:

>>> [(el.name, el.typename) for el in tree._file.streamer_named("ElementLinkBase").elements]
[('m_persKey', 'unsigned int'), ('m_persIndex', 'unsigned int')]
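To make the correspondence explicit, here is a minimal sketch that decodes the eight bytes from the debug output above as the two fields listed by the ElementLinkBase streamer. It assumes both fields are stored as consecutive big-endian unsigned 32-bit integers, in streamer order; the byte values are copied from the dump, and nothing here reads the file itself.

```python
import struct

# The eight bytes printed by debug(...) above, after the 10 skipped bytes.
raw = bytes([55, 209, 69, 151, 0, 0, 0, 0])

# Assumption: two consecutive big-endian unsigned 32-bit integers,
# in the order listed by the ElementLinkBase streamer.
m_persKey, m_persIndex = struct.unpack(">II", raw)

print(m_persKey, m_persIndex)  # 936461719 0
```

The key reproduces the 936461719 shown in the integer row of the dump, and the index is 0.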

Unfortunately, there seems to be no streamer for ElementLink with HardScatterVertices as the template argument. Maybe that's the issue? Having seen that, I also checked with ROOT, and TTree::Print indeed doesn't show a type for this branch either:

>>> import ROOT
>>> root_file = ROOT.TFile.Open(filename)
>>> root_tree = root_file.Get("CollectionTree")
>>> root_tree.Print("*EventInfoAuxDyn.hardScatter*")
******************************************************************************
*Tree    :CollectionTree: CollectionTree                                         *
*Entries :       50 : Total =         5672587 bytes  File  Size =    1097771 *
*        :          : Tree compression factor =   4.80                       *
******************************************************************************
*Branch  :EventInfoAuxDyn.hardScatterVertexLink                              *
*Entries :       50 : BranchElement (see below)                              *
*............................................................................*
*Br    0 :EventInfoAuxDyn.hardScatterVertexLink : BASE                       *
*Entries :       50 : Total  Size=       1839 bytes  File Size  =        280 *
*Baskets :        1 : Basket Size=       8000 bytes  Compression=   4.36     *
*............................................................................*

It only shows "BASE".

nikoladze added the bug (unverified) label Mar 19, 2024
@jpivarski
Member

The data looks not too strange

>>> tree["EventInfoAuxDyn.hardScatterVertexLink/EventInfoAuxDyn.hardScatterVertexLink"].debug(0, skip_bytes=10, dtype=">i4")
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 55 209  69 151   0   0   0   0
  7 ---   E --- --- --- --- ---
      936461719               0

What if you don't set skip_bytes=10? The "10" is motivated by a std::vector's 4-byte number of items plus a standard object's 6-byte header (4-byte number of bytes + 2-byte class version). Leave off both skip_bytes and dtype to see just the raw bytes without an integer interpretation (which doesn't look like the right interpretation here, anyway).
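As an illustration of that 10-byte layout, the following sketch parses a byte count, a class version, and an item count from a synthetic buffer. The function name read_vector_header is hypothetical, the bytes are constructed here rather than read from the PHYSLITE file, and the sketch assumes that the byte-count word carries ROOT's 0x40000000 flag bit and counts the bytes that follow it.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000  # flag bit ROOT sets on a byte-count word


def read_vector_header(raw):
    """Parse byte count, class version, and item count from the first 10 bytes."""
    word, version, n_items = struct.unpack(">IhI", raw[:10])
    has_byte_count = bool(word & K_BYTE_COUNT_MASK)
    n_bytes = word & ~K_BYTE_COUNT_MASK
    return has_byte_count, n_bytes, version, n_items, raw[10:]


# Synthetic example (not read from the file): one item holding a key and an index.
payload = struct.pack(">II", 936461719, 0)
# Assumption: the byte count covers everything after the 4-byte count word itself.
header = struct.pack(">IhI", K_BYTE_COUNT_MASK | (2 + 4 + len(payload)), 1, 1)

flag, n_bytes, version, n_items, rest = read_vector_header(header + payload)
print(flag, n_bytes, version, n_items, struct.unpack(">II", rest))
```

If the real branch has no std::vector wrapper, the item count would be absent and a header like this would be at most 6 bytes, which is why looking at the unskipped bytes comes first.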

@nikoladze
Contributor Author

I just guessed "10" as the size of the ElementLink header, but I don't know why it seems to be 10 here. In the vector<vector<ElementLink<...>>> branches it seems to be more like 20:

>>> tree["AnalysisElectronsAuxDyn.trackParticleLinks"].debug(0, skip_bytes=10+4+20, dtype=">i4")
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
 46  66 219  11   0   0   0   0
  .   B --- --- --- --- --- ---
      776133387               0

That is, skipping 10 bytes for the outer vector header, 4 bytes for the inner vector size, and presumably 20 bytes for the ElementLink<DataVector<xAOD::TrackParticle_v1>>.

@jpivarski
Copy link
Member

If you have a std::vector<std::vector<X>>, you'll see header-like stuff (number of bytes, followed by class version, followed by number of items) for 20 bytes in a row because of the outer std::vector and the first inner std::vector.

For an ElementLink outside of any std::vector, I don't know whether ROOT will give it a header or not, and if it did, it would normally be 6 bytes (number of bytes, followed by class version; the "number of items" is a std::vector thing). As a first step in examining a new class or a class in a new context, we remove all of the "skips" and see what all of the bytes look like.
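Once the unskipped bytes are available, one possible way to find where the header ends is to search for a value that is already known, such as the key 936461719 from the earlier dump. The sketch below does this on a synthetic buffer; find_key_offsets is a hypothetical helper, the 10 placeholder bytes stand in for whatever header ROOT actually wrote, and the key is assumed to be stored as a big-endian unsigned 32-bit integer.

```python
import struct


def find_key_offsets(raw, key):
    """Return every byte offset at which `key` appears as a big-endian uint32."""
    needle = struct.pack(">I", key)
    offsets = []
    start = raw.find(needle)
    while start != -1:
        offsets.append(start)
        start = raw.find(needle, start + 1)
    return offsets


# Synthetic buffer (not read from the file): 10 placeholder bytes, then key and index.
raw = bytes(10) + struct.pack(">II", 936461719, 0)

print(find_key_offsets(raw, 936461719))  # [10]
```

The offset of the first match is the number of bytes in front of the key, which can then be compared with the 6-byte and 10-byte header sizes discussed above.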

jpivarski added this to Important in Finalization on Mar 21, 2024