This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Segmentation Fault reading uproot written tree #538

Open
NiclasEich opened this issue Jun 18, 2021 · 8 comments

Comments

@NiclasEich
Contributor

NiclasEich commented Jun 18, 2021

Hello,

I am trying to write a tree with uproot and then read it back. Writing the tree and reading it again with uproot works, but I get a segmentation fault when I try to open the jagged-array branches of the tree with other ROOT tools, such as a TBrowser or the Python root_numpy module. I followed the awkward-array documentation on how to write the tree.

A minimal example of how I write the tree and how reading it fails:

import numpy as np
import awkward as ak
import uproot3 as ur
import root_numpy as rn

print("Version:\nuproot: {}\nawkward: {}\nroot_numpy: {}".format(ur.__version__, ak.__version__, rn.__version__))

f_path = "/tmp/test_uproot_writing.root"
keys = ["key_a", "key_b", "key_c"]

branch_dict = {}
branch_reg = {}

print("Creating New root-file with uproot")

with ur.recreate( f_path ) as out_file:
    for key in keys:

        arr = ak.to_awkward0( ak.Array( [np.zeros( (20))[:np.random.randint(4, 20)] for i in range(100)] ) )

        dtype = ur.newbranch( np.dtype("f8"), size="{}_counts".format(key) )
        branch_reg[key] = dtype
        branch_dict[key] = arr

        counts = arr.counts
        branch_dict["{}_counts".format(key)] = counts

    out_file["ttree"] = ur.newtree(branch_reg)
    out_file["ttree"].extend(branch_dict)

print("Trying to load the file with root_numpy")

nparr = rn.root2array( [f_path], treename="ttree", stop=None, branches=keys)

With the stack-trace:

Singularity> python3 scripts/debug_uproot_tree_writing.py
/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
Version:
uproot: 3.14.1
awkward: 1.0.2
root_numpy: 4.8.0
Creating New root-file with uproot
Trying to load the file with root_numpy

 *** Break *** segmentation violation
 Generating stack trace...
 0x00007f3399a82b7a in TreeChain::GetEntry(long long) at /tmp/pip-install-zh223qts/root-numpy_eee9f47128724527985f005dc6ccceda/root_numpy/src/TreeChain.h:240 from /usr/local/lib/python3.6/dist-packages/root_numpy/_librootnumpy.cpython-36m-x86_64-linux-gnu.so
 0x00007f3399a71191 in <unknown> from /usr/local/lib/python3.6/dist-packages/root_numpy/_librootnumpy.cpython-36m-x86_64-linux-gnu.so
 0x00007f3399a76f98 in <unknown> from /usr/local/lib/python3.6/dist-packages/root_numpy/_librootnumpy.cpython-36m-x86_64-linux-gnu.so
 0x000000000050a4a5 in <unknown> from python3
 0x000000000050beb4 in _PyEval_EvalFrameDefault + 0x444 from python3
 0x0000000000507be4 in <unknown> from python3
 0x0000000000509900 in <unknown> from python3
 0x000000000050a2fd in <unknown> from python3
 0x000000000050cc96 in _PyEval_EvalFrameDefault + 0x1226 from python3
 0x0000000000507be4 in <unknown> from python3
 0x000000000050ad03 in PyEval_EvalCode + 0x23 from python3
 0x0000000000634e72 in <unknown> from python3
 0x0000000000634f27 in PyRun_FileExFlags + 0x97 from python3
 0x00000000006386df in PyRun_SimpleFileExFlags + 0x17f from python3
 0x0000000000639281 in Py_Main + 0x591 from python3
 0x00000000004b0dc0 in main + 0xe0 from python3
 0x00007f33c47e2b97 in __libc_start_main + 0xe7 from /lib/x86_64-linux-gnu/libc.so.6
 0x00000000005b259a in _start + 0x2a from python3

Additionally, I am using ROOT version 6.18/04.

@NiclasEich
Contributor Author

added the root-version used

@NiclasEich
Contributor Author

NiclasEich commented Jun 20, 2021

I encountered some interesting/weird behaviour while trying to reduce the example even further:

The following script only SOMETIMES fails with a segmentation fault. So far this looks like stochastic behaviour to me: the script fails in about 50% of the runs and runs fine in the rest.

import numpy as np
import awkward as ak
import uproot3 as ur
import root_numpy as rn

f_path = "/tmp/test_uproot_writing.root"
keys = ["key-a"]

print("Creating New root-file with uproot")
with ur.recreate( f_path ) as out_file:

    arr = ak.to_awkward0( ak.Array( [np.zeros( (20))[:np.random.randint(4, 20)] for i in range(100)] ) )
    out_file["ttree"] = ur.newtree( {"key-a": ur.newbranch(np.dtype("f8"), size="n")})
    out_file["ttree"].extend({"key-a": arr, "n": arr.counts})


print("Trying to load the file with root_numpy")

nparr = rn.root2array( [f_path], treename="ttree", stop=None, branches=keys)
print(nparr)

Further, when substituting
arr = ak.to_awkward0( ak.Array( [np.zeros( (20))[:np.random.randint(4, 20)] for i in range(100)] ) ) with arr = ak.to_awkward0( ak.Array( [np.zeros( (20)) for i in range(100)] ) ), the latter array ALWAYS fails, so it might have something to do with how the awkward array is handled...

@NiclasEich
Contributor Author

I let both script versions run 50 times:

The

with ur.recreate( f_path ) as out_file:
    arr = ak.to_awkward0( ak.Array( [np.zeros( (20))[:np.random.randint(4, 20)] for i in range(100)] ) )
    out_file["ttree"] = ur.newtree( {"key-a": ur.newbranch(np.dtype("f8"), size="n")})
    out_file["ttree"].extend({"key-a": arr, "n": arr.counts})

Version failed 26/50 times

and the

with ur.recreate( f_path ) as out_file:
   arr = ak.to_awkward0( ak.Array( [np.zeros( (20)) for i in range(100)] ) )
   out_file["ttree"] = ur.newtree( {"key-a": ur.newbranch(np.dtype("f8"), size="n")})
   out_file["ttree"].extend({"key-a": arr, "n": arr.counts})

failed 50/50 times
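
A failure rate like the ones above can be measured by running the script repeatedly in a fresh interpreter and counting the runs that do not exit cleanly. The following is only a sketch: the helper name `count_failures` is hypothetical, and it treats any non-zero exit status as a failure, on the assumption that a crash may show up either as a signal (negative return code on POSIX) or as an ordinary non-zero exit code.

```python
import subprocess
import sys


def count_failures(python_args, runs=50):
    """Run `python <python_args>` `runs` times; return how many runs failed.

    Hypothetical helper for illustration. Each run uses a fresh interpreter,
    so a crash in one run cannot affect the next one.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, *python_args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # Any non-zero status counts as a failure: on POSIX a negative value
        # means the process was killed by a signal, while a positive value
        # is a regular error exit.
        if result.returncode != 0:
            failures += 1
    return failures
```

For the script from the first comment, the call would look like `count_failures(["scripts/debug_uproot_tree_writing.py"], runs=50)`.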

@NiclasEich
Contributor Author

Same problem with the newest version:

Version:
uproot: 3.14.4
awkward: 1.3.0
root_numpy: 4.8.0

@NiclasEich
Contributor Author

When the script succeeds in loading the uproot-written tree with ROOT, I see seemingly random values across all orders of magnitude, so I suspect that even in the successful case the data is somehow corrupted internally.
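
Because the test arrays are written with np.zeros, any read-back value other than 0.0 points to corruption, so the "successful" runs can be checked automatically. A minimal sketch, assuming the read-back branch can be iterated as a sequence of per-entry sequences of floats (for example `nparr["key-a"]`, if root2array returns a structured array as expected); the helper name `find_corrupt_entries` is hypothetical.

```python
import math


def find_corrupt_entries(entries, expected=0.0):
    """Return the indices of entries that contain an unexpected value.

    Hypothetical helper for illustration. An entry is flagged if any of its
    values is NaN, infinite, or different from `expected`.
    """
    bad = []
    for index, entry in enumerate(entries):
        if any(not math.isfinite(v) or v != expected for v in entry):
            bad.append(index)
    return bad
```

An empty result means every entry holds only the expected value; a non-empty result lists the entries worth inspecting.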

@jpivarski
Member

I'm following this; thanks for submitting it! (Sorry for my silence so far.)

My action is going to be to ensure that this case is not broken in Uproot 4 (follow it on scikit-hep/uproot5#321), which is my highest programming priority right now, though non-programming priorities (summer conferences) have prevented me from updating it for the last few weeks. It will get done this summer, though.

What you're seeing with non-deterministic segfault vs wrong values is typical of ROOT file-writing: if one byte is out of place, reading it back might trip up in different ways. (It might depend on uninitialized values in some array somewhere.)

Your example is an indication that the file is not being written correctly, though I don't think it has ever been tested with root-numpy before. (I think the unit tests use PyROOT to test read-back, other than Uproot itself.) It could be that root-numpy is revealing an error that hadn't been revealed before by reading it in a different way. All of the information needed to read a jagged branch is contained in that branch—although it is usually linked to a "count leaf" in another branch (size in the Uproot 3 API), it is not necessary to read the counts to get the jagged array. It could be that the count leaf is incorrectly written, but only root-numpy attempts to read the count leaf. (Uproot definitely doesn't; I don't know about PyROOT.)
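
The relationship between a jagged branch and its count leaf can be illustrated with a simplified model: flat content plus entry boundaries (offsets) fully determines the jagged array, and the counts are redundant information derivable from the offsets. This is only a conceptual sketch in plain Python, not ROOT's actual on-disk layout; all function names are hypothetical.

```python
def counts_from_offsets(offsets):
    """Derive per-entry lengths from entry boundaries."""
    return [stop - start for start, stop in zip(offsets[:-1], offsets[1:])]


def split_by_offsets(content, offsets):
    """Rebuild the jagged entries from flat content, without using counts."""
    return [content[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


def count_leaf_mismatches(count_leaf, offsets):
    """Return the entry indices where a stored count disagrees with the offsets."""
    derived = counts_from_offsets(offsets)
    return [i for i, (c, d) in enumerate(zip(count_leaf, derived)) if c != d]
```

A reader that relies only on the offsets would never notice a wrong count leaf, whereas a reader that trusts the count leaf could read past the end of an entry, which is consistent with the hypothesis above.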

@NiclasEich
Contributor Author

Thanks for the answer!
Is there already a version/branch of uproot4 that would allow writing trees?

As mentioned earlier, opening the broken branches in a TBrowser also results in a segfault.

@jpivarski
Member

See scikit-hep/uproot5#321 — the Uproot 4 feature is in progress.

Because we tested the Uproot 3 feature only in PyROOT, we also never saw what it would do in the TBrowser, which, like root-numpy, might try to access more.
