Skip to content
This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Type error when slicing vector<vector<float>> ObjectArray branch #519

Open
asopio opened this issue Sep 3, 2020 · 1 comment
Open

Type error when slicing vector<vector<float>> ObjectArray branch #519

asopio opened this issue Sep 3, 2020 · 1 comment

Comments

@asopio
Copy link

asopio commented Sep 3, 2020

Hi,
I encountered some unexpected behaviour when trying to slice a branch of type vector<vector<float>>, containing vectors of variable length. Getting the array returns the correct result:



t_test = uproot.open("testfile.root")["testtree"]
tnew['testbranch'].array()[:,0]

<ObjectArray [[[0.12897478, 0.33032617, 0.0053174845, 0.0062657734, 0.16532688, 0.07413943, 0.25900677, 0.084950365], [0.35509595]] [[0.15334472, 0.10603446, 0.004459681, 0.0222604, 0.38413692, 0.39519757], [0.006101171, 0.014523825, 0.036121503, 0.039234158, 0.22525033, 0.34617442]] [[0.02478936, 0.06844998, 0.027409986, 0.0065677217, 0.18938226, 0.0432718, 0.010767343, 0.18636794, 0.4008715, 0.41020113], [0.37996235, 0.34230024]] ... [[0.2614188, 0.20125817, 0.046599977, 0.05807799, 0.010248912, 0.020693984, 0.05169987, 0.10514213, 0.43268573], [0.29535455]] [[0.23408501, 0.14874455, 0.016859913, 0.008109958, 0.010894945, 0.020351907, 0.38372123, 0.09444051]] [[0.070886604, 0.4732762, 0.024249015, 0.04118293, 0.49230808, 0.14104557, 0.17984901, 0.40091348], [0.48583144, 0.030749094, 0.015568347, 0.18870124, 0.12960392, 0.16022786, 0.076917425, 0.2069286, 0.052053016, 0.33399573]]] at 0x7f4e0ab48ad0>

... but slicing seems to convert the entries to uints:

tnew['testbranch'].array()[:,0]

[array([  0,   0,   0,   2,   0,   0,   0,   8,  62,   4,  17, 247,  62,
        169,  32, 131,  59, 174,  62,  75,  59, 205,  81,  30,  62,  41,
         75, 115,  61, 151, 214, 106,  62, 132, 156, 137,  61, 173, 250,
        117,   0,   0,   0,   1,  62, 181, 207,  35], dtype=uint8),
 array([  0,   0,   0,   2,   0,   0,   0,   6,  62,  29,   6, 102,  61,
        217,  40, 152,  59, 146,  34, 132,  60, 182,  91, 113,  62, 196,
        173, 152,  62, 202,  87,  86,   0,   0,   0,   6,  59, 199, 236,
         85,  60, 109, 245,  86,  61,  19, 244,  36,  61,  32, 179, 255,
         62, 102, 168,   6,  62, 177,  61, 198], dtype=uint8),
 array([  0,   0,   0,   2,   0,   0,   0,  10,  60, 203,  19,  14,  61,
        140,  47, 129,  60, 224, 138, 232,  59, 215,  54,  11,  62,  65,
        237, 108,  61,  49,  61, 197,  60,  48, 105, 131,  62,  62, 215,
         61,  62, 205,  63,   7,  62, 210,   5, 226,   0,   0,   0,   2,
         62, 194, 138, 109,  62, 175,  65, 250], dtype=uint8),
 ...]

Surely, this is not intentional.

cheers,
Alex

arraybug.zip

@jpivarski
Copy link
Member

This is a fundamental issue with Awkward 0's ObjectArray: the elements are Python objects. You're right that the above behavior is not intended, but no behavior is intended for tnew['testbranch'].array()[:, 0] where the first axis is the ObjectArray. ("The behavior is undefined.") Multidimensional slices can't cross the boundary from outside an ObjectArray into an ObjectArray—the fact that anything happened at all is an accident.

In most cases like this, you encounter some error. Somehow, this example happened to slip through code it wasn't intended for and yield a result—a bizarre one—through sheer happenstance. Perhaps the right thing to do would have been to detect these multidimensional slices in ObjectArray and refuse it upfront: define the behavior to be an error. Awkward 1 is more aggressive about identifying and refusing undefined behavior (because half of it is written in C++, where undefined behavior would be more dangerous).

This is the reason that Awkward 1 has no ObjectArray and Uproot 4 is more of a rewrite than an upgrade to take advantage of it. In Uproot 4, std::vector<float> → ListArray(NumpyArray) and std::vector<std::vector<float>> → ListArray(ListArray(NumpyArray)), etc. to any depth, despite the fact that the ROOT serialization for these cases are very different. (The motivation for Uproot 3 creating ObjectArrays was to mirror the way it's serialized in ROOT, but we now know that it's better to hide that difference.)

Your example works fine in Awkward 1/Uproot 4:

>>> import uproot4
>>> t = uproot4.open("issue519.root")["testtree"]
>>> a = t["testbranch"].array()

>>> a
<Array [[[0.129, 0.33, ... 0.0869, 0.197]]] type='10 * var * var * float64'>
>>> a[:, 0]
<Array [[0.129, 0.33, ... 0.0869, 0.197]] type='10 * var * float64'>

>>> import awkward1 as ak
>>> ak.num(a)     # most of these sublists have only one element
<Array [2, 2, 2, 2, 1, 1, 1, 1, 1, 1] type='10 * int64'>

>>> a[:, -1]      # so hey, let's look at the last in each
<Array [[0.355], ... 0.458, 0.0869, 0.197]] type='10 * var * float64'>

It's already the case that Awkward 1 is more mature than Awkward 0, and it's better documented. I've been recommending Awkward 1 since March with ak.from_awkward0/ak.to_awkward0 for brief forays into the new library/new interface, so that it can be minimally adopted without breaking existing scripts.

Uproot 4 is lacking documentation and the file-writing feature, so I've been cautiously recommending it on a trial basis. I'm working on the documentation now, and when that's done, I'll be recommending Uproot 4 more forcefully. When the file-writing feature is also done, Uproot 4 will have feature parity with Uproot 3 and I'll actually move the GitHub and PyPI names:

  • awkward1awkward, awkwardawkward0
  • uproot4uproot, uprootuproot3

so that pip install awkward uproot installs the new ones and users will either have to change their scripts or import uproot3 as uproot. That will happen before the end of this year.

On the point of ROOT deserialization, Uproot 4 is already more capable than Uproot 3, so unless you're interested in file-writing or don't want to change interface yet, you'll probably prefer Uproot 4.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants