Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nanoevents/methods/physlite needs to distinguish between dak.Array and dak.Array._meta #1075

Open
jpivarski opened this issue Apr 11, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@jpivarski
Copy link
Contributor

This is following up from scikit-hep/awkward#3077.

In

def _get_target_offsets(load_column, event_index):
if isinstance(load_column, dask_awkward.Array) and isinstance(
event_index, dask_awkward.Array
):
# wrap in map_partitions if dask arrays
return dask_awkward.map_partitions(
_get_target_offsets, load_column, event_index, label="get_target_offsets"
)
offsets = load_column.layout.offsets.data
if isinstance(event_index, Number):
return offsets[event_index]
# let the necessary column optimization know that we need to load this
# column to get the offsets
if awkward.backend(load_column) == "typetracer":
awkward.typetracer.touch_data(load_column)
# necessary to stick it into the `NumpyArray` constructor
# if typetracer is passed through
offsets = awkward.typetracer.length_zero_if_typetracer(
load_column.layout.offsets.data
)
def descend(layout, depth, **kwargs):
if layout.purelist_depth == 1:
return awkward.contents.NumpyArray(offsets)[layout]
return awkward.transform(descend, event_index.layout)

load_column can have type dask_awkward.Array past the first if-guard if the event_index is not also dask_awkward.Array. Then some things are done with it that shouldn't be done with a dask_awkward.Array.

The first is ak.backend, a top-level Awkward function that hasn't had a dask-awkward overload (until PR dask-contrib/dask-awkward#498) because it would always return the same thing, "typetracer". But useless functions for the sake of conformity are worthwhile, so I'm in favor of adding dak.backend.

The next is ak.typetracer.touch_data, which is not a high-level function—not the sort of thing that we should be overriding with dask-awkward. It's a public function so that libraries that depend on Awkward (like Coffea) can use them, but it's not intended for data analysts, and that's where dask-awkward's coverage of the Awkward interface should stop. To avoid a confusing downstream error message, I've added PR scikit-hep/awkward#3079 to make Awkward complain about passing a dak.Array into ak.typetracer.touch_data, which means that the issue still has to be fixed.

The Coffea code should detect whether load_column is a dak.Array, and if so, pass load_column._meta into these hidden-but-public Awkward functions. Where, exactly, you want to do that depends on the logic of this function; I'm not going to suggest a change. I noticed that on line 115, you're again assuming that load_column is not a dak.Array (which doesn't have a layout), so someone knowledgeable about it, such as @nikoladze (the original author), should take a look.

@jpivarski jpivarski added the bug Something isn't working label Apr 11, 2024
@jpivarski jpivarski changed the title A bug was found in coffea.XYZ nanoevents/methods/physlite needs to distinguish between dak.Array and dak.Array._meta Apr 11, 2024
@matthewfeickert
Copy link
Contributor

matthewfeickert commented Apr 12, 2024

I noticed that on line 115, you're again assuming that load_column is not a dak.Array (which doesn't have a layout), so someone knowledgeable about it, such as @nikoladze (the original author), should take a look.

Thanks @jpivarski! This is great. I think realistically though that @nikoladze is low on time to contribute patches, so I think this is going to be a joint effort from probably me, with @ekourlit, @alexander-held, and @kyungeonchoi giving advice (maybe @kratsg too). ATLAS still needs to find someone from the ATLAS Analysis Model Group (AMG) to be able to maintain the PHYSLITE code, but for the timelines that this needs to be dealt with, I'm going to take responsibility for this. Though I am not a good candidate to be that maintainer given my other project responsibilities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants