Applying a Cut Using ak.mask() Causes Issue with FastJet ClusterSequence #998

Open
cmoore24-24 opened this issue Jan 19, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@cmoore24-24

cmoore24-24 commented Jan 19, 2024

Hello @lgray and maybe @chrispap95,

This bug seems to sit somewhere between Coffea, Awkward, and FastJet. As the title says, applying a cut with ak.mask, as in
smaller = ak.mask(bigger, cut), rather than with something like smaller = bigger[cut], prevents FastJet from building a cluster sequence. For reference, here are the relevant package versions:

coffea is 2024.1.1
awkward is 2.5.2
dask_awkward is 2024.1.1
fastjet is 3.4.1.3

To Reproduce
The issue can be reproduced with any PFNano file opened with NanoEventsFactory and schemaclass=PFNanoAODSchema.

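import awkward as ak
import fastjet
from coffea.nanoevents import NanoEventsFactory, PFNanoAODSchema

# Open any PFNano file (file arguments elided, as in the original report)
events = NanoEventsFactory.from_root(..., schemaclass=PFNanoAODSchema).events()
fatjet = events.FatJet
cut = fatjet.pt > 50
# Masking (rather than fatjet[cut]) is what triggers the failure
slimmed = ak.mask(fatjet, cut)
pf = ak.flatten(slimmed.constituents.pf, axis=1)
jetdef = fastjet.JetDefinition(fastjet.cambridge_algorithm, 0.2)
cluster = fastjet.ClusterSequence(pf, jetdef)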

Constructing the cluster sequence fails with the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[46], line 1
----> 1 cluster = fastjet.ClusterSequence(pf, jetdef)

File ~/miniconda3/envs/coffea2024/lib/python3.9/site-packages/fastjet/__init__.py:263, in ClusterSequence.__init__(self, data, jetdef)
    261 if dak is not None and isinstance(data, dak.Array):
    262     self.__class__ = fastjet._pyjet.DaskAwkwardClusterSequence
--> 263     fastjet._pyjet.DaskAwkwardClusterSequence.__init__(
    264         self, data=data, jetdef=jetdef
    265     )
    266 else:
    267     raise TypeError(
    268         f"{data} must be an awkward.Array, dask_awkward.Array, or list!"
    269     )

File ~/miniconda3/envs/coffea2024/lib/python3.9/site-packages/fastjet/_pyjet.py:352, in DaskAwkwardClusterSequence.__init__(self, data, jetdef)
    348     self._internalrep = fastjet._singleevent._classsingleevent(
    349         length_zero_data, self._jetdef
    350     )
    351 elif self._jagedness >= 3 or self._check_general(data):
--> 352     self._internalrep = fastjet._generalevent._classgeneralevent(
    353         length_zero_data, jetdef
    354     )

File ~/miniconda3/envs/coffea2024/lib/python3.9/site-packages/fastjet/_generalevent.py:24, in _classgeneralevent.__init__(self, data, jetdef)
     20 for i in range(len(self._clusterable_level)):
     21     self._clusterable_level[i] = ak.Array(
     22         self._clusterable_level[i].layout.to_ListOffsetArray64(True)
     23     )
---> 24     px, py, pz, E, offsets = self.extract_cons(self._clusterable_level[i])
     25     px = self.correct_byteorder(px)
     26     py = self.correct_byteorder(py)

File ~/miniconda3/envs/coffea2024/lib/python3.9/site-packages/fastjet/_generalevent.py:214, in _classgeneralevent.extract_cons(self, array)
    213 def extract_cons(self, array):
--> 214     px = np.asarray(ak.Array(array.layout.content, behavior=array.behavior).px)
    215     py = np.asarray(ak.Array(array.layout.content, behavior=array.behavior).py)
    216     pz = np.asarray(ak.Array(array.layout.content, behavior=array.behavior).pz)

File ~/miniconda3/envs/coffea2024/lib/python3.9/site-packages/awkward/highlevel.py:1236, in Array.__getattr__(self, where)
   1231         raise AttributeError(
   1232             f"while trying to get field {where!r}, an exception "
   1233             f"occurred:\n{type(err)}: {err!s}"
   1234         ) from err
   1235 else:
-> 1236     raise AttributeError(f"no field named {where!r}")

AttributeError: no field named 'px'

Expected behavior
When the cut is applied with smaller = bigger[cut] instead, the ClusterSequence is built without error and produces
<fastjet._pyjet.DaskAwkwardClusterSequence at 0x7f88d4091670>

@cmoore24-24 cmoore24-24 added the bug Something isn't working label Jan 19, 2024
@chrispap95
Contributor

Hey @cmoore24-24, there is a known issue (whose fix I have been postponing each week since July... 🥲) with the way offsets are handled in fastjet, and I have seen it cause problems with masking. See: scikit-hep/fastjet#238 (comment)
Tbh though, your error message looks a bit different and more like an unregistered behavior. But I would have to reproduce and investigate to be able to say more. I will try later today.

@chrispap95
Contributor

Also, I suppose you are including the line vector.register_awkward(), right?

@cmoore24-24
Author

Hi @chrispap95, apologies for the delay. I hadn't come across vector.register_awkward() before, and when I tried calling it just now, I got

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 vector.register_awkward()

AttributeError: module 'coffea.nanoevents.methods.vector' has no attribute 'register_awkward'

Is there another vector attribute I should be using?

@lgray
Collaborator

lgray commented Jan 23, 2024

@cmoore24-24 you need to do that for scikit-hep vector!!!

import vector
vector.register_awkward()

The NanoEvents vector module embeds the behaviors in the objects themselves, so its semantics are somewhat different.

@cmoore24-24
Author

Ah, okay, thanks! I hadn't seen that before. I've included it now, but unfortunately the behavior with ak.mask() is unchanged: the same error as above still appears.
