Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: back nanoevents.methods.vector with scikit-hep vector #991

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

lgray
Copy link
Collaborator

@lgray lgray commented Jan 14, 2024

Fixes #992

Right now this is breaking in terms of some low-level definitions of physics variables, in particular:

  • rho is now always transverse, as opposed to it being previously r, this can certainly screw up math if you're not expecting it

Breaking interface differences (as you can see in tests):

  • we're now beholden to vector's much more strict typing of vectors you have to use to_VectorND() to correctly compare vectors.

But that doesn't seem to have too many side effects?

However, all the tests pass and the current implementation is backwards compatible in terms of delta_phi vs deltaphi and delta_r vs. deltaR so we can actually deprecate that in a reasonable way.

Another thing we need to address is where the functions nearest and metric_table should live, which are fairly often used but could be advertised better.

@nsmith- This will require a thorough review as well as contribution from you, and we should discuss how we want to roll this out.

@lgray lgray marked this pull request as draft January 14, 2024 20:12
@lgray lgray force-pushed the use_scikithep_vector branch 3 times, most recently from 2ddea28 to 43e1f19 Compare January 16, 2024 23:13
Copy link
Member

@nsmith- nsmith- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only painful part is r vs. rho. Otherwise looks like it's drag and drop. @jrueb might want to review this as well.

src/coffea/nanoevents/methods/vector.py Outdated Show resolved Hide resolved
0.6078019961139605,
]

computed_dphi = a.delta_phi(b).to_numpy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the explicit cast required now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the comment on approx.

tests/test_nanoevents_vector.py Show resolved Hide resolved
tests/test_nanoevents_vector.py Outdated Show resolved Hide resolved
src/coffea/nanoevents/methods/vector.py Outdated Show resolved Hide resolved
@lgray
Copy link
Collaborator Author

lgray commented Jan 18, 2024

unfortunately @jrueb has left academia for industry, unless he wants to review out of the kindness of his heart.

@lgray
Copy link
Collaborator Author

lgray commented Jan 18, 2024

I'm putting in a deprecation for ~6 months from now over in #997.

@lgray lgray force-pushed the use_scikithep_vector branch 4 times, most recently from 9ca78b9 to d7ad56c Compare January 20, 2024 03:56
@ikrommyd
Copy link
Contributor

@lgray Is the plan to keep the public delta_phi and delta_r functions as is? I suggest you do because there are some cases where a user defined "eta" instead of the .eta attribute is needed

@lgray
Copy link
Collaborator Author

lgray commented Jan 21, 2024

In the first phase of this, yes we'll keep those functions for backwards compatibility.

In the long run, when we remove nanoevents.methods.vector, vector supplies deltaR and deltaphi and folks should migrate to use what is already supplied. You can see we now just use those functions within their underscore equivalents.

@lgray lgray force-pushed the use_scikithep_vector branch 7 times, most recently from 4a7ef1e to 2374127 Compare January 26, 2024 22:18
@lgray
Copy link
Collaborator Author

lgray commented Jan 26, 2024

@nsmith- tests fail after appeasing linter :-(

@nsmith-
Copy link
Member

nsmith- commented Jan 26, 2024

Yes, I added tests and am now working to improve them

@lgray
Copy link
Collaborator Author

lgray commented Jan 26, 2024

Ah, gotcha - yeah putting this together with scikit-hep/vector#413 I figured you may be messing around.

@lgray lgray force-pushed the use_scikithep_vector branch 3 times, most recently from 8946d59 to bbcf978 Compare January 30, 2024 15:23
@lgray
Copy link
Collaborator Author

lgray commented Jan 30, 2024

I think one place where we could benefit a lot from a thorough look-over by @Saransh-cpp is to make sure we're not unintentionally clobbering any of the vector interface with parts of nanoevents.vector, and likewise @nsmith- if there's anything we can do to make the delta between skh-vector-backed nanoevents.vector closer to the original implementation we should try to smooth all that out.

nsmith- and others added 3 commits March 6, 2024 11:42
@lgray lgray marked this pull request as ready for review March 15, 2024 17:36
@lgray
Copy link
Collaborator Author

lgray commented Mar 15, 2024

@nsmith- @Saransh-cpp from the looks of it we are now close to being able to merge this change. Please provide reasons not to do so yet, or any further tests that must be done before proceeding.

Copy link
Contributor

@Saransh-cpp Saransh-cpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me in this state!

nsmith-
nsmith- previously approved these changes Mar 19, 2024
@lgray
Copy link
Collaborator Author

lgray commented Mar 20, 2024

This fails on a real user analysis with:

File "/Users/lgray/coffea-dev/ewkcoffea/ewkcoffea/modules/selection_wwz.py", line 334, in get_mt2
    rest_WW = w_lep0 + w_lep1 + misspart
...
TypeError: no numpy.add overloads for custom types: Candidate, PtEtaPhiMLorentzVector

Please update and harden tests before we merge. This appears to be on the coffea side of things.

@Saransh-cpp @nsmith-

@lgray lgray dismissed nsmith-’s stale review March 20, 2024 13:55

fails in user analysis for missing overloads to candidate

@nsmith-
Copy link
Member

nsmith- commented Mar 20, 2024

Ah, this is going to be an issue. We have a class hierarchy that in principle would all need dispatch updates, so fixing it for Candidate just means that now it will fail for Electron or Jet. I'm kind of puzzled why awkward cannot do the type resolution automatically.

@nsmith-
Copy link
Member

nsmith- commented Mar 20, 2024

So #1063 is just pushing the issue one level down.

@Saransh-cpp
Copy link
Contributor

Oh yes, I missed that, thanks! I don't think there is a better way to do this? Should I add dispatch updates for Electron and Jet in a similar way?

@Saransh-cpp
Copy link
Contributor

More tests are failing, and I think there are more classes for which we'll have to do the same - Particle, Muon, MasslessParticle, Photon, Track, Tower, Vertex, MissingET - everything inheriting the coffee vector classes.

@nsmith-
Copy link
Member

nsmith- commented Mar 21, 2024

@jpivarski is there a way to have awkward dispatch using the method resolution order of the mixin types, rather than having to register every subtype explicitly?

@jpivarski
Copy link
Contributor

The ak.behavior → Python class object mapping (that is, through the __record__ parameter name) has to map to the concrete classes, but those concrete classes can share superclasses. If we're talking about getting methods and properties on class instances, those methods and properties can be inherited onto the concrete class, but the concrete class must be the one that ak.behavior points to.

For ufunc and binary operator overloading, it's still the concrete classes that have to be named, but those a free functions, rather than methods and properties, so there isn't an easy way for inheritance to work. If you want to make np.add call function f for all subclasses of T, you need to put signatures with each multiple dispatch combination into the ak.behavior. That's why Vector has for loops like this:

https://github.com/scikit-hep/vector/blob/06a733784d8df77d2fe4ad0f87051f725fd6a831/src/vector/backends/awkward.py#L1519-L1546

@agoose77 and I brought this up several times—how to describe inheritance in ak.behavior so that a single line

ak.behavior[np.add, "Vector", "Vector"] = lambda v1, v2: v1.add(v2)

would effectively do

for left in [names of all subclasses of Vector]:
    for right in [names of all subclasses of Vector]:
        ak.behavior[np.add, left, right] = lambda v1, v2: v1.add(v2)

The problem is that the interface of ak.behavior gets more complicated: we'd have to tell it what the names of all subclasses of some common class is independently of Python's own class inheritance mechanism—having two class inheritance mechanisms, Python's and Awkward Array's, which can get out of sync, could lead to an amazingly messy situation. These names, "Vector", "Vector2D", etc., are distinct from Python class objects because parameters have to be JSON to be serialized in Forms, passed from C++, to and from Julia, etc.

Having to build these things with for loops is cumbersome, but I'm afraid that having two parallel inheritance mechanisms would be worse in practice.

@Saransh-cpp
Copy link
Contributor

Thank you for the explanation! I will add the loops for every vector class in coffee. Additionally, should coffea also have distinct and not the class names in the behavior dict?

@jpivarski
Copy link
Contributor

Additionally, should coffea also have distinct and not the class names in the behavior dict?

I'm not sure what you mean. Is it the distinction between putting behaviors in a global ak.behaviors versus putting them in each ak.Array.behaviors?

The fact that Coffea used ak.Array.behaviors was great for debugging that feature, and ak.Array should have that feature. However, when I present Vector, I always do vector.register_awkward() once globally after import because then I don't have to show special constructors—Awkward Arrays with Vectors are just ordinary Awkward Arrays. Coffea hasn't had this problem because the data come from files—there has to be a special constructor anyway.

But maybe some of Coffea's metadata handling costs come from passing behaviors from one ak.Array to another. (The same costs would have been present in Coffea 0.7, but maybe they're exacerbated by Dask.) If the behaviors are being passed around as an unexamined dict, then there's no appreciable cost, but if the dicts need to be copied or manipulated, it might add up.

Maybe I'm misunderstanding @Saransh-cpp's question, but either way, that might be something to consider: switching to coffea.register_awkward() (if not doing it automatically with import coffea).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use scikit-hep vector to as the backend for vector math in nanoevents
5 participants