Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Categoricals should merge subject to particular rules #2853

Open
agoose77 opened this issue Nov 29, 2023 · 3 comments
Open

Categoricals should merge subject to particular rules #2853

agoose77 opened this issue Nov 29, 2023 · 3 comments
Labels
bug The problem described is something that must be fixed

Comments

@agoose77
Copy link
Collaborator

agoose77 commented Nov 29, 2023

Version of Awkward Array

main

Description and code to reproduce

  • Categoricals should merge for strings, but error for non strings?
  • We can add an extension for merging non-string categoricals.
@agoose77 agoose77 added the bug The problem described is something that must be fixed label Nov 29, 2023
@jpivarski
Copy link
Member

This is going to be hard to understand in 12 months!

Here's what I remember of our conversation:

  • merging X with X results in X (naturally)
  • merging categorical[X] with X results in X (I think this is least surprise, and has the least-bad performance consequences: the second X could be a million distinct strings)
  • merging categorical[X] with categorical[X] results in categorical[X], but the look-up categories have to be combined into a single look-up, without duplicates, which changes the indexes of at least one of the two.

The third case can be handled for the special case in which X = string or X = bytestring, but not in general, not unless the user passes some way of determining whether two instances of X are equal, or a way of ensuring that a set of instances of X has no duplicates (a new ak.behavior?).

For now, the third case should just go to NotImplementedError, so that it's a reminder that something better needs to be done there (and maybe point to this issue). In the planned future, merging-strings should eventually be implemented. The general case is in the unplanned future—we'll only do it if someone encounters this need. (So there may be a NotImplementedError case in the code forever, and that's fine.)

@agoose77
Copy link
Collaborator Author

Thanks @jpivarski — haven't got round to writing up my notes yet! Looks like you've saved me the challenge :)

@agoose77
Copy link
Collaborator Author

agoose77 commented Dec 8, 2023

Just to add some additional context: we discussed making support for merging categoricals be a behavior-overload. It would probably make sense for this to only be defined for RecordArray, meaning that awkward should implement support for merging categoricals of the various primitive types that we support.

@jpivarski jpivarski added this to Unprioritized in Finalization Jan 19, 2024
@jpivarski jpivarski moved this from Unprioritized to P3 in Finalization Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
Development

No branches or pull requests

2 participants