Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support dtype kwarg for grouped reductions #121

Open
dcherian opened this issue Sep 28, 2023 · 7 comments
Open

support dtype kwarg for grouped reductions #121

dcherian opened this issue Sep 28, 2023 · 7 comments

Comments

@dcherian
Copy link
Contributor

No description provided.

@max-sixty
Copy link
Collaborator

Is .sum(dtype=foo) equivalent to calling .astype(foo) on the input? On the result?

@dcherian
Copy link
Contributor Author

No, dtype is the dtype of the accumulator, so for example, you might want to use dtype=np.float64 to more accurately reduce np.float32 values.

Here's a PR with useful tests and discussion: xarray-contrib/flox#131

@max-sixty
Copy link
Collaborator

I see. I think this is possible, though not trivial — currently we have a list of candidate signatures which numba will JIT compile for. We could generate new ones at runtime, but I'm not sure how numba handles a many-to-one mapping of inputs-to-outputs; so possibly we'd need to compile a new function...

@dcherian
Copy link
Contributor Author

Ah I think this is lower priority. We can easily raise if dtype is provided and ask them to use another "engine"

@max-sixty
Copy link
Collaborator

We can easily raise if dtype is provided and ask them to use another "engine"

Yes or to convert first (albeit requires a copy)

@max-sixty
Copy link
Collaborator

One question out of #128:

To what extent do we want to convert types? Currently we always have the same input type as output type... ...except for bool which we convert to ints (ref #128 )

But often that's not what we want — e.g. it is useful to have a float mean of ints, rather than an "int mean" (sum and then integer division).

A few choices:

  • Always keep types — current state (except for bools...)
  • Always convert to float64 — pandas' approach IIUC
  • Allow passing a dtype arg — great but requires some work

Unless folks disagree, I think we'll stay at the current state and see if we get traction and folks ask for a change...

@dcherian
Copy link
Contributor Author

dcherian commented Oct 2, 2023

To what extent do we want to convert types?

I think we can just ask the user to do so...

I think we'll stay at the current state and see if we get traction and folks ask for a change

IMO this is OK.

A few choices:
Always keep types — current state (except for bools...)
Always convert to float64 — pandas' approach IIUC
Allow passing a dtype arg — great but requires some work

Perhaps @shoyer has an opinion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants