Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss: transformation vs. aggregation in agg vs. transform #27389

Closed
ghost opened this issue Jul 14, 2019 · 6 comments
Closed

Discuss: transformation vs. aggregation in agg vs. transform #27389

ghost opened this issue Jul 14, 2019 · 6 comments
Labels
Apply Apply, Aggregate, Transform Needs Discussion Requires discussion from core team before further action

Comments

@ghost
Copy link

ghost commented Jul 14, 2019

There have been multiple issues regarding

I'd like to prepare a PR to fix this, but I need to know what the consensus is first.

1. should Groupby/DataFrame/Series.agg disallow transformations?

#14741 (comment) about Groupby.agg

you need to use .transform. .agg is by definition a reducer.

agg currently accepts transformations as well.

DataFrame.agg was merged in #27389 despite repeated objections to mixing transformations and aggregations, #14668 (comment) and #14668 (comment).

2. What should transform('rank') return?

updated

g.min               # one value per group. ok.
g.transform('min')  # one value per group, broadcasted. ok.
g.rank()            # like-index result. Ok
g.transform('rank') # Currently, a nonsense result.

On the one hand, users have been told that transformations don't belong in transform (!): #22509 (comment), #14274 (comment). This makes sense if you think of the transform('name') form as solely for broadcasting aggregations.

On the other hand, the documentation for transform, as well as Wes Mckinney's excellent pandas book portray transform as the dedicated tool for shape-preserving operations, so excluding them from the transform('name') case would be a little surprising.

Personally, I'm +0 for deprecating transform('rank') and with a warning to use g.rank(), as well as for the other transformation ops.

@ghost ghost changed the title DataFrame.agg should refuse non-aggregation operations Discuss: transformation vs. aggregation in agg vs. transform Jul 14, 2019
@WillAyd
Copy link
Member

WillAyd commented Jul 15, 2019

I would be fine with the first point to ensure agg reduces. I'm not clear on what issue points 2 and 3 are trying to solve

@TomAugspurger
Copy link
Contributor

I would be fine with the first point to ensure agg reduces

Agreed. We should deprecate the current behavior with a warning.

I also don't understand point 2.

  1. Is broadcasting a transformation result useful in some other cases?

Yes, I find this useful.

@ghost
Copy link
Author

ghost commented Jul 16, 2019

I'm not clear on what issue points 2 and 3 are trying to solve

I also don't understand point 2.

Edited OP for clarity. What should g.transform('rank') return? should it be the same as g.rank() or be deprecated with a warning? currently it returns nonsense.

@ghost ghost mentioned this issue Jul 19, 2019
1 task
@ghost
Copy link
Author

ghost commented Jul 19, 2019

Agreed. We should deprecate the current behavior with a warning.

Good. But how does that sit with 25.0 being released and the no-deprecation policy during this release cycle coming up on 1.0? Can we do it?

@TomAugspurger
Copy link
Contributor

Discussion on the mailing list: https://mail.python.org/pipermail/pandas-dev/2019-July/001030.html

I think the plan was to be light on new deprecations for 1.0. But I don't know how wedded we are to that idea.

@jbrockmendel jbrockmendel added Apply Apply, Aggregate, Transform Needs Discussion Requires discussion from core team before further action labels Oct 16, 2019
@mroeschke
Copy link
Member

I think the more recent discussion of this has moved to #35725, so closing in favor of that issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

4 participants