Update compute_dict_like to get all columns #58452
base: main
pandas/core/apply.py
Outdated
for key, how in func.items():
    for index in range(df.shape[1]):
        col = df.iloc[:, index]
        if col.name != key:
            continue

        series = obj._gotitem(key, ndim=1, subset=col)
        result = getattr(series, op_name)(how, **kwargs)
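The positional loop above matters mainly when a column label is not unique. The following is a minimal sketch of that situation, assuming duplicate labels are the case the PR title ("get all columns") refers to; the frame and the helper `columns_named` are hypothetical and not part of pandas or this PR.

```python
import pandas as pd

# Hypothetical frame with a duplicated column label "b".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "b"])

# Label-based selection returns a DataFrame, not a Series, when the label repeats.
by_label = df["b"]


def columns_named(frame, key):
    """Hypothetical helper: return every column named `key` as a separate Series."""
    return [
        frame.iloc[:, index]
        for index in range(frame.shape[1])
        if frame.columns[index] == key
    ]


# Iterating by position yields one Series per matching column.
matches = columns_named(df, "b")
```

Selecting by position keeps each duplicate as its own Series, which is why the loop compares `col.name` with `key` rather than indexing by label.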
Can you see if the following works instead:
subset = selected_obj[[key]]
subobj = obj._gotitem(key, ndim=2, subset=subset)
result = getattr(subobj, op_name)(how, **kwargs)
Using getattr on a DataFrame throws an exception in this case:
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [1, 3, 4], [2, 4, 5]],
    columns=["a", "b", "c"],
)
gb = df.groupby("a")
result = gb.agg({"c": ["sum", "min", "max", "min"]})
print(result)
This is due to reconstruct_func in DataFrameGroupBy:
if not relabeling:
    if isinstance(func, list) and len(func) > len(set(func)):
        # GH 28426 will raise error if duplicated function names are used and
        # there is no reassigned name
        raise SpecificationError(
            "Function names must be unique if there is no new column names "
            "assigned"
        )
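The condition in the snippet above can be checked in isolation. This is a small sketch of just that test, not of reconstruct_func itself; the function name `has_duplicate_names` is hypothetical.

```python
def has_duplicate_names(func):
    """Hypothetical stand-in for the duplicate check quoted above.

    A list has duplicates when it is longer than the set of its elements.
    """
    return isinstance(func, list) and len(func) > len(set(func))


# The list from the failing example repeats "min".
with_duplicates = has_duplicate_names(["sum", "min", "max", "min"])

# A list of distinct names passes the check.
without_duplicates = has_duplicate_names(["min", "max"])
```

Because the check relies on `set`, it only applies to lists of hashable entries such as function-name strings.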
Thank you for the review, I have added comments as per the review.
Thanks for the patience here @undermyumbrella1 - I should be able to get to this by Friday.
The logic in the for loop that I was trying to change looks good. In general it'd be best for performance if we didn't have to split up a DataFrame into Series, but given all the different cases we need to handle flexibly, that doesn't seem possible to me.
        ),
    )
    gb = df.groupby(level=0)
    result = gb.agg({("level1.1", "level2.2"): "min"})
Can you add one more like this but with a list instead, e.g. ["min", "max"]
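For reference, here is a sketch of what a list-valued entry produces. It assumes the flat-column frame from the earlier comment rather than the MultiIndex fixture in the test, whose construction is not shown in full here.

```python
import pandas as pd

# Frame reused from the earlier example in this conversation.
df = pd.DataFrame(
    [[1, 2, 3], [1, 3, 4], [2, 4, 5]],
    columns=["a", "b", "c"],
)

# A list of function names yields one output column per function,
# labelled by (source column, function name).
result = df.groupby("a").agg({"c": ["min", "max"]})
```

With a list, the result gains an extra column level for the function names, so a test for this case would compare against MultiIndex columns.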
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.