Update compute_dict_like to get all columns #58452

undermyumbrella1 · 2024-04-27T15:29:43Z

closes #55041 (BUG: groupby.agg fails when input has duplicate columns and dict input)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

rhshadrach · 2024-04-29T21:15:43Z

pandas/core/apply.py

+            for key, how in func.items():
+                for index in range(df.shape[1]):
+                    col = df.iloc[:, index]
+                    if col.name != key:
+                        continue
+
+                    series = obj._gotitem(key, ndim=1, subset=col)
+                    result = getattr(series, op_name)(how, **kwargs)


Can you see if the following works instead:

    subset = selected_obj[[key]]
    subobj = obj._gotitem(key, ndim=2, subset=subset)
    result = getattr(subobj, op_name)(how, **kwargs)
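To make the two approaches in this thread concrete, here is a minimal sketch that uses only public pandas APIs. The frame and the `matches` name are hypothetical and are not taken from the PR. It shows that list-style selection returns every column sharing a label as one 2-D object, while positional iteration yields each of them as a separate Series.

```python
import pandas as pd

# Hypothetical frame with two columns that share the label "b".
df = pd.DataFrame([[1, 2, 3], [1, 3, 4], [2, 4, 5]], columns=["a", "b", "b"])
key = "b"

# List-style selection keeps every column labelled `key` in one DataFrame.
subset = df[[key]]

# Positional iteration visits each matching column as its own Series.
matches = [
    df.iloc[:, index]
    for index in range(df.shape[1])
    if df.columns[index] == key
]

print(subset.shape)                       # (3, 2)
print([col.tolist() for col in matches])  # [[2, 3, 4], [3, 4, 5]]
```

Either form reaches both duplicate columns. The difference is whether the subsequent aggregation runs on a 2-D or a 1-D object.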

undermyumbrella1 · 2024-05-02

Using getattr on the DataFrame object raises an exception in this case:

    df = pd.DataFrame(
        [[1, 2, 3], [1, 3, 4], [2, 4, 5]],
        columns=["a", "b", "c"],
    )
    gb = df.groupby("a")
    result = gb.agg({"c": ["sum", "min", "max", "min"]})
    print(result)

This is due to the following check in reconstruct_func, which DataFrameGroupBy uses:

    if not relabeling:
        if isinstance(func, list) and len(func) > len(set(func)):
            # GH 28426 will raise error if duplicated function names are used and
            # there is no reassigned name
            raise SpecificationError(
                "Function names must be unique if there is no new column names "
                "assigned"
            )
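The condition in that check can be tried in isolation. The following is a standalone sketch in plain Python, not pandas code: the helper name `has_duplicate_names` is hypothetical, and it only mirrors the `isinstance`/`len(set(...))` test quoted above.

```python
def has_duplicate_names(func):
    """Return True when `func` is a list that repeats an entry."""
    return isinstance(func, list) and len(func) > len(set(func))


print(has_duplicate_names(["sum", "min", "max", "min"]))  # True
print(has_duplicate_names(["sum", "min", "max"]))         # False
print(has_duplicate_names("min"))                         # False
```

The list from the example, `["sum", "min", "max", "min"]`, repeats `"min"`, so it satisfies the condition.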

undermyumbrella1 · 2024-05-07T03:26:15Z

Thank you for the review, I have added comments as per the review.

rhshadrach · 2024-05-08T22:35:39Z

Thanks for the patience here @undermyumbrella1 - I should be able to get to this by Friday.

rhshadrach

The logic in the for loop that I was trying to change looks good. In general, it would be best for performance if we didn't have to split a DataFrame into Series, but given all the different cases we need to handle flexibly, that doesn't seem possible to me.

rhshadrach · 2024-05-10T23:57:18Z

pandas/tests/groupby/aggregate/test_aggregate.py

+        ),
+    )
+    gb = df.groupby(level=0)
+    result = gb.agg({("level1.1", "level2.2"): "min"})


Can you add one more like this but with a list instead, e.g. ["min", "max"]?
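A rough sketch of what such a list-based case could look like follows. The data, index, and column labels other than ("level1.1", "level2.2") are assumptions, since the frame from the PR's test is only partially visible here. This is not the test that was added to the PR.

```python
import pandas as pd

# Hypothetical data: the frame used in the PR's test is not shown in full.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=[0, 0, 1],
    columns=pd.MultiIndex.from_tuples(
        [
            ("level1.1", "level2.1"),
            ("level1.1", "level2.2"),
            ("level1.2", "level2.1"),
            ("level1.2", "level2.2"),
        ]
    ),
)
gb = df.groupby(level=0)

# Same tuple key as the existing test, but with a list of aggregations.
result = gb.agg({("level1.1", "level2.2"): ["min", "max"]})
print(result)
```

With this data, the selected column holds 2 and 6 in group 0 and 10 in group 1, so each row of the result should carry that group's minimum and maximum.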

Update compute_dict_like to get all columns

f6c6408

undermyumbrella1 marked this pull request as draft April 27, 2024 15:29

Kei added 2 commits April 28, 2024 22:31

Add tests

84ef941

Update rst

e8f3172

undermyumbrella1 force-pushed the fix/grouby_agg_dict_input_dup_columns branch from 75f593e to e8f3172 Compare April 28, 2024 14:36

Remove newline from rst

79a8ea6

undermyumbrella1 force-pushed the fix/grouby_agg_dict_input_dup_columns branch from b2ee3b4 to 79a8ea6 Compare April 28, 2024 15:36

undermyumbrella1 marked this pull request as ready for review April 28, 2024 16:26

mroeschke requested a review from rhshadrach April 29, 2024 17:31

mroeschke added the Groupby label Apr 29, 2024

rhshadrach requested changes Apr 29, 2024

Kei added 2 commits May 2, 2024 17:15

Project the columns before converting to series group by

c8ca7f7

Merge branch 'main' into fix/grouby_agg_dict_input_dup_columns

31b05af

undermyumbrella1 force-pushed the fix/grouby_agg_dict_input_dup_columns branch from c0ddcdd to 31b05af Compare May 2, 2024 09:15

Kei added 4 commits May 2, 2024 17:24

retrigger doc build

5dd4fbc

Account for 1d/series projection result

a3eac47

Declare var before assignment

7f565ca

Remove if condition

de5948d

rhshadrach requested changes May 10, 2024
