Is there a way to use a UDF or lambda in groupby agg? #2201

Open
kylegilde opened this issue Sep 27, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@kylegilde

The following code doesn't work. Thank you!

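# Assumed imports (not in the original report); `spark` is an active SparkSession.
import databricks.koalas as ks
from pyspark.sql.functions import pandas_udf

@pandas_udf('string')
def as_set(x):
    return str(set(x))
spark.udf.register('as_set', as_set)


kdf = ks.DataFrame(
    {'a': [1, 2, 2, 4, 5, 6],
     'b': ["one", "one", "one", "two", "two", "two"]},
    index=[10, 20, 30, 40, 50, 60]
)
# As reported: {'a', as_set} is a set literal, not a dict; this call raises the error below.
kdf.groupby(['b']).agg({'a', as_set})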

ValueError: aggs must be a dict mapping from column name to aggregate functions (string or list of strings).
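
For context, the wording of the message suggests the argument is validated before any aggregation runs. The following is a minimal sketch of such a check, reconstructed only from the error text and not taken from the Koalas source; `validate_aggs` and `rejected` are hypothetical names, and `as_set` here is a plain function standing in for the registered UDF.

```python
def validate_aggs(aggs):
    """Hypothetical stand-in for the argument check; not the Koalas source."""

    def is_valid_value(value):
        # Accept a function name, or a list of function names.
        return isinstance(value, str) or (
            isinstance(value, list) and all(isinstance(v, str) for v in value)
        )

    # A bare string or list of strings is assumed to be accepted as-is.
    if isinstance(aggs, (str, list)):
        return aggs
    if not isinstance(aggs, dict) or not all(
        is_valid_value(v) for v in aggs.values()
    ):
        raise ValueError(
            "aggs must be a dict mapping from column name "
            "to aggregate functions (string or list of strings)."
        )
    return aggs


def as_set(x):
    # Plain-Python stand-in for the UDF in the report.
    return str(set(x))


def rejected(aggs):
    """Return True if the sketched check raises ValueError for `aggs`."""
    try:
        validate_aggs(aggs)
    except ValueError:
        return True
    return False
```

Under this assumed check, the set literal `{'a', as_set}` is rejected because it is not a dict, and the dict form `{'a': as_set}` would be rejected as well because its value is a callable rather than a string.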
@itholic
Contributor

itholic commented Dec 9, 2021

Thanks for the report, @kylegilde!

The Koalas project is currently in maintenance mode, so responses may be quite delayed.

Koalas is now developed more actively within PySpark under the name "pandas API on Spark". You can reuse existing Koalas code by changing the import to import pyspark.pandas as ks.

So if you plan to keep using Koalas, I recommend switching to PySpark. You may also get a quicker response by reporting the issue in the Apache Spark JIRA.
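
As a possible workaround after switching imports, here is a minimal sketch that sidesteps `agg` and uses `GroupBy.apply` from the pandas API on Spark. It assumes PySpark 3.2 or later and that `apply` accepts a function returning one scalar per group, as its documentation describes. The names `as_set_string`, `build_example_frame`, and `sets_per_group` are hypothetical, and the values are sorted so the output string is stable, which differs slightly from the `str(set(x))` in the report.

```python
def as_set_string(values) -> str:
    """Format the distinct values of one group as a string like '{1, 2}'."""
    # Sorting makes the result independent of set iteration order.
    return "{" + ", ".join(str(v) for v in sorted(set(values))) + "}"


def build_example_frame():
    """Rebuild the frame from the report with the pandas API on Spark."""
    import pyspark.pandas as ps  # imported lazily; assumes pyspark >= 3.2

    return ps.DataFrame(
        {"a": [1, 2, 2, 4, 5, 6],
         "b": ["one", "one", "one", "two", "two", "two"]},
        index=[10, 20, 30, 40, 50, 60],
    )


def sets_per_group(psdf, key, column):
    """Apply as_set_string to `column` within each `key` group.

    The `-> str` annotation on as_set_string tells pandas-on-Spark the
    result type, so it does not need to infer it by sampling the data.
    """
    return psdf.groupby(key)[column].apply(as_set_string)
```

With a Spark environment available, `sets_per_group(build_example_frame(), "b", "a")` should give one string per value of `b`. If only distinct values are needed and a string is not required, Spark's built-in `collect_set` on `psdf.to_spark()` may be a simpler route.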
