array_agg and undefined/none values #333

Open
carsten-jahn opened this issue Jul 11, 2023 · 3 comments

@carsten-jahn

Thanks a lot for this great library!

I came across an issue with array_agg: I would like to preserve undefined / null values in my dataset and keep them in the aggregated array. However, arquero skips these values, so the result of array_agg is a shorter array.

I tried to implement a custom aggregate function, but the result was the same: these values seem to be filtered out before the aggregation is invoked.

It would be great if there was a way of just aggregating all values and "non-values" in an array.

@dldx

dldx commented Dec 1, 2023

I just discovered the same issue. It seems like a bug to me?

@carsten-jahn Did you find a solution for this?

How to reproduce:

import * as aq from 'arquero';
const { op } = aq;
aq.table({ v: [1, null, 1, 2, 3, 1] })
  .rollup({ a: op.array_agg('v') }) // a is [1, 1, 2, 3, 1]: the null is dropped

@carsten-jahn
Author

carsten-jahn commented Dec 7, 2023

Hi @dldx, I looked into this again. I don't have an elegant solution for this in arquero. All I can do is set the null values to a specific constant before calling array_agg, and then replace the constant with null again before using the array elsewhere.
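The round-trip described above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `NULL_SENTINEL`, `encodeMissing`, `aggregateSkippingMissing` and `decodeMissing` are hypothetical names, and `aggregateSkippingMissing` is a stand-in that mimics the behaviour reported in this thread rather than calling arquero.

```javascript
// Sentinel round-trip, sketched with plain arrays (no arquero calls).
// NULL_SENTINEL and all three helpers are hypothetical names.
const NULL_SENTINEL = '__NULL__'; // assumption: never occurs as a real value

// Step 1: replace null/undefined with the sentinel before aggregating.
function encodeMissing(values) {
  return values.map(v => (v == null ? NULL_SENTINEL : v));
}

// Stand-in for the reported behaviour: missing values are dropped
// before the aggregate ever sees them.
function aggregateSkippingMissing(values) {
  return values.filter(v => v != null);
}

// Step 2: turn the sentinel back into null after aggregating.
function decodeMissing(values) {
  return values.map(v => (v === NULL_SENTINEL ? null : v));
}

const input = [1, null, 1, 2, 3, 1];
const lossy = aggregateSkippingMissing(input);
const preserved = decodeMissing(aggregateSkippingMissing(encodeMissing(input)));

console.log(lossy);     // [ 1, 1, 2, 3, 1 ]
console.log(preserved); // [ 1, null, 1, 2, 3, 1 ]
```

With arquero itself, the encode step would presumably run on the column before the rollup and the decode step on the array that array_agg returns. Note that this sketch maps both null and undefined back to null, so the two are no longer distinguishable afterwards.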

I looked into implementing a custom aggregate function as described in https://uwdata.github.io/arquero/api/extensibility#addAggregateFunction and https://observablehq.com/@uwdata/adding-aggregate-functions-to-arquero. However, this doesn't help either, as its add function is called only for "valid" elements. The state visible in the aggregator does tell you how many invalid elements there are, but not the order in which they appear.

@dldx
Copy link

dldx commented Dec 8, 2023

@carsten-jahn Thanks for replying! I will try to investigate further :)
