Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advanced transformation examples #3433

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Advanced transformation examples #3433

wants to merge 3 commits into from

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Mar 27, 2024

Fixes #3430

@bkamins
Copy link
Member Author

bkamins commented Apr 4, 2024

@nalimilan - what do you think (CI errors are unrelated)?

docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
Comment on lines 876 to 878
The reason why this is needed is that instead `combine` iterates the contents of the value returned
by the operation specification function and tries to expand it, which in our case is a tuple of numbers,
so one gets an error:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The reason why this is needed is that instead `combine` iterates the contents of the value returned
by the operation specification function and tries to expand it, which in our case is a tuple of numbers,
so one gets an error:
Without `Ref`, `combine` iterates the contents of the value returned by the operation specification function, which in our case is a tuple of numbers, and tries to expand it, so one gets an error:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me why this isn't allowed? AFAICT it would be possible to destructure the tuple and use its values to fill the columns? BTW, is "expand" the right word here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is related to the issue we recently discussed with the Tables.jl conflict against Julia 1.11. DataFrames.jl does not recognize Tuple as a valid input so it sends it to Tables.jl columntable function. So essentially - the error is because we fall-back to Tables.jl in cases that we do not explicitly handle as special cases (like vectors, matrices, DataFrames etc.).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion is applied in the commit I pushed (I just had to re-word it a bit).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. But then, when combine iterates, it expects each value to represent a column, not a row, and my suggestion is incorrect, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have written something more precise as your suggestion was not precise enough (I think). What is done is I think best visible here:

julia> df = DataFrame(x=1:2)
2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> combine(df, :x => (x -> ((a=x,), (a=2x,), (a=3x,))) => AsTable)
3×1 DataFrame
 Row │ a
     │ Array…
─────┼────────
   1 │ [1, 2]
   2 │ [2, 4]
   3 │ [3, 6]

The returned tuple ((a=x,), (a=2x,), (a=3x,)) has three elements so it produces three rows. Things work, because each element is a NamedTuple which provides column names that can be used to generate column names with AsTable.

Is it clearer now?

docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Apr 13, 2024

@nalimilan - I applied all your suggestions and expanded the examples even more to be more explicit.

docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
docs/src/man/working_with_dataframes.md Outdated Show resolved Hide resolved
Comment on lines 876 to 878
The reason why this is needed is that instead `combine` iterates the contents of the value returned
by the operation specification function and tries to expand it, which in our case is a tuple of numbers,
so one gets an error:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. But then, when combine iterates, it expects each value to represent a column, not a row, and my suggestion is incorrect, right?

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document custom generation of column names in manual
2 participants