column names of aggregated DataFrame with anonymous functions #1276

pfarndt · 2017-11-14T15:44:20Z

When using an anonymous function to generate an aggregated DataFrame, the column names are not reproducible and are inconvenient for further use:

using DataFrames

d = DataFrame(g = [1,1,2,2], v=1:4)

println(names(aggregate(d, :g, x->sum(x))))
println(names(aggregate(d, :g, x->sum(x))))

This code produces:

Symbol[:g, Symbol("v_#1")]
Symbol[:g, Symbol("v_#3")]

although the last two commands are identical.

The documentation promises names like v_\lambda1 at this point. I am using Julia v0.6.0.

IMO the function _fnames in https://github.com/JuliaData/DataFrames.jl/blob/master/src/other/utils.jl should be adjusted for new versions of Julia.


cjprybol · 2017-11-14T18:31:35Z

> not reproducible

I was able to reproduce this

julia> using DataFrames

julia> d = DataFrame(g = [1,1,2,2], v=1:4)
4×2 DataFrames.DataFrame
│ Row │ g │ v │
├─────┼───┼───┤
│ 1   │ 1 │ 1 │
│ 2   │ 1 │ 2 │
│ 3   │ 2 │ 3 │
│ 4   │ 2 │ 4 │

julia> println(names(aggregate(d, :g, x->sum(x))))
Symbol[:g, Symbol("v_#1")]

julia> println(names(aggregate(d, :g, x->sum(x))))
Symbol[:g, Symbol("v_#3")]

The names of the columns are the function identifiers. Here's another fresh session to show the identifiers of the anonymous functions, which you'll see match the column names, and are again reproducible.

julia> x->sum(x)
(::#1) (generic function with 1 method)

julia> x->sum(x)
(::#3) (generic function with 1 method)

> inconvenient for further usage

The previously used lambda syntax has no relation (aside from order) to the actual functions that were used to create the data, and hence it was removed. Unfortunately, it is not yet possible to extract the original code of an anonymous function from its identifier (see JuliaLang/julia#2625 (comment)). In principle, if that is added as a language feature, the identifiers of the anonymous functions will provide both a stable identifier and a way to recover the function associated with that identifier. Currently, only the "stable identifier" part is supported, while the lambda syntax cannot support either.

If you would like the columns to have specific names, you simply need to use named functions. For example, in the case you provided, using the anonymous function isn't recommended: you can pass the sum function directly instead of wrapping it in an anonymous function with x -> sum(x).

julia> names(aggregate(d, :g, sum))
2-element Array{Symbol,1}:
 :g
 :v_sum

Alternatively, you can retain the lambda naming by giving your functions that name:

julia> λ1(x) = sum(x)
λ1 (generic function with 1 method)

julia> names(aggregate(d, :g, λ1))
2-element Array{Symbol,1}:
 :g
 :v_λ1

You are correct that the documentation for that section is out of date; it will be corrected after #1252 is merged. The _fnames function should have already been removed. If you'd like to contribute a PR to delete it, that would be great!

nalimilan · 2017-11-14T18:55:07Z

The presence of a # is really annoying since it makes the symbol non-standard. Maybe we should replace it with another symbol, probably an ASCII one so that it's easy to type (f?).

pfarndt · 2017-11-14T19:30:19Z

You got the point - the # is annoying.

Changing it to something else (e.g. f) solves only half of the problem. I might call such an aggregate statement (with more than one anonymous function, and more complicated ones than just a sum) several times and use the resulting columns further on. Right now, since the numerical identifier changes from one call to the next, I would have to look up the name by calling names and calculating its position (since I might want to apply the anonymous function to several columns). Therefore the "previously used lambda syntax" that "has no relation (aside from order) to the actual functions" would be very helpful, because it carries all the information one needs, i.e. the column name and which of my anonymous functions was applied to it.

nalimilan · 2017-11-14T21:02:33Z

So you mean, just use f1, f2...? Why not.
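
A positional f1, f2... scheme of this kind could be sketched roughly as follows. This is only an illustration, not the DataFrames.jl implementation: `stable_fnames` is a hypothetical helper, and the sketch assumes Julia 1.x, where `nameof` is available and the compiler-generated name of an anonymous function starts with `#`.

```julia
# Hypothetical helper (not part of DataFrames.jl): build column-name suffixes
# for a list of functions. Named functions keep their own name; anonymous
# functions are numbered f1, f2, ... in the order they appear.
function stable_fnames(fs)
    count = 0
    map(fs) do f
        name = string(nameof(f))
        if startswith(name, "#")   # compiler-generated name of an anonymous function
            count += 1
            "f$count"
        else
            name
        end
    end
end

# The suffixes depend only on the order of the functions, not on how many
# anonymous functions were defined earlier in the session.
println(stable_fnames([sum, x -> sum(x), maximum, x -> x[1]]))
println(stable_fnames([sum, x -> sum(x), maximum, x -> x[1]]))
```

Both calls give the same suffixes, so a column such as v_f1 can be referred to by a fixed name across repeated aggregate calls.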

bkamins · 2018-12-14T09:53:35Z

Closing as #1576 fixed this

nalimilan added the intro issue label Sep 20, 2018

nalimilan added the Hacktoberfest label Oct 2, 2018

bkamins mentioned this issue Oct 27, 2018

improve naming of anonymous functions in aggregate #1576

Merged

bkamins closed this as completed Dec 14, 2018
