Feature request: Pairs in stack #3423

Open
pdeffebach opened this issue Jan 26, 2024 · 2 comments

@pdeffebach
Contributor

This is a small feature request that would save me some typing, so feel free to ignore it.

stack is a nice function, but I frequently want to stack right before making a table or a plot, and raw variable names aren't very conducive to pretty plots and tables.

julia> d = DataFrame(a_1 = [1, 2], a_2 = [3, 4]);

julia> stack(d, [:a_1, :a_2])
4×2 DataFrame
 Row │ variable  value 
     │ String    Int64 
─────┼─────────────────
   1 │ a_1           1
   2 │ a_1           2
   3 │ a_2           3
   4 │ a_2           4

I end up having to re-label after I stack, with something clunky like the following:

    using DataFramesMeta, Statistics  # @chain, @rsubset, ...; quantile

    t = @chain df begin
        # Keep rows without a job and with a non-missing rejected value
        @rsubset :has_job == 0
        @rsubset !ismissing(:rej_value)
        # Measure columns first, then the id column
        stack(
            [:nojob_value,
            :rej_value_same_earn_nojob,
            :rej_value_same_cond_nojob,
            :rej_value_same_commute_nojob,
            :rej_value],
            [:jobseeker_id];
            variable_name = :type,
            value_name = :val)
        dropmissing
        # Trim to the 5th-95th percentile of the stacked values
        @subset :val .> quantile(:val, .05) .&& :val .< quantile(:val, .95)
        # Keys must match the stacked column names exactly
        @aside d = Dict(
            "nojob_value" => "No job value",
            "rej_value" => "Rejected job",
            "rej_value_same_earn_nojob" => "Rejected value: earnings same",
            "rej_value_same_cond_nojob" => "Rejected value: conditions same",
            "rej_value_same_commute_nojob" => "Rejected value: commute same"
            )
        @rtransform :type = d[:type]
    end

It would be cool if the labels could be given as Pairs in the stack call itself

stack(d, [:a_1 => "A 1", :a_2 => "A 2"])

to produce

4×2 DataFrame
 Row │ variable  value 
     │ String    Int64 
─────┼─────────────────
   1 │ A 1           1
   2 │ A 1           2
   3 │ A 2           3
   4 │ A 2           4
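
For illustration, the requested behavior can be approximated today with a small wrapper. This is only a sketch: stack_labeled is a hypothetical name, not part of DataFrames.jl, and it assumes each pair maps a column name to a string label.

```julia
using DataFrames

# Hypothetical helper (not a DataFrames.jl function): stack the columns named
# on the left of each pair, then swap in the labels given on the right.
function stack_labeled(df::AbstractDataFrame, labeled, id_vars...;
                       variable_name = :variable, kwargs...)
    cols = [string(first(p)) for p in labeled]
    labels = Dict(string(first(p)) => last(p) for p in labeled)
    long = stack(df, cols, id_vars...; variable_name = variable_name, kwargs...)
    # Replace each original column name with its label
    long[!, variable_name] = [labels[string(v)] for v in long[!, variable_name]]
    return long
end

d = DataFrame(a_1 = [1, 2], a_2 = [3, 4])
long = stack_labeled(d, [:a_1 => "A 1", :a_2 => "A 2"])
```

With the example data, the variable column of long should hold "A 1", "A 1", "A 2", "A 2", matching the table above.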
@ericphanson
Contributor

I don't know DataFramesMeta, but that looks quite complicated. What about just:

vars = [:a_1 => "A 1", :a_2 => "A 2"]
stack(rename(d, vars), last.(vars))

@pdeffebach
Contributor Author

That's harder to write in interactive settings. At the REPL, you would have to know in advance which variables you are stacking before you start typing the stack command.
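
One workaround that keeps the typing order (stack first, labels afterwards) is to relabel the variable column once the stack call is done. This is a minimal sketch using Base's replace on the small example from the first comment; it is not the syntax proposed in this issue.

```julia
using DataFrames

d = DataFrame(a_1 = [1, 2], a_2 = [3, 4])

# Stack first, using the raw column names
long = stack(d, [:a_1, :a_2])

# Then swap the names for display labels in a second step
long.variable = replace(long.variable, "a_1" => "A 1", "a_2" => "A 2")
```

This still takes two steps, which is the extra typing the request is trying to avoid.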
