Allow multi-index columns in aesthetics #194

Closed
Corone opened this issue Sep 19, 2018 · 1 comment

Corone commented Sep 19, 2018

This doesn't have a parallel in R, but I think it really affects usability in Python with pandas.

While multilevel indices aren't tidy data, multilevel column names don't break any tidy data rules, yet we can't reference them. Furthermore, they get created a lot in pandas as a result of groupby aggregation, and they are not trivial to get rid of. I don't know the internals, and since there is no R behavior to copy, someone could choose whichever syntax is easiest to implement; perhaps the most obvious would be to allow tuples in the aesthetic, since that is the equivalent accessor syntax in pandas.

Example:

import numpy as np
import pandas as pd
from plotnine import ggplot, aes, geom_point

df = pd.DataFrame(dict(A=["A", "B", "C", "D"] * 25, X=np.random.random(100), Y=np.random.random(100)))
print(df)
# Aggregating with several functions produces multilevel column names such as ("X", "mean").
adf = df.groupby("A").agg(["mean", "median", "std"])
print(adf)

# It would be good to be able to plot X:mean against Y:mean now, with something like:
ggplot(adf, aes(x=("X", "mean"), y=("Y", "mean"))) + geom_point()

# The above doesn't work; instead you have to do something like this:
ggplot(adf, aes(x=adf[("X", "mean")], y=adf[("Y", "mean")])) + geom_point()
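
One way to sidestep the multilevel columns on the user side is to name the aggregated columns up front. The sketch below assumes a pandas version with named aggregation (0.25 or later); the output names `X_mean` and `Y_mean` are chosen here for illustration and are not part of the original report.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": ["A", "B", "C", "D"] * 25,
    "X": rng.random(100),
    "Y": rng.random(100),
})

# Named aggregation: each keyword becomes a flat output column,
# built from an (input column, aggregation function) pair.
flat = df.groupby("A").agg(
    X_mean=("X", "mean"),
    Y_mean=("Y", "mean"),
).reset_index()  # turn the group key "A" back into an ordinary column

print(flat.columns.tolist())
```

With flat names like these, an ordinary mapping such as `aes(x="X_mean", y="Y_mean")` should be enough, since the columns are plain strings.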

has2k1 commented Sep 19, 2018

Dealing with multilevel dataframes in plotnine would complicate the internals; there would be more to it than just treating a tuple as a multilevel column selector. If we try to collapse the levels for the user, I think there will be edge cases that lead to wrong output. I have said something about multilevel indices elsewhere.
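
To make one such edge case concrete, here is a minimal sketch of collapsing the levels by joining them with a separator. `flatten_columns` is a hypothetical helper written for this illustration, not something plotnine or pandas provides; it shows how two distinct multilevel columns can collapse to the same flat name, which is why the sketch raises an error instead of silently continuing.

```python
import pandas as pd


def flatten_columns(frame, sep="_"):
    """Hypothetical helper: join MultiIndex column levels into flat strings."""
    if not isinstance(frame.columns, pd.MultiIndex):
        return frame
    names = [sep.join(str(level) for level in col) for col in frame.columns]
    # Distinct tuples can join to the same string, so check before renaming.
    if len(set(names)) != len(names):
        raise ValueError("flattened column names collide: %r" % (names,))
    out = frame.copy()
    out.columns = names
    return out


ok = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_tuples([("X", "mean"), ("Y", "mean")]),
)
clash = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_tuples([("a_b", "c"), ("a", "b_c")]),
)

print(flatten_columns(ok).columns.tolist())
try:
    flatten_columns(clash)
except ValueError as err:
    print(err)
```

Both columns of `clash` join to `a_b_c`, so any automatic collapsing has to pick a separator or a renaming rule, and no single choice is safe for every dataframe.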

I use plydata for data manipulation, so I do not run into multilevel dataframes.
