Colouring the groups is not consistent if data for grouping level is missing #445

Open
GregorDall opened this issue Jul 13, 2022 · 7 comments

Comments

@GregorDall

I am having an issue with ggpairs: if the values for one grouping level are missing, the colours are not consistent. Is there a way to circumvent this?
Here is an example:

library(ggplot2)
library(GGally)
data(iris)
# Blank out Petal.Length for one whole group
iris[iris$Species == "setosa", "Petal.Length"] <- NA
ggpairs(iris, mapping = aes(color = Species))
@GregorDall
Author

[image]

@schloerke
Member

It's been years since we've run across a fundamental issue like this one. 😱 Thank you for starting the conversation!


Ok. So let's back up and see what's happening with ggplot2...

A much-simplified version of the code behind one of the panels in question looks like this:

ggplot(iris, aes(Petal.Length, fill = Species)) + geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning message:
#> Removed 50 rows containing non-finite values (stat_bin).

Screen Shot 2022-07-13 at 9 04 43 PM

Rightfully so, there should be three colors in the legend, even if there is no data being plotted... as the levels of the data say that a color choice should exist.

But we also get the warning "Removed 50 rows containing non-finite values (stat_bin)." This is what is throwing a wrench into the system.


Maybe related: tidyverse/ggplot2#4567


Do you know how we could write standard ggplot2 code that produces the correct legend? If so, then we can adjust the ggally_*() methods.

@GregorDall
Author

I found a way of manually setting group colors but it is a bit messy:
https://stackoverflow.com/questions/65621919/r-assigning-colors-manually-in-ggplot

mycolors <- setNames(c("#999999", "#E69F00", "#56B4E9"), c("setosa", "versicolor", "virginica"))

ggplot(iris, aes(Petal.Length, fill = Species)) + geom_histogram() + scale_fill_manual(values = mycolors)
[image]

But this does not fully transfer to ggpairs, and it does not get rid of the warning message. The text in the upper-triangle panels is no longer coloured by group.

ggpairs(iris, legend = 5, mapping = aes(color = Species)) + scale_fill_manual(values=mycolors) + scale_color_manual(values=mycolors)
[image]

@GregorDall
Author

In ggplot2, adding scale_fill_discrete(drop = FALSE) seems to do the trick. See: tidyverse/ggplot2#4908
This carries over only partly to GGally::ggpairs; the correlation panels are unaffected. Any ideas?

ggplot(iris, aes(Petal.Length, fill = Species)) + geom_histogram() + scale_fill_discrete(drop = FALSE)
[image]
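
One way to check what drop = FALSE changes, without reading colours off a picture, is to compare the fill values that ggplot2 assigns in the built plot. This is only a sketch: the names iris_na, p_default, p_keep, fills_default and fills_keep, and the fixed bins = 30, are illustrative choices that do not come from this thread.

```r
library(ggplot2)

# Same setup as the original report: every setosa Petal.Length is NA.
iris_na <- iris
iris_na[iris_na$Species == "setosa", "Petal.Length"] <- NA

# Default discrete fill scale (unused levels may be dropped).
p_default <- ggplot(iris_na, aes(Petal.Length, fill = Species)) +
  geom_histogram(bins = 30)

# Same plot, but asking the scale to keep unused factor levels.
p_keep <- p_default + scale_fill_discrete(drop = FALSE)

# layer_data() returns the data of the built layer, including the
# colour codes that the fill scale assigned to each bar.
fills_default <- unique(layer_data(p_default)$fill)
fills_keep <- unique(layer_data(p_keep)$fill)

print(fills_default)
print(fills_keep)
```

If the two vectors differ, the default scale has re-assigned the palette to the levels that still have data, while drop = FALSE has kept a slot for setosa.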

ggpairs(iris, mapping = aes(color = Species)) +
  scale_fill_discrete(drop = FALSE)

[image]

@schloerke
Member

I'll have to work on the correlation plots. There are a lot of data transformations beforehand that make leveraging the original data harder, but not impossible!


TODO:

  • Add scale_fill_discrete(drop = FALSE) to all fill ggally_*() methods
  • Update ggally_statistic() to support mapping color to the final data
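
Until something like the first TODO item lands, a user-side approximation might be to wrap an existing ggally_*() function and add the scales to the panel it returns. The sketch below is not the maintainers' implementation: diag_keep_levels is a hypothetical helper name, it only touches the continuous diagonal panels, and whether it gives consistent colours in every panel has not been verified here.

```r
library(ggplot2)
library(GGally)

# Same setup as the original report: every setosa Petal.Length is NA.
data(iris)
iris[iris$Species == "setosa", "Petal.Length"] <- NA

# Hypothetical helper (not part of GGally): build the usual diagonal
# density panel, then ask both discrete scales to keep unused levels.
diag_keep_levels <- function(data, mapping, ...) {
  ggally_densityDiag(data, mapping, ...) +
    scale_colour_discrete(drop = FALSE) +
    scale_fill_discrete(drop = FALSE)
}

# Use the wrapper for the continuous diagonal panels only.
p <- ggpairs(
  iris,
  mapping = aes(color = Species),
  diag = list(continuous = diag_keep_levels)
)
# print(p) draws the plot matrix.
```

The same wrapping pattern should apply to other panel functions, but the correlation panels would still need the ggally_statistic() change listed above.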

@GregorDall
Author

Thank you for your efforts!

@twest820

Something to note is that the group correlations remain colored here with scale_color_discrete() but turn grey with scale_color_manual(). I just encountered this in a case where drop = FALSE is not needed, and I would suggest it be included in @schloerke's TODO list if it doesn't fall under the ggally_statistic() item.

Using

upper = list(continuous = wrap("cor", color = "black"))

with scale_color_manual() provides a partial workaround in that you can pick one other color besides grey but, since upper currently requires aesthetics be of length 1, it's not possible to restore the color flow by groups via upper.

Another workaround, which seems likely to be fragile, is to hijack the color map so discrete color works like manual color,

scale_color_discrete(type = c(<list of color names>...))

in which case colors do flow to group correlation text, provided no color is specified in upper. If it doesn't break, this approach is attractive if one's ok with the default grey being used for ungrouped correlations but it does highlight 1) the apparent lack of a mechanism for users to access color levels created by ggpairs() and 2) curious splits in handling of discrete versus manual color.
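
For concreteness, the second workaround could look like the sketch below. The three color codes are borrowed from the mycolors vector earlier in this thread, species_colors is an illustrative name, and the assumption is that an unnamed vector is assigned to the Species levels in factor-level order.

```r
library(ggplot2)
library(GGally)

# Illustrative palette: one color per Species level, assumed to be
# assigned in factor-level order (setosa, versicolor, virginica).
species_colors <- c("#999999", "#E69F00", "#56B4E9")

# Pass the colors through the discrete scales' `type` argument instead
# of scale_*_manual(), and leave `color` unset in `upper`.
p <- ggpairs(iris, mapping = aes(color = Species)) +
  scale_color_discrete(type = species_colors) +
  scale_fill_discrete(type = species_colors)
# print(p) draws the plot matrix.
```

Because this leans on how the discrete scales interpret type, it may behave differently across ggplot2 versions, which is part of why it seems fragile.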


3 participants