geom_bar()/geom_col() erroneously warn that they ignore width aesthetic #3142

Open
richierocks opened this issue Feb 13, 2019 · 15 comments · May be fixed by #5807

Comments

@richierocks
Contributor

geom_bar() and geom_col() let you specify a width aesthetic to control the width of the bars.

The behavior is as expected, but it generates an erroneous warning "Warning: Ignoring unknown aesthetics: width".

width isn't listed in the aesthetics section of ?geom_bar, so it appears that this is an unofficial behavior.


library(ggplot2)
suppressPackageStartupMessages(library(dplyr))
mtcars_by_cyl <- mtcars %>% 
  group_by(cyl) %>% 
  summarize(
    mean_wt = mean(wt),
    n = n()
  ) %>% 
  mutate(prop = n / sum(n))

ggplot(mtcars_by_cyl) + 
  geom_col(aes(cyl, mean_wt, width = prop))
#> Warning: Ignoring unknown aesthetics: width

Created on 2019-02-13 by the reprex package (v0.2.0).


Similar closed issues.

@richierocks changed the title from "geom_bar()/geom_col() erroneuously warn that they ignore width aesthetic" to "geom_bar()/geom_col() erroneously warn that they ignore width aesthetic" on Feb 13, 2019
@yutannihilation
Member

yutannihilation commented Feb 14, 2019

Currently, width is recognized as a parameter by a "hack". Here's the comment, written 4 years ago. Maybe it's worth trying to make width a proper aes?

# Hack to ensure that width is detected as a parameter

@ptoche

ptoche commented Feb 17, 2019

Elsewhere (e.g. boxplot), width is added to the list of extra_params:

# need to declare `width`` here in case this geom is used with a stat that

@clauswilke
Member

width works just fine as a parameter in the way the code is currently written, and the "hack" is fine also. The question is whether width should be an aesthetic. I'm skeptical, because bars with varying widths are not normally meaningful. It's not that different a case from bars that start from a base value other than zero, which we also don't support. If people really want to do something like this, they can use geom_rect() or geom_tile() instead.

@yutannihilation
Member

> The question is whether width should be an aesthetic

Isn't width already an aes? At least, the plot above seems to have varying widths of bars.

@yutannihilation
Member

Sorry, I was confused. I now think the varying bar widths in geom_col() are just a mistake. It uses data$width, but the width should really be "ignored", as the warning says.

ggplot2/R/geom-col.r

Lines 40 to 46 in 43dcd63

setup_data = function(data, params) {
  data$width <- data$width %||%
    params$width %||% (resolution(data$x, FALSE) * 0.9)
  transform(data,
    ymin = pmin(y, 0), ymax = pmax(y, 0),
    xmin = x - width / 2, xmax = x + width / 2, width = NULL
  )

In geom_bar()'s case, stat_count() provides the width, so it should be used. But geom_col() uses stat_identity(), from which we should not expect a width.
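To make the precedence in the quoted setup_data() concrete, here is a minimal base-R sketch of the same logic. It is an illustration, not the ggplot2 source: bar_extents() and smallest_gap() are hypothetical names, and smallest_gap() is only a rough stand-in for ggplot2's resolution(x, FALSE).

```r
# Null-default operator, defined locally so the sketch has no dependencies
`%||%` <- function(a, b) if (is.null(a)) b else a

# Hypothetical simplification of ggplot2::resolution(x, FALSE):
# the smallest gap between distinct x values (1 if there is only one value)
smallest_gap <- function(x) {
  ux <- sort(unique(x))
  if (length(ux) < 2) return(1)
  min(diff(ux))
}

# Same lookup order as the quoted setup_data(): a per-row data$width first,
# then the layer parameter, then 90% of the resolution of x
bar_extents <- function(data, params = list()) {
  data$width <- data$width %||% params$width %||% (smallest_gap(data$x) * 0.9)
  transform(data,
    ymin = pmin(y, 0), ymax = pmax(y, 0),
    xmin = x - width / 2, xmax = x + width / 2, width = NULL
  )
}

d <- data.frame(x = c(4, 6, 8), y = c(2.3, 3.1, 4.0))
bar_extents(d)                                 # no width anywhere: default is used
bar_extents(d, params = list(width = 1))       # width given as a layer parameter
bar_extents(cbind(d, width = c(0.5, 1, 1.5)))  # per-row width takes precedence
```

Because data$width comes first in the chain, any width column that reaches the geom, whether from a stat or from a mapped aesthetic, overrides the width parameter. That is why the bars in the original reprex vary in width despite the warning.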

@yutannihilation
Member

But, in terms of the interface (I don't mean the current behaviour is semantically correct), width is provided by a Stat via data. So, it is virtually an aes.

I'm wondering why width is not passed via param...

@yutannihilation
Member

Oh, this last example reminds me of the need for varying width.

# You can specify a function for calculating binwidth,
# particularly useful when faceting along variables with
# different ranges

https://ggplot2.tidyverse.org/reference/geom_histogram.html

@yutannihilation
Member

Here's my understanding. Is this correct?

  • We want to enforce a constant bar width within a panel, so width cannot be an aes.
  • Yet, the width can vary among panels, so we need to pass widths per bar via data, not a single value via param.
  • data$width should be used only when the Stat provides it. But, geom_col() is not the case, it should ignore data$width.

@ptoche

ptoche commented Feb 18, 2019

Just a comment in passing: if width were to be passed as an aes to capture the relative amounts of some variable in the dataset, the bar chart would become a sort of rectangular pie chart, where the area, not the length, becomes the relevant metric (I don't think that's "meaningless", but it would suffer from most of the problems that pie charts have). As far as I can tell, Hadley (among others) is not fond of pie charts.

For the standard bar chart with "meaningless" width, I would argue that the current default width of geom_bar is too wide: narrower bars would help the eye focus on the important metric, height. Excel and LibreOffice/Calc seem to go for a default of 100%, i.e. the space between bars = width of the bars. geom_bar is wider than that. Does anyone else think it ought to be narrower?

library("reprex")

library("ggplot2")
ggplot(mtcars, aes(x = gear)) + geom_bar()

ggplot(mtcars, aes(x = gear)) + geom_bar(width = 0.5)

ggplot(mtcars, aes(x = gear)) + geom_bar(width = 0.25)

Created on 2019-02-18 by the reprex package (v0.2.1)

@mattansb

Just to comment here that allowing width as an aesthetic can be used to have different "sized" pies in pie charts, which is quite useful (I mean, as useful as a pie chart can be...):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

d <- mtcars %>% 
  group_by(am) %>% 
  count(cyl) %>% 
  mutate(total = sum(n),
         norm_n = n / total)

(p <- ggplot(d, aes(0, norm_n, fill = factor(cyl))) + 
    facet_grid(cols = vars(am)) + 
    geom_col(aes(width = total), position = position_stack()))
#> Warning: Ignoring unknown aesthetics: width

p + aes(x = total/2) + coord_polar("y")

Created on 2021-08-12 by the reprex package (v2.0.0)

@oloverm

oloverm commented Nov 30, 2023

Variable width is useful for bar charts by month, to prevent the bars from overlapping. Especially if you want no gaps between the bars, but also because you'll get inconsistent gaps otherwise.

You can hack it by making the dates a factor instead, but then you need to do much more work to get a nice date axis.

library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
library(ggplot2, warn.conflicts = FALSE)

set.seed(1)

df <- tibble(
  date = seq.Date(ymd("2020-01-01"), ymd("2020-12-01"), by = "1 month"),
  quantity = sample(20:100, 12),
  ndays = days_in_month(date)  # width for different months
) |> 
  mutate(date = date + ndays / 2)  # reposition to fix the overlaps

df
#> # A tibble: 12 × 3
#>    date       quantity ndays
#>    <date>        <int> <int>
#>  1 2020-01-16       87    31
#>  2 2020-02-15       58    29
#>  3 2020-03-16       20    31
#>  4 2020-04-16       53    30
#>  5 2020-05-16       62    31
#>  6 2020-06-16       33    30
#>  7 2020-07-16       78    31
#>  8 2020-08-16       70    31
#>  9 2020-09-16       40    30
#> 10 2020-10-16       73    31
#> 11 2020-11-16       26    30
#> 12 2020-12-16       56    31

df |> 
  ggplot() +
  geom_col(aes(date, quantity, width = ndays), alpha = 0.7)
#> Warning in geom_col(aes(date, quantity, width = ndays), alpha = 0.7): Ignoring
#> unknown aesthetics: width

Created on 2023-11-30 with reprex v2.0.2

@teunbrand
Collaborator

teunbrand commented Mar 26, 2024

I understand that we do not want to encourage bar charts with variable widths; however, I do think enforcing this is causing us more pain than gain. I'd like to challenge some points in favour of not recognising width as an aesthetic.

> width works just fine as a parameter in the way the code is currently written,

Not really. It throws warnings about being ignored, while it is being used.

> the "hack" is fine also

While the hack works to recognise the parameter, we wouldn't need the hack at all if it were a proper aesthetic.

> If people really want to do something like this, they can use geom_rect() or geom_tile() instead.

  • geom_tile() is not a good alternative, for two reasons. The height aesthetic is not a position aesthetic, so it does not respond to scale transformations. Scale-transformed bar charts are probably a bad idea anyway, but I don't think we should prohibit it. Secondly, you have to use y = after_stat(count / 2) when pairing a bar chart with a stat, which is clunky.
  • geom_rect() is not a good alternative, also for two reasons. You have to specify ymin = 0, which is clunky. More importantly, when using a discrete x variable, the xmin and xmax are a pain to compute, because you'd have to manually convert the discrete variable into a continuous one.
  • If you want to solve most of these issues, you'd want a geom that has x/width parametrisation for the horizontal direction, but ymin/ymax parametrisation for the vertical direction. This geom does not exist.
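For comparison, here is roughly what the geom_rect() workaround looks like for the data in the original report. This is a sketch, assuming a continuous x (cyl is numeric here); by_cyl is a hypothetical name, and the summary is rebuilt with base R to avoid the dplyr dependency.

```r
library(ggplot2)

# Mean weight per cylinder count, plus each group's share of all cars
by_cyl <- aggregate(wt ~ cyl, data = mtcars, FUN = mean)
by_cyl$prop <- as.vector(table(mtcars$cyl)[as.character(by_cyl$cyl)]) / nrow(mtcars)

# geom_rect() needs all four edges spelled out, including ymin = 0
p_rect <- ggplot(by_cyl) +
  geom_rect(aes(
    xmin = cyl - prop / 2, xmax = cyl + prop / 2,
    ymin = 0, ymax = wt
  ))
p_rect
```

With a discrete x there is no cyl - prop / 2 to compute, so the positions would first have to be converted to numbers by hand, which is the pain point described in the geom_rect() bullet.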

> data$width should be used only when the Stat provides it. But, geom_col() is not the case, it should ignore data$width.

Ideally, the geom shouldn't care whence the width data came. Baking in prohibitions for specific geom/stat pairings hurts the flexibility of the API and should, in my opinion, only ever be used to enhance displays, not prohibit them.

I'd also like to re-iterate some points in favour of width as aesthetic.

  • We already allow bars with varying width directly from the aesthetics. Sure, we throw a warning in protest, but then promptly display the bars as people intended anyway. We can even circumvent this warning by using ggplot(..., mapping = aes(..., width = var)) as it'll end up in the layer data even for layers that don't have width as an aesthetic or parameter.
  • There are valid use-cases from a user perspective, as pointed out elsewhere in this issue.
  • There are valid use-cases from a developer perspective, such as when width comes from a position adjustment, stat computation, or needs to vary between panels.
  • Maintaining width as a proper aesthetic is easier than relying on the current hack.
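The plot-level mapping mentioned in the first bullet can be sketched like this. by_cyl is a hypothetical name for a base-R rebuild of the summary from the original report. Whether a warning appears may depend on the ggplot2 version, so treat this as an illustration of the point above rather than documented behaviour.

```r
library(ggplot2)

# Mean weight per cylinder count, plus each group's share of all cars
by_cyl <- aggregate(wt ~ cyl, data = mtcars, FUN = mean)
by_cyl$prop <- as.vector(table(mtcars$cyl)[as.character(by_cyl$cyl)]) / nrow(mtcars)

# width is mapped in ggplot() rather than in geom_col(), so it is carried
# into the layer data without being passed to the layer directly
p <- ggplot(by_cyl, aes(cyl, wt, width = prop)) +
  geom_col()

# Inspect the computed bar edges: xmax - xmin should follow prop
ld <- layer_data(p)
ld$xmax - ld$xmin
```

The bar edges in the layer data follow prop, which shows that the geom uses the mapped width regardless of where it was declared.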

In summary, the main argument against width as an aesthetic is that it might possibly encourage some bad visualisation. However, we can't stop people from doing this anyway and having ggplot2 jump through hoops to discourage this is causing discomfort in the shape of hacks and spurious warnings. Therefore, I argue we should just let width be an aesthetic.

@teunbrand teunbrand linked a pull request Mar 26, 2024 that will close this issue
@clauswilke
Member

@teunbrand Let me go back on my argument from six years ago. While I still think one has to be careful with variable widths in a plot, these days I also believe plotting software should be maximally flexible and not impose specific design philosophies on its users. So unless there's a good technical reason not to have width as an aesthetic, I don't see how we lose in any way by making it one.

@teunbrand
Collaborator

Thanks Claus, it seems we are in alignment then over this. I didn't mean to single out your arguments (and I'm sorry if it appeared that way). I just felt that this issue was stuck in a weird place of being acknowledged and having proposed solutions, but being dormant for a while. My arguing hopefully would get folks on board with the 'width as aesthetic' approach, so we can move forward on this issue.

@clauswilke
Member

No worries, I didn't feel singled out. In fact, I was surprised by my own comment from 2019 as today I don't think I would write it. (I came here thinking: let me argue in favor of width as an aesthetic and let's see who the idiot was that argued against it. Well, it was me apparently. 🤣)
