Groups and stacked/dodged positions. #608

Open
AlFontal opened this issue Aug 5, 2022 · 1 comment

AlFontal commented Aug 5, 2022

I'm dealing with a figure where I would like to be able to dodge columns based on the group aesthetic while having the columns stay stacked on the fill aesthetic.

My intuition was that, for geometries that use the group aesthetic, color/fill is used as the grouping variable only when group is not explicitly mapped to a variable. This seems to be the case with geom_line, for instance.

However, when using position_dodge and an explicitly mapped group, the bars appear to still be dodged by the group + fill variable.

To show some examples with a toy dataset:

import numpy as np
import pandas as pd
import plotnine as p9

# One count per species per day, all from replicate 1
df = pd.DataFrame(dict(
    day=np.repeat(range(1, 5), 3),
    species=np.tile(['A', 'B', 'C'], 4),
    counts=np.random.randint(0, 200, 12),
    replicate=1
))

# A second replicate, measured on day 2 only
df_replicates = df.query('day==2').assign(replicate=2).assign(counts=np.random.randint(0, 200, 3))

full_df = pd.concat([df, df_replicates])

day  species  counts  replicate
1    A        49      1
1    B        92      1
1    C        195     1
2    A        110     1
2    B        150     1
2    C        32      1
3    A        9       1
3    B        146     1
3    C        195     1
4    A        161     1
4    B        190     1
4    C        100     1
2    A        88      2
2    B        87      2
2    C        32      2

In this case, we have two different samples/replicates in day 2.

Default Behaviour

(p9.ggplot(full_df)
 + p9.aes('day', 'counts', fill='species', group='replicate')
 + p9.geom_col()
)

image

Since the default position is stack, specifying the group as the replicate stacks the bars for each of the replicates of day 2 following the order of species-replicate: A-1, B-1, C-1, A-2, B-2, C-2. Fair enough, this is probably the expected and desired behaviour.
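The stacking order described above can be checked with a bit of arithmetic. This is only a rough pandas sketch of the bookkeeping, not plotnine's own implementation; it assumes segments are stacked in replicate-then-species order and uses the counts from the table above. The `ymin`/`ymax` column names are illustrative.

```python
import pandas as pd

# Counts copied from the table in this issue
full_df = pd.DataFrame({
    "day": [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3 + [2] * 3,
    "species": ["A", "B", "C"] * 5,
    "counts": [49, 92, 195, 110, 150, 32, 9, 146, 195,
               161, 190, 100, 88, 87, 32],
    "replicate": [1] * 12 + [2] * 3,
})

# Assumed order within a day: A-1, B-1, C-1, A-2, B-2, C-2
stacked = full_df.sort_values(["day", "replicate", "species"]).copy()

# Each segment ends at the running total within its day
stacked["ymax"] = stacked.groupby("day")["counts"].cumsum()
stacked["ymin"] = stacked["ymax"] - stacked["counts"]

day2 = stacked[stacked["day"] == 2]
print(day2[["species", "replicate", "ymin", "ymax"]])
```

With these numbers the single day 2 bar ends at 499, the sum of both replicates, which is why the default plot cannot be read as a per-sample total.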

Dodged Behaviour

The issue arises when we attempt to dodge the groups, so that the height of the bars still represents the total counts in a single sample:

position_dodge

(p9.ggplot(full_df)
 + p9.aes('day', 'counts', fill='species', group='replicate')
 + p9.geom_col(position='dodge')
)

image

The groups (replicates) have now been properly dodged/unstacked in day 2, but the previously stacked species now all start at the same x-y coordinates within each day-group instead of being stacked on top of each other. Lowering the alpha a tad helps to see it better:

image
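To put numbers on the difference, here is a small pandas sketch (again not plotnine code) comparing what is visible when the species bars are overlaid from y=0 with the stacked total that was intended. It uses the counts from the table above; the column names `visible_when_overlaid` and `wanted_stacked_total` are made up for illustration.

```python
import pandas as pd

# Counts copied from the table in this issue
full_df = pd.DataFrame({
    "day": [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3 + [2] * 3,
    "species": ["A", "B", "C"] * 5,
    "counts": [49, 92, 195, 110, 150, 32, 9, 146, 195,
               161, 190, 100, 88, 87, 32],
    "replicate": [1] * 12 + [2] * 3,
})

# Overlaid bars that all start at 0 only reach the tallest species (max);
# a stacked bar per sample would reach the sum instead.
per_sample = (
    full_df.groupby(["day", "replicate"])["counts"]
    .agg(visible_when_overlaid="max", wanted_stacked_total="sum")
    .reset_index()
)
print(per_sample)
```

For day 2, replicate 1 the overlaid bars top out at 150 while the intended stacked height is 292.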

position_dodge2

The results with position_dodge2 are also problematic, since the width of the bars doesn't seem to be properly calculated (day 1 values become half as wide, while values for day 2 stay the same and overlap with values from the other days).

In any case, this completely unstacks the species/fill aesthetic, which is something we don't want even if bars' width was properly calculated.

The hacky solution

I thought that manually shifting the x values (which works in this case, but would be a bit harder if the x variable were a Categorical, for instance) and passing a manually computed width aesthetic, with the day 2 bars set to a narrower width (.4 in the code below), should allow me to make this work:

(full_df
 # Shift replicate 2 (present only on day 2) to the right ...
 .assign(day=lambda dd: np.where(dd.replicate == 2, dd.day + .22, dd.day))
 # ... and the remaining day 2 rows (replicate 1) to the left
 .assign(day=lambda dd: np.where(dd.day == 2, dd.day - .22, dd.day))
 # Narrower bars for the two shifted day 2 samples
 .assign(width=lambda dd: np.where(dd.day.round() == 2, .4, .8))
 .pipe(lambda dd: p9.ggplot(dd)
       + p9.aes('day', 'counts', fill='species', group='replicate')
       + p9.geom_col(position='stack', mapping=p9.aes(width='width')))
)

image

I actually managed to solve this while writing the issue with this hacky solution, but still, it feels like there might be some way to include this behaviour in some position variety? I don't know if this situation is common enough to justify generating a specific position_dodgestack or something like that, but what do you think?
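The manual shift could be generalised to any number of replicates per day with a small helper. The sketch below is pandas-only and hypothetical: `manual_dodge`, `x_dodged` and `total_width` are names invented here, not part of plotnine. It leaves no gap between neighbouring bars, so day 2 lands at 1.8 and 2.2 rather than the 1.78 and 2.22 used above.

```python
import pandas as pd

def manual_dodge(df, x="day", group="replicate", total_width=0.8):
    """Add 'x_dodged' and 'width' columns that place groups side by side."""
    out = df.copy()
    # How many distinct groups share each x position
    n = out.groupby(x)[group].transform("nunique")
    # 0-based position of each row's group within its x position
    idx = out.groupby(x)[group].rank(method="dense") - 1
    # Split the available width evenly and centre the bars on x
    out["width"] = total_width / n
    out["x_dodged"] = out[x] + (idx - (n - 1) / 2) * out["width"]
    return out

# Counts copied from the table in this issue
full_df = pd.DataFrame({
    "day": [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3 + [2] * 3,
    "species": ["A", "B", "C"] * 5,
    "counts": [49, 92, 195, 110, 150, 32, 9, 146, 195,
               161, 190, 100, 88, 87, 32],
    "replicate": [1] * 12 + [2] * 3,
})

dodged = manual_dodge(full_df)
print(dodged[["day", "replicate", "x_dodged", "width"]].drop_duplicates())
```

The resulting `x_dodged` and `width` columns can then be mapped in the same way as `day` and `width` in the stacked geom_col call above.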

has2k1 (Owner) commented Aug 15, 2022

Yes, this is a dodgestack or stackdodge situation, though I do not think it would be common enough to warrant an implementation. Plus, the resulting graphic has missing information! There is no guide for the two different split bars at location 2. You need to know about the replication to understand what is shown!

A more straightforward alternative is to create a new grouping variable in the data, with the appropriate x labels.

In most cases when you find yourself fighting with the plotting grammar, then the data has not been coded well. I think that is the case here.
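One way to read this suggestion is sketched below. It is an interpretation, not code from the maintainer: the `sample` column, its "2 (rep 1)" label format, and the helper names `add_sample_label` and `plot_samples` are all assumptions made for illustration. Each day/replicate pair gets its own x label, so plain stacking is enough and the axis itself documents the replication.

```python
import pandas as pd

def add_sample_label(df):
    """Add an ordered 'sample' column: one x label per day/replicate pair."""
    out = df.sort_values(["day", "replicate"]).copy()
    n_reps = out.groupby("day")["replicate"].transform("nunique")
    day = out["day"].astype(str)
    labelled = day + " (rep " + out["replicate"].astype(str) + ")"
    # Only days with more than one replicate get the longer label
    sample = labelled.where(n_reps > 1, day)
    # Ordered categorical so the axis follows day order, not alphabetical order
    out["sample"] = pd.Categorical(sample, categories=sample.unique(), ordered=True)
    return out

def plot_samples(df):
    # Imported here so the data step above runs even without plotnine installed
    import plotnine as p9
    return (p9.ggplot(df, p9.aes("sample", "counts", fill="species"))
            + p9.geom_col())

# Counts copied from the table in this issue
full_df = pd.DataFrame({
    "day": [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3 + [2] * 3,
    "species": ["A", "B", "C"] * 5,
    "counts": [49, 92, 195, 110, 150, 32, 9, 146, 195,
               161, 190, 100, 88, 87, 32],
    "replicate": [1] * 12 + [2] * 3,
})

plot_df = add_sample_label(full_df)
print(list(plot_df["sample"].cat.categories))
```

Calling `plot_samples(plot_df)` should then give one stacked bar per sample, with the two day 2 replicates appearing as separately labelled bars.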
