The KDE transform creates values where there are none when used with `{"resolve": "shared"}` #3815

joelostblom · 2023-10-28T17:56:59Z

If `{"resolve": "shared"}` is set, the extent of a grouped density transform incorrectly uses the min/max of the entire dataset instead of the min/max of each group. This results in long lines where there are no observations at all, instead of the density stopping at the last data point in the group. I noticed this in Vega-Lite, but wonder if it could be fixed directly in the KDE transform in Vega instead of doing some post-processing such as dropping zeros in Vega-Lite. I understand that the computation needs to happen over the same domain to enable stacking, but would it be possible to trim the densities after that to only include values that exist within each group? This would also be helpful for the violin plot implementation.
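To illustrate the behaviour described above, here is a minimal pure-Python sketch. It is not Vega's implementation: the `gaussian_kde` and `linspace` helpers, the fixed bandwidth, and the two made-up groups are all assumptions chosen for illustration. It evaluates each group's density on one grid spanning the whole dataset and then counts the rows that fall outside the group's own min/max.

```python
import math


def linspace(lo, hi, steps):
    """Return `steps` evenly spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Made-up observations for two groups with clearly different ranges.
groups = {
    "a": [0.1, 0.2, 0.2, 0.3, 0.4],
    "b": [1.5, 1.8, 2.0, 2.3, 2.5],
}

# "Shared" extent in this sketch: min/max over the entire dataset.
all_values = [v for values in groups.values() for v in values]
shared_grid = linspace(min(all_values), max(all_values), 25)

# One output row per (group, grid point), similar in shape to density output.
rows = []
for name, values in groups.items():
    densities = gaussian_kde(values, shared_grid, bandwidth=0.1)
    for x, d in zip(shared_grid, densities):
        rows.append({"group": name, "value": x, "density": d})


def outside_own_range(row):
    """True if the row's value lies outside the observed range of its group."""
    values = groups[row["group"]]
    return row["value"] < min(values) or row["value"] > max(values)


extra = [r for r in rows if outside_own_range(r)]
print(len(extra), "of", len(rows), "rows lie outside their group's data range")
```

The rows in `extra` correspond to the long, near-zero stretches of the density that the issue describes.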

This chart is created in altair 5.1.2 which uses VL 5.15.1 and shows the undesired behavior:

Open the Chart in the Vega Editor

The desired behavior would look like this where each density is cut at the min/max values of each group:

Altair code

import altair as alt
from vega_datasets import data

source = data.iris.url

alt.Chart(source, height=100).transform_density(
    'petalWidth',
    groupby=['species']
).mark_area(stroke='black').encode(
    alt.X('value:Q'),
    alt.Y('density:Q').stack(False),
    alt.Facet('species:N', columns=1, title=None).header(labelFontWeight='bold', labelFontSize=12)
)

Ref vega/vega-lite#9078

joelostblom · 2023-11-10T05:34:04Z

@jheer Do you think this is something that is suitable for implementation on the Vega side of things or does it belong in the Vega-Lite repo?

A related issue stemming from this is that setting the x-scale to "independent" does not have the intended effect. Take for example this chart, where I would like the axis in each subplot to be adjusted to span only the range of the data, so that I can see both distributions clearly:

Open the Chart in the Vega Editor

mattijn · 2023-11-10T08:15:26Z

If I add `"y": "independent"` to the scale resolver in the VL spec and, in the Vega spec, remove the impute transform and set the kde transform's resolve to independent, the result seems to be what you are after:

Open the Chart in the Vega Editor

In my opinion this is something for the VL-repository.
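As a rough sketch of the suggestion above, the spec can be written as a Python dictionary. It assumes a Vega-Lite version that includes the explicit density `resolve` option from vega/vega-lite#9172, so that no manual editing of the compiled Vega spec is needed. The data path is a placeholder, and the field names are borrowed from the iris example at the top of this issue rather than from the linked chart.

```python
import json

# Hypothetical Vega-Lite spec; "data/iris.json" is a placeholder path.
spec = {
    "data": {"url": "data/iris.json"},
    "transform": [
        {
            "density": "petalWidth",
            "groupby": ["species"],
            # Assumed to be available only in Vega-Lite versions
            # that include vega/vega-lite#9172.
            "resolve": "independent",
        }
    ],
    "facet": {"row": {"field": "species", "type": "nominal"}},
    "spec": {
        "mark": "area",
        "encoding": {
            "x": {"field": "value", "type": "quantitative"},
            "y": {"field": "density", "type": "quantitative", "stack": None},
        },
    },
    # Give every facet its own x and y scale.
    "resolve": {"scale": {"x": "independent", "y": "independent"}},
}

print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the Vega Editor to check whether the installed Vega-Lite version accepts it.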

joelostblom · 2023-11-11T17:37:06Z

Thanks @mattijn, setting the transform resolve to independent would fix both the specs above, but it would lead to a jagged appearance when having two densities in the same chart, as described in vega/vega-lite#9078. So we would either need another way to fix that (maybe setting the steps + the extent?) so that we can use "resolve": "independent", or make the shared resolve more flexible so that it works with the examples in this issue. I'm happy with whichever solution is the easiest to implement and supports these use cases.
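One plausible reading of the trade-off above, sketched in plain Python: if each group is sampled over its own extent, the sample positions of different groups generally do not line up, which makes it hard to combine them at a common x. This is a simplified model with uniform steps, not Vega's actual sampling; the `linspace` helper and the data are made up for illustration.

```python
def linspace(lo, hi, steps):
    """Return `steps` evenly spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


# Made-up observations; the two groups overlap but have different ranges.
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.25, 4.0, 7.0],
}
steps = 5

# Independent: each group gets a grid over its own min/max.
independent = {
    name: linspace(min(values), max(values), steps)
    for name, values in groups.items()
}

# Shared (or an explicitly set extent): one grid for every group.
all_values = [v for values in groups.values() for v in values]
shared = linspace(min(all_values), max(all_values), steps)

# Sample positions that both groups have in common under independent resolve.
common = set(independent["a"]) & set(independent["b"])

print("independent grid a:", independent["a"])
print("independent grid b:", independent["b"])
print("shared grid:       ", shared)
print("positions common to both independent grids:", sorted(common))
```

With a shared grid every group has a row at every x position, which is what stacking relies on; the cost is the extra rows outside each group's range that this issue is about.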

joelostblom · 2023-11-11T19:58:56Z

After investigating this further, I can give a more comprehensive explanation of what is going on. Here is a single spec that contains both issues. I can't find any combination of parameters that supports each density ending at the min/max value of the data AND the two grouped/colored densities displaying properly on top of each other.

Step 1: Coloring by one variable and faceting by another. You can see how the lower facet ("Open") is extended all the way to the x-axis min around 3.5, although there are no data points there. (Also note that by default Vega-Lite now stacks areas, which is not ideal for distribution densities since it makes them harder to compare, but this is a separate issue: vega/vega-lite#9170.)

Open the Chart in the Vega Editor

Step 2: I can fix the issue with the extension to zero if I set the resolve to independent and remove the impute transform as you suggested. However, that automatically unstacks the areas (which is often a good default, but would be unexpected to someone who explicitly specified a stacked density):

Open the Chart in the Vega Editor

In other words, I can't find a combination of parameters that allows me to create this chart (correctly stacked on top, and correct extent on the bottom):

domoritz · 2023-12-18T18:18:06Z

It looks like the first case can be resolved with explicitly setting the kde resolve which we add in vega/vega-lite#9172

#3815 (comment) is a bit trickier but could be addressed with a clip property that removes density values outside the original data domain per group. This could be a useful feature anyway (for both shared and independent density computation).
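A minimal sketch of what such clipping could do, written as plain Python post-processing. The function name `clip_density`, its parameters, and the data are hypothetical; this thread does not define an actual `clip` option, so this only illustrates the idea of dropping density rows outside each group's observed domain.

```python
def clip_density(density_rows, raw_rows, group_key="group", value_key="value"):
    """Drop density rows whose value lies outside the observed range of their group.

    Hypothetical helper: the bounds come from the raw observations,
    not from the grid the density was evaluated on.
    """
    bounds = {}
    for row in raw_rows:
        g, v = row[group_key], row[value_key]
        lo, hi = bounds.get(g, (v, v))
        bounds[g] = (min(lo, v), max(hi, v))
    return [
        row
        for row in density_rows
        if bounds[row[group_key]][0] <= row[value_key] <= bounds[row[group_key]][1]
    ]


# Made-up raw observations: group "a" spans 1..2, group "b" spans 4..6.
raw = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 2.0},
    {"group": "b", "value": 4.0},
    {"group": "b", "value": 6.0},
]

# Made-up density output on a shared grid 1..6 (density values are placeholders).
density_rows = [
    {"group": g, "value": float(x), "density": 0.1}
    for g in ("a", "b")
    for x in range(1, 7)
]

clipped = clip_density(density_rows, raw)
for row in clipped:
    print(row)
```

Because the grid itself is untouched, the rows that remain still sit at shared x positions, so clipping in this form would work for both shared and independent density computation.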

domoritz · 2024-03-15T15:43:38Z

I'm working on this now

joelostblom added the bug For bugs or other software errors label Oct 28, 2023

joelostblom mentioned this issue Nov 10, 2023

Grouped densities are stacked by default instead of sharing the same baseline vega/vega-lite#9170

Open

joelostblom mentioned this issue Nov 11, 2023

feat: add explicit option to control how densities are resolved, change how densities are resolved by default vega/vega-lite#9172

Merged

domoritz added enhancement For enhancement of existing features and removed bug For bugs or other software errors labels Dec 18, 2023

domoritz self-assigned this Dec 18, 2023
