
How to plot random(group-based scores) effects? #637

Open
fatihbozdag opened this issue Feb 4, 2023 · 10 comments

Comments

@fatihbozdag

Greetings all,

I've been looking for a way to plot the posterior predictive probabilities of a categorical response for each level of the random effects. I asked the question in the ArviZ GitHub (arviz-devs/arviz#2188); however, I haven't found a solution yet.

Basically, I'm following the code below:

import arviz as az
import bambi as bmb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import pymc as pm
import pymc.sampling_jax
import numpyro

# target_variable, random_effects, column1, column2, and priors are defined elsewhere
df = pd.DataFrame({"Target": target_variable, "Random_Effects_Var": random_effects, "Column1": column1, "Column2": column2})

model = bmb.Model(
    "Target ~ Column1 + Column2 + (1|Random_Effects_Var)",
    data=df, family="categorical", auto_scale=True, priors=priors,
)

model.build()

with model.backend.model:
    idata = pm.sampling_jax.sample_numpyro_nuts(
        draws=250, tune=50, target_accept=0.99,
        postprocessing_backend="cpu",
        idata_kwargs={"log_likelihood": True},
        chain_method="parallel",
    )
    posterior_predictive = pm.sample_posterior_predictive(trace=idata, extend_inferencedata=True)

idata

I want to plot the probabilities of each level of the categorical response variable (three levels) per level of the random effect (two levels). I did this before with rstanarm and easystats in R for a binomial distribution, yet I got stuck here in Python.

This is from my own study.
image_2023-02-04_040254680

@fatihbozdag
Author

fatihbozdag commented Feb 4, 2023

I found a similar plot here
image_2023-02-04_041004841

from matplotlib.lines import Line2D
from matplotlib.patches import Patch

RED = "#d62728"  # placeholder; the notebook defines its own RED color constant

_, ax = plt.subplots(figsize=(8, 8))

# Add vertical line for the global probability of hitting
ax.axvline(x=(df["H"] / df["AB"]).mean(), ls="--", color="black", alpha=0.5)

# Create forestplot with ArviZ, only for the mean.
az.plot_forest(
    [idata_non_hierarchical, idata_hierarchical],
    var_names="p(H, AB)_mean",
    combined=True,
    colors=["#666666", RED],
    linewidth=2.6,
    markersize=8,
    ax=ax
)

# Create custom y axis tick labels
ylabels = [f"H: {round(h)}, AB: {round(ab)}" for h, ab in zip(df["H"].values, df["AB"].values)]
ylabels = list(reversed(ylabels))

# Put the labels for the y axis in the middle of the original location of the tick marks.
ax.set_yticklabels(ylabels, ha="right")

# Create legend
handles = [
    Patch(label="Non-hierarchical", facecolor="#666666"),
    Patch(label="Hierarchical", facecolor=RED),
    Line2D([0], [0], label="Mean probability", ls="--", color="black", alpha=0.5)
]

legend = ax.legend(handles=handles, loc="lower right", fontsize=14, frameon=True, framealpha=0.8)

(https://bambinos.github.io/bambi/notebooks/hierarchical_binomial_bambi.html), comparing two models on the same variable. This is exactly what I want to do, but with a categorical response. However, the problem is that it's almost impossible to apply the same method to data with more than 14,000 observations!

@tomicapretto
Collaborator

@fatihbozdag thanks for opening the issue!

Could you share a reproducible example (with some data) so I can help you better? The data doesn't need to be real, of course; it can be either simulated or anonymized.

@fatihbozdag
Author

fatihbozdag commented Feb 4, 2023

Sure thing. Indeed, I've uploaded the pickled inference data here: https://figshare.com/articles/software/results_pickle/21789347
But here is toy data with sample code.

x = np.array(["Pred1"] * 40 + ["Pred2"] * 40 + ["Pred3"] * 70)
y = np.array(["A"] * 50 + ["B"] * 50 + ["C"] * 50)
r = np.array(["R1"] * 90 + ["R2"] * 60)
data = pd.DataFrame({"y": y, "x": x, "r": r})
model = bmb.Model("y ~ 0 + x + (1|r)", data, family="categorical", auto_scale=True)

model.build()

model
Formula: y~ 0 + x + (1|r)
Family name: Categorical
Link: softmax
Observations: 150
Priors:
  Common-level effects
    x ~ Normal(mu: [0. 0. 0.], sigma: [5.6533 5.6533 5.0111])

  Group-level effects
    1|r ~ Normal(mu: 0, sigma: HalfNormal(sigma: 4.0329))

with model.backend.model:
    idata = pm.sampling_jax.sample_numpyro_nuts(
        draws=250, tune=50, target_accept=0.99,
        postprocessing_backend="cpu",
        idata_kwargs={"log_likelihood": True},
        chain_method="parallel",
    )
    posterior_predictive = pm.sample_posterior_predictive(trace=idata, extend_inferencedata=True)

idata

Inference data with groups:
	> posterior
	> posterior_predictive
	> log_likelihood
	> sample_stats
	> observed_data

idata.posterior

<xarray.Dataset>
Dimensions:        (chain: 4, draw: 250, x_dim: 3, y_dim: 2, r__factor_dim: 2)
Coordinates:
  * chain          (chain) int64 0 1 2 3
  * draw           (draw) int64 0 1 2 3 4 5 6 7 ... 243 244 245 246 247 248 249
  * x_dim          (x_dim) <U5 'Pred1' 'Pred2' 'Pred3'
  * y_dim          (y_dim) <U1 'B' 'C'
  * r__factor_dim  (r__factor_dim) <U2 'R1' 'R2'
Data variables:
    x              (chain, draw, x_dim, y_dim) float64 -7.286 -2.195 ... 7.844
    1|r_offset     (chain, draw, r__factor_dim, y_dim) float64 -0.1529 ... -0...
    1|r_sigma      (chain, draw) float64 10.41 4.648 5.216 ... 5.3 2.921 8.414
    1|r            (chain, draw, r__factor_dim, y_dim) float64 -1.591 ... -1.161
Attributes:
    created_at:     2023-02-04T12:42:15.801626
    arviz_version:  0.14.0

model.predict(idata)  # apply Bambi's predict to obtain the mean
<xarray.Dataset>
Dimensions:        (chain: 4, draw: 250, x_dim: 3, y_dim: 2, r__factor_dim: 2,
                    y_mean_dim: 3, y_obs: 150)
Coordinates:
  * chain          (chain) int64 0 1 2 3
  * draw           (draw) int64 0 1 2 3 4 5 6 7 ... 243 244 245 246 247 248 249
  * x_dim          (x_dim) <U5 'Pred1' 'Pred2' 'Pred3'
  * y_dim          (y_dim) <U1 'B' 'C'
  * r__factor_dim  (r__factor_dim) <U2 'R1' 'R2'
  * y_mean_dim     (y_mean_dim) <U1 'A' 'B' 'C'
  * y_obs          (y_obs) int64 0 1 2 3 4 5 6 7 ... 143 144 145 146 147 148 149
Data variables:
    x              (chain, draw, x_dim, y_dim) float64 -7.286 -2.195 ... 7.844
    1|r_offset     (chain, draw, r__factor_dim, y_dim) float64 -0.1529 ... -0...
    1|r_sigma      (chain, draw) float64 10.41 4.648 5.216 ... 5.3 2.921 8.414
    1|r            (chain, draw, r__factor_dim, y_dim) float64 -1.591 ... -1.161
    y_mean         (chain, draw, y_obs, y_mean_dim) float64 0.9968 ... 0.7472
Attributes:
    created_at:     2023-02-04T12:42:15.801626
    arviz_version:  0.14.0

@tomicapretto
Collaborator

I'm not sure I understand exactly what you want to do.

Are you trying to reproduce this figure?
https://user-images.githubusercontent.com/59308858/216736802-9b1f1f97-80f9-4696-976d-427f8bad4c54.png
Although that one has two categories and your response contains three.

"plot ppc (probabilities) of categorical response per each variable included in the random effects"

You could create a plot where the y-axis shows the probability of the outcome, the horizontal axis shows the levels of a grouping variable used in the "random effects" part, and multiple estimates with credible intervals are displayed (one per response level). Is that what you want? Just take into account that you'll need to pick a value for any other predictors your model may have.
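If it helps, here's a minimal NumPy/matplotlib sketch of that kind of plot. All names and the fake y_mean array are hypothetical; in practice you'd pull y_mean from idata.posterior after model.predict(idata) and use your real group labels.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical posterior probabilities with shape (chain, draw, y_obs, response level),
# i.e. the shape of the "y_mean" variable that model.predict(idata) adds
rng = np.random.default_rng(0)
y_mean = rng.dirichlet(np.ones(3), size=(4, 250, 150))

groups = np.array(["R1"] * 90 + ["R2"] * 60)  # group of each observation, as in the toy data
levels = ["A", "B", "C"]

summary = {}
for g in ["R1", "R2"]:
    probs = y_mean[:, :, groups == g, :].mean(axis=2)  # average over the group's observations
    flat = probs.reshape(-1, len(levels))              # pool chains and draws
    mean = flat.mean(axis=0)
    lo, hi = np.quantile(flat, [0.03, 0.97], axis=0)   # 94% credible interval
    summary[g] = (mean, lo, hi)

# Probability of each response level, with credible intervals, one marker per group
fig, ax = plt.subplots(figsize=(7, 5))
x = np.arange(len(levels))
for shift, g in zip([-0.1, 0.1], ["R1", "R2"]):
    mean, lo, hi = summary[g]
    ax.errorbar(x + shift, mean, yerr=[mean - lo, hi - mean], fmt="o", capsize=4, label=g)
ax.set_xticks(x)
ax.set_xticklabels(levels)
ax.set_ylabel("Probability")
ax.legend()
```

Because the probabilities are averaged over the observations within each group, this implicitly marginalizes over the other predictors as they appear in the data; fixing them at a reference value would need a new data frame passed to predict.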

@fatihbozdag
Author

Yep, that's exactly what I want to do: plotting the probabilities of A, B, and C given R1 and R2.

P.S. Sorry, I cannot focus on the topic and write more details, as a massive earthquake struck my hometown, Adana, Turkey. Can you please give me a few days? Currently I'm working as a freelance translator to guide rescue teams.

@fatihbozdag
Author

fatihbozdag commented Feb 9, 2023

I'm not sure I understand exactly what you want to do.

Are you trying to reproduce this figure? https://user-images.githubusercontent.com/59308858/216736802-9b1f1f97-80f9-4696-976d-427f8bad4c54.png Although that one has two categories and your response contains three.

"plot ppc (probabilities) of categorical response per each variable included in the random effects"

You could create a plot where the y-axis shows the probability of the outcome, the horizontal axis shows the levels of a grouping variable used in the "random effects" part, and multiple estimates with credible intervals are displayed (one per response level). Is that what you want? Just take into account that you'll need to pick a value for any other predictors your model may have.

Hi again. I am back to ordinary life (sort of). How can I produce such a plot? Yep, this is a categorical regression with a three-level response.

@fatihbozdag
Author

Maybe I should add more info. Indeed, the problem is that I am unable to include the response reference level in the plot. In other words, ArviZ's az.plot_forest works, but without the reference level.

Figure 2023-02-04 012447

@fatihbozdag
Author

fatihbozdag commented Feb 9, 2023

Update on another unsuccessful attempt:

Following model.predict(idata), values including the reference level are added to the inference data under the y_mean_dim dimension. However, ArviZ's coords does not accept the new dimension as a valid one.

coords = {"y_dim": ["B", "C"]}  # works, but without the reference level (assuming "A" is the reference)

az.summary(idata, var_names="x", coords=coords)

coords = {"y_mean_dim": ["B", "C"]}  # does not work, even though idata.posterior.y_mean_dim actually includes the reference level

az.summary(idata, var_names="x", coords=coords) says

KeyError: 'Coords should follow mapping format {coord_name:[dim1, dim2]}. Check that coords structure is correct and dimensions are valid. "\'y_mean_dim\' is not a valid dimension or coordinate"'
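For what it's worth, the error presumably occurs because y_mean_dim is a dimension of the y_mean variable, not of x, so the coords filter has to be paired with var_names that actually use that dimension. A minimal sketch with a synthetic InferenceData (shapes and names chosen to mirror the toy example, not taken from the real model):

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(1)
# (chain, draw, y_obs, y_mean_dim): per-observation probabilities of the three response levels
y_mean = rng.dirichlet(np.ones(3), size=(4, 250, 10))

idata = az.from_dict(
    posterior={"y_mean": y_mean},
    coords={"y_mean_dim": ["A", "B", "C"], "y_obs": np.arange(10)},
    dims={"y_mean": ["y_obs", "y_mean_dim"]},
)

# "y_mean_dim" belongs to "y_mean", not to "x", so select that variable explicitly
summ = az.summary(idata, var_names=["y_mean"], coords={"y_mean_dim": ["B", "C"]})
```

With var_names=["y_mean"], coords={"y_mean_dim": [...]} is a valid selection; and since y_mean_dim also contains "A", the reference level can be summarized the same way.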

@tomicapretto
Collaborator

@fatihbozdag I'm sorry you were affected by the earthquake. I hope you're doing better now!

What I don't understand from this visualization https://user-images.githubusercontent.com/59308858/216736802-9b1f1f97-80f9-4696-976d-427f8bad4c54.png is what the line represents. I guess the line is a probability going from 0 to 1: the closer it is to zero, the closer the outcome is to DO, and the closer it is to one, the closer it is to PD. Is that correct?

But in this case, you have three response levels, so you can't get a plot like that one.

Also, you have other predictors on top of the group-level effect you're interested in (r in the toy example). How did you handle these other predictors in the visualization above? Did you set them to an equal value, or did you marginalize over their possible range of values?

@fatihbozdag
Author

Thank you, @tomicapretto,

Well, I'm doing much better, but the whole country is struggling and is likely to do so for a while. As I write this post, we've just had another one (5.1, I guess). We have to get used to it. Anyway.

Your interpretation is correct. The other predictors are kept at their reference levels, since all predictors are categorical variables.

to clarify,

model.predict(idata, kind="pps", include_group_specific=True)
ax = az.plot_ppc(idata)
ax.set_xticks([0.5, 1.5, 2.5])
ax.set_xticklabels(model.response.term.levels)

provides such a plot. So, what if I would like to get such a plot per level of the random effect?
image_2023-02-16_230718678
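One possible sketch (untested against the real model; it builds a synthetic InferenceData with the same shapes as the toy example) is to call az.plot_ppc once per group, selecting each group's observations through the coords argument:

```python
import numpy as np
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n = 150
r = np.array(["R1"] * 90 + ["R2"] * 60)      # grouping variable, as in the toy data
y_observed = rng.integers(0, 3, size=n)      # observed response coded 0/1/2 (A/B/C)
y_pp = rng.integers(0, 3, size=(4, 250, n))  # fake posterior predictive draws

idata = az.from_dict(
    posterior_predictive={"y": y_pp},
    observed_data={"y": y_observed},
    coords={"y_obs": np.arange(n)},
    dims={"y": ["y_obs"]},
)

# One PPC panel per level of the grouping variable, selecting its observations via coords
idx = np.arange(n)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, g in zip(axes, ["R1", "R2"]):
    az.plot_ppc(idata, coords={"y_obs": idx[r == g]}, ax=ax)
    ax.set_title(f"r = {g}")
```

On the real idata, the same coords selection over the observation dimension should restrict the PPC to the rows of the data frame belonging to each level of r.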
