Estimate Discrepancy Between 2 Complementary Models #211

jjohnston123 · 2022-02-07T22:29:18Z


I am running into a bit of an issue and am unsure whether it is due to a lack of background knowledge. I have data on the presence of lesions of 2 types (inflammatory or non-inflammatory) in 3 different locations. I have run 2 different logistic mixed models:

Example 1: Lesion presence is the response variable, with location, lesion type, and their interaction as predictors and a random intercept for subject ID.

Example 1.csv

library(glmmTMB)
library(ggeffects)

example1 <- read.csv("Example 1.csv", header = TRUE)
example1$ID <- as.factor(example1$ID)
example1$Location <- as.factor(example1$Location)
example1$LesionType <- as.factor(example1$LesionType)
example1$LesionPresence <- as.factor(example1$LesionPresence)

# Logistic mixed model: Location x LesionType interaction, random intercept per subject
model.example1 <- glmmTMB(LesionPresence ~ Location * LesionType + (1 | ID),
                          data = example1, family = binomial)

ggpredict.example1 <- ggpredict(model.example1, terms = c("Location", "LesionType"))

# LesionType = 0

Location | Predicted |       95% CI
-----------------------------------
0        |      0.84 | [0.73, 0.91]
1        |      0.71 | [0.59, 0.81]
2        |      0.75 | [0.62, 0.85]

# LesionType = 1

Location | Predicted |       95% CI
-----------------------------------
0        |      0.60 | [0.48, 0.71]
1        |      0.67 | [0.55, 0.77]
2        |      0.65 | [0.52, 0.77]

Adjusted for:
* ID = NA (population-level)

Example 2: Inflammatory vs. non-inflammatory is treated as a mutually exclusive binary outcome, with location as a predictor and a random intercept for subject ID.

Example 2.csv

example2 <- read.csv("Example 2.csv", header = TRUE)
example2$ID <- as.factor(example2$ID)
example2$Location <- as.factor(example2$Location)
example2$Inflammatory <- as.factor(example2$Inflammatory)

model.example2 <- glmmTMB(Inflammatory ~ Location + (1 | ID),
                          data = example2, family = binomial)

ggpredict.example2 <- ggpredict(model.example2, terms = "Location")

# Predicted probabilities of Inflammatory

Location | Predicted |       95% CI
-----------------------------------
0        |      0.89 | [0.75, 0.96]
1        |      0.77 | [0.60, 0.88]
2        |      0.80 | [0.63, 0.91]

Adjusted for:
* ID = NA (population-level)

The CSV inputs (attached) for example 1 and example 2 have the same arithmetic means for inflammatory lesions in each of the 3 locations. However, the model estimates for example 1 match these arithmetic means, while those for example 2 do not. I was expecting both models to produce the same estimates. I've also noticed that removing the random effect from example 1 changes nothing, whereas removing the random effect from example 2 makes the estimated means for inflammatory lesions equal across both examples.
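A small base-R sketch (not from the original thread) of why the random effect matters more when its variance is large: in a random-intercept logistic model, a "population-level" prediction with the random effect set to 0 is the probability for a typical subject, plogis(eta), while the raw mean across subjects corresponds to the average of plogis(eta + b) over b ~ N(0, sigma^2). The two agree only when sigma is near 0. The linear predictor eta is hypothetical (chosen to give 0.89, the Location 0 estimate above), and 1.2106 is the random-intercept SD reported later in the thread for example 2.

```r
# Population-averaged probability for a random-intercept logistic model:
# average plogis(eta + b) over b ~ N(0, sigma^2) by numerical integration.
marginal_prob <- function(eta, sigma) {
  # With no random-effect variance, every subject has the same probability.
  if (sigma == 0) return(plogis(eta))
  integrand <- function(b) plogis(eta + b) * dnorm(b, mean = 0, sd = sigma)
  # +/- 8 SD covers essentially all of the normal mass.
  integrate(integrand, lower = -8 * sigma, upper = 8 * sigma)$value
}

# Hypothetical linear predictor: typical-subject probability of 0.89.
eta <- qlogis(0.89)
sigmas <- c(0, 0.1, 0.5, 1.2106)

data.frame(
  sigma = sigmas,
  typical_subject = plogis(eta),
  population_average = sapply(sigmas, marginal_prob, eta = eta)
)
```

For eta > 0 the population average is pulled toward 0.5 as sigma grows, so a model with a small random-intercept SD (as in example 1) barely changes when the random effect is dropped, while one with a large SD (as in example 2) does.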

Any insight into where I may be going wrong?

strengejacke · 2022-02-08T07:17:19Z

Maintainer

I guess the main reason is the larger variation of your outcome across IDs in the second model. Take a look at the ICCs of both models (see https://easystats.github.io/performance/reference/icc.html): the first model is close to singular (see https://easystats.github.io/performance/reference/check_singularity.html), which is why you get almost identical results when removing the random effects, while the second model is not. Shrinkage (or "regularization") is therefore stronger in the second model, resulting in a stronger adjustment of the estimates.

To answer your last question: you're actually doing nothing wrong. The variation of the outcome across IDs in your second model is just larger.
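As a rough hand check of the ICC point, here is a sketch of the latent-scale ICC for a logistic model with a single random intercept, where the residual variance on the logit scale is fixed at pi^2 / 3 (the variance of the standard logistic distribution). The SD 1.2106 is the value the original poster reports from VarCorr() for example 2; the 0.05 value is hypothetical, standing in for a near-singular fit. This is assumed to be close to the adjusted ICC that performance::icc() reports for such a model, but that function may differ in details.

```r
# Latent-scale ICC for a binomial model with logit link and one random intercept:
# ICC = var_intercept / (var_intercept + pi^2 / 3)
latent_icc <- function(sd_intercept) {
  var_intercept <- sd_intercept^2
  var_intercept / (var_intercept + pi^2 / 3)
}

latent_icc(1.2106)  # about 0.31, the random-intercept SD for example 2
latent_icc(0.05)    # hypothetical near-zero SD, as in a near-singular fit
```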


strengejacke · 2022-02-08T07:28:29Z

Maintainer

Looking at the raw data might help with understanding the issue. The shapes in the left figure look more similar than those in the right one.

library(glmmTMB)
library(patchwork)
library(ggplot2)

example1<-read.csv("~/../Downloads/Example.1.csv",header=T)
colnames(example1)[1] <- "ID"
as.numeric(example1$ID)->example1$ID
as.factor(example1$LesionPresence)->example1$LesionPresence
example1 <- example1[!is.na(example1$LesionPresence), ]

p1 <- ggplot(example1, aes(x = ID, y = LesionPresence)) +
  geom_violin() +
  geom_jitter(width = .2, height = .1) +
  labs(title = "Little variance")

example2<-read.csv("~/../Downloads/Example.2.csv",header=T)
colnames(example2)[1] <- "ID"
as.numeric(example2$ID)->example2$ID
as.factor(example2$Inflammatory)->example2$Inflammatory
example2 <- example2[!is.na(example2$Inflammatory), ]

p2 <- ggplot(example2, aes(x = ID, y = Inflammatory)) +
  geom_violin() +
  geom_jitter(width = .2, height = .1) +
  labs(title = "Large variance")

p1 + p2

Created on 2022-02-08 by the reprex package (v2.0.1)


jjohnston123 Feb 8, 2022
Author

Thank you! This is helpful! To address this and hopefully get estimates that are more similar between the two models, I shifted to fitting a multinomial model where 0 now represents not having a lesion, 1 is non-inflammatory, and 2 is inflammatory. This allows me to distinguish between no lesions and real missing values. I've been successful in fitting a model and getting an emmeans output with the following code:

library(mclogit)
library(emmeans)

model.mblogit <- mblogit(Inflammatory ~ Level, data = example2, random = ~ 1 | ID)

emmeans.mblogit <- emmeans(model.mblogit, ~ Inflammatory + Level, type = "response")

Inflammatory Level       prob         SE  df  asymp.LCL  asymp.UCL
           0     0 0.30683015 0.06007112 Inf 0.18909291 0.42456738
           1     0 0.08202672   0.030528 Inf 0.02219294  0.1418605
           2     0 0.61114313 0.06522667 Inf 0.48330121 0.73898506
           0     1 0.44042052 0.06280569 Inf 0.31732363 0.56351742
           1     1 0.14609472   0.042096 Inf 0.06358807 0.22860136
           2     1 0.41348476 0.06386191 Inf 0.28831772  0.5386518
           0     2 0.54088448 0.06532594 Inf   0.412848 0.66892096
           1     2 0.09915202 0.03413202 Inf 0.03225449 0.16604956
           2     2  0.3599635 0.06376851 Inf 0.23497951 0.48494749

It looks like the probability of an inflammatory lesion (Inflammatory = 2) is higher at Level 0 than at the other levels. However, this could be due to the fact that Level 0 also has a lower probability of having no lesion (Inflammatory = 0). I'm therefore trying to figure out how to get estimated marginal means that exclude the Inflammatory = 0 cells, so that I can see probabilities and confidence limits that ignore the non-lesional data and then perform the contrasts of interest. In other words, can I get probabilities for which Inflammatory = 1 and Inflammatory = 2 sum to 1, rather than Inflammatory = 0, 1, and 2 summing to 1? I know I can do this in SPSS after running a multinomial model, but I wasn't sure how to do it in R.
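A minimal sketch of the renormalization being asked about, using only the point estimates posted in the emmeans table above: conditional on having a lesion, P(Inflammatory = 2 | lesion) = p2 / (p1 + p2), so categories 1 and 2 sum to 1 within each Level. This gives point estimates only. Since p2 / (p1 + p2) = plogis(log(p2 / p1)), one possible route to confidence limits is to obtain an interval for the log-odds of category 2 vs. category 1 on the model's linear-predictor scale and back-transform it; whether emmeans exposes that scale for mblogit fits is an assumption to check in its documentation (a delta-method or bootstrap interval would be alternatives).

```r
# Probabilities copied from the emmeans output above:
# p1 = non-inflammatory (Inflammatory = 1), p2 = inflammatory (Inflammatory = 2).
probs <- data.frame(
  Level = c(0, 1, 2),
  p1 = c(0.08202672, 0.14609472, 0.09915202),
  p2 = c(0.61114313, 0.41348476, 0.35996350)
)

# Condition on having a lesion: drop category 0 and rescale so p1 + p2 = 1.
probs$p2_given_lesion <- probs$p2 / (probs$p1 + probs$p2)
probs$p1_given_lesion <- 1 - probs$p2_given_lesion

probs
```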

Thanks again for all your help!

jjohnston123 Feb 8, 2022
Author

I also tried bias-adjusting the estimated marginal means of the model with the larger variation across IDs (example 2). This helped a little, though the estimates are still off from those of the other model:

VarCorr(model.example2)

Conditional model:
 Groups Name        Std.Dev.
 ID     (Intercept) 1.2106

emmeans(model.example2, ~Level, type="response",bias.adjust=TRUE,sigma=1.2106)

Level       prob         SE  df   lower.CL   upper.CL
    0 0.83570779 0.06446246 182  0.6814339 0.92980109
    1 0.69730674 0.06738983 182 0.56509546 0.81932857
    2 0.73189094 0.07396492 182  0.5821614 0.85792706
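One possible reason the bias adjustment only helps partially: as we understand the emmeans documentation, bias adjustment uses a second-order Taylor approximation, which for a logit link is p + (sigma^2 / 2) * p * (1 - p) * (1 - 2p) with p = plogis(eta). With a random-intercept SD around 1.2 this can differ noticeably from the exact average over the random effect. The sketch below compares the two; the eta values are hypothetical, built from the rounded ggpredict estimates for example 2, so it will not reproduce the emmeans table exactly.

```r
# Second-order (Taylor) bias-adjusted mean on the probability scale, logit link.
second_order <- function(eta, sigma) {
  p <- plogis(eta)
  p + 0.5 * sigma^2 * p * (1 - p) * (1 - 2 * p)
}

# Exact population-averaged probability by numerical integration (sigma > 0).
exact_marginal <- function(eta, sigma) {
  integrand <- function(b) plogis(eta + b) * dnorm(b, mean = 0, sd = sigma)
  integrate(integrand, lower = -8 * sigma, upper = 8 * sigma)$value
}

sigma <- 1.2106                      # random-intercept SD from VarCorr(model.example2)
eta <- qlogis(c(0.89, 0.77, 0.80))   # hypothetical: rounded ggpredict estimates, example 2

data.frame(
  typical_subject = plogis(eta),
  second_order = second_order(eta, sigma),
  numerical = sapply(eta, exact_marginal, sigma = sigma)
)
```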
