Estimate Discrepancy Between 2 Complementary Models #211
-
I guess the main reason is the larger variation of your outcome across IDs in the second model. Take a look at the ICCs of both models (see https://easystats.github.io/performance/reference/icc.html), and you will see that the first model is close to a singular fit (see https://easystats.github.io/performance/reference/check_singularity.html; that's why you get almost identical results when removing the random effects), while the second is not. Thus, shrinkage (or "regularization") is more extreme in the second model, which leads to a stronger adjustment of the estimates. To answer your last question: you're actually doing nothing wrong. The variation of the outcome across IDs in your second model is just larger.
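A minimal sketch of how such a comparison could look, assuming the glmmTMB and performance packages are installed. The data here are simulated, not the attached CSV files, and the helper `simulate_binary()`, the intercept of -1, and the two `sd_id` values are hypothetical choices made only for illustration. The small-variance fit may emit convergence or singularity warnings, which is the behaviour the check is meant to surface.

```r
library(glmmTMB)
library(performance)

set.seed(211)

# Simulate a binary outcome with one random intercept per ID.
# `sd_id` is the between-ID standard deviation on the logit scale
# (hypothetical values, not estimated from the thread's data).
simulate_binary <- function(sd_id, n_id = 40, n_obs = 10) {
  id <- factor(rep(seq_len(n_id), each = n_obs))
  u <- rnorm(n_id, mean = 0, sd = sd_id)
  prob <- plogis(-1 + u[as.integer(id)])
  data.frame(ID = id, y = rbinom(n_id * n_obs, size = 1, prob = prob))
}

d_small <- simulate_binary(sd_id = 0.1)  # IDs barely differ
d_large <- simulate_binary(sd_id = 2)    # IDs differ a lot

# Intercept-only logistic mixed models with a random intercept for ID
m_small <- glmmTMB(y ~ 1 + (1 | ID), family = binomial, data = d_small)
m_large <- glmmTMB(y ~ 1 + (1 | ID), family = binomial, data = d_large)

# Intraclass correlation: share of variance attributable to ID
icc(m_small)
icc(m_large)

# TRUE indicates a (near-)singular fit, i.e. random-effect variance close to zero
check_singularity(m_small)
check_singularity(m_large)
```

With the real data, the same two calls would be applied to the two fitted models instead of `m_small` and `m_large`.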
-
Looking at the raw data might help to understand the issue. The shapes in the left figure look more similar than those in the right.
library(glmmTMB)
library(patchwork)
library(ggplot2)
# Example 1: lesion presence per subject ID
example1 <- read.csv("~/../Downloads/Example.1.csv", header = TRUE)
colnames(example1)[1] <- "ID"
example1$ID <- as.numeric(example1$ID)
example1$LesionPresence <- as.factor(example1$LesionPresence)
example1 <- example1[!is.na(example1$LesionPresence), ]
p1 <- ggplot(example1, aes(x = ID, y = LesionPresence)) +
  geom_violin() +
  geom_jitter(width = .2, height = .1) +
  labs(title = "Little variance")
# Example 2: inflammatory vs. non-inflammatory per subject ID
example2 <- read.csv("~/../Downloads/Example.2.csv", header = TRUE)
colnames(example2)[1] <- "ID"
example2$ID <- as.numeric(example2$ID)
example2$Inflammatory <- as.factor(example2$Inflammatory)
example2 <- example2[!is.na(example2$Inflammatory), ]
p2 <- ggplot(example2, aes(x = ID, y = Inflammatory)) +
  geom_violin() +
  geom_jitter(width = .2, height = .1) +
  labs(title = "Large variance")
# Show both plots side by side (patchwork)
p1 + p2
Created on 2022-02-08 by the reprex package (v2.0.1)
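One possible way to see why more variation across IDs pulls model-based estimates away from the raw proportions is the following base R sketch. It assumes a logistic model with a normally distributed random intercept and compares the probability for a "typical" ID (random effect of zero) with the probability averaged over all IDs. The intercept and the `sigmas` values are hypothetical and are not taken from the thread's data, and `marginal_prob()` is an illustrative helper, not a function from any package.

```r
# Population-averaged probability implied by a random-intercept logistic model:
# average plogis(b0 + sigma * z) over z ~ N(0, 1) by numerical integration.
marginal_prob <- function(b0, sigma) {
  integrate(function(z) plogis(b0 + sigma * z) * dnorm(z), -Inf, Inf)$value
}

b0 <- qlogis(0.2)             # fixed intercept: probability 0.2 when the random effect is 0
sigmas <- c(0, 0.5, 1, 2, 3)  # hypothetical between-ID SDs on the logit scale

data.frame(
  sigma = sigmas,
  conditional = plogis(b0),   # back-transformed fixed-effect prediction
  marginal = vapply(sigmas, function(s) marginal_prob(b0, s), numeric(1))
)
```

When `sigma` is zero the two columns coincide, which matches the observation that a model with almost no between-ID variance reproduces the arithmetic means. As `sigma` grows, the averaged probability moves toward 0.5 while the conditional one stays fixed, so the two no longer agree.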
-
I am running into a bit of an issue and am unsure whether it is due to a lack of background knowledge. I have data on the presence of lesions of 2 types (inflammatory or non-inflammatory) in 3 different locations. I have run 2 different logistic mixed models:
Example 1: Lesion presence is the response variable with predictors of location and lesion type with a random effect of subject ID.
Example 1.csv
Example 2: Inflammatory vs. non-inflammatory are treated as mutually exclusive binary categories, with location as a predictor and a random effect of subject ID.
Example 2.csv
The CSV data inputs (attached) for Example 1 and Example 2 have the same arithmetic means for inflammatory lesions in the 3 different locations. However, the emmeans output for Example 1, but not for Example 2, equals these arithmetic means. I was expecting both models to produce the same estimates. I've also noticed that if I remove the random effect for Example 1, nothing changes. However, if I remove the random effect for Example 2, then the estimated means for inflammatory lesions across both examples are equal.
Any insight into where I may be going wrong?