
Confidence intervals for standardized mean differences in repeated measures #631

Open
Daiki-Nakamura-git opened this issue Feb 7, 2024 · 2 comments

@Daiki-Nakamura-git

The confidence intervals for repeated-measures effect sizes calculated by the rm_d() function do not match the results of other existing functions.
For example, the following two outputs have matching point estimates but different confidence intervals.

# generate data for repeated measurements
library(mvtnorm)
sigma <- matrix(c(1, 0.5, 0.5, 1), byrow = TRUE, ncol = 2)  # variance-covariance matrix
mu <- c(1, 2)  # population mean vector
n <- 100       # sample size
set.seed(123)  # fix the random seed
dat <- data.frame(mvtnorm::rmvnorm(n = n, mean = mu, sigma = sigma)) # draws from a multivariate normal distribution
colnames(dat) <- c("t1", "t2")  # naming variables
dat$diff <- dat$t2 - dat$t1     # difference score

# effect size : difference score variance d_z
library(effectsize)
effectsize::cohens_d(dat$diff, mu = 0) |> print(digits = 5)
#> Cohen's d |             95% CI
#> ------------------------------
#> 0.95874   | [0.71998, 1.19412]

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "z", adjust = FALSE) |> print(digits = 5)
#> d (z)   |             95% CI
#> ----------------------------
#> 0.95874 | [0.72195, 1.19553]

The reason for the discrepancy is that rm_d() calculates confidence intervals by approximating the sampling distribution with a normal distribution, while the other existing functions calculate confidence intervals using the non-central distribution method. I consider the non-central distribution method to be more accurate.
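
To make the difference concrete, here is a minimal sketch of both methods (my own illustration, not effectsize's internal code; the normal-approximation variance $\mathrm{Var}(d_z) \approx 1/n + d_z^2/(2n)$ is an assumption on my part). It reproduces the two intervals above:

# sketch: the two CI methods for d_z (illustration only)
t_obs <- 0.95874 * sqrt(n)  # observed one-sample t statistic of the difference scores

# non-central distribution method: invert pt() over the noncentrality parameter
lo <- uniroot(\(ncp) pt(t_obs, df = n - 1, ncp = ncp) - 0.975, c(0, 20))$root
hi <- uniroot(\(ncp) pt(t_obs, df = n - 1, ncp = ncp) - 0.025, c(0, 20))$root
c(lo, hi) / sqrt(n)
#> approx. [0.720, 1.194] (matches cohens_d() above)

# normal approximation method: d +/- z * SE
se_z <- sqrt(1 / n + 0.95874^2 / (2 * n))  # assumed variance approximation
0.95874 + c(-1, 1) * qnorm(0.975) * se_z
#> approx. [0.722, 1.196] (matches rm_d() above)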

In addition, I believe the results of the following pairs of functions should be consistent.

# Within-Subject Variance: d_rm
r <- cor(dat$t1, dat$t2) # Pearson's product-moment correlation coefficient
effectsize::cohens_d(dat$diff, mu = 0) * sqrt(2 * (1 - r))  # non-central distribution method
#>   Cohens_d       CI    CI_low  CI_high
#> 1 1.081488 1.071631 0.8121631 1.347004

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "rm", adjust = FALSE) |> print(digits = 6) # normal approximation method
#> d (rm)   |               95% CI
#> -------------------------------
#> 1.081488 | [0.803159, 1.359817]
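
# The same pattern holds for d_rm: the normal-approximation CI above can be
# reconstructed by hand. (Sketch only: the variance formula
# Var(d_rm) ~ (1/n + d_rm^2/(2n)) * 2*(1 - r) is my assumption, not
# necessarily what rm_d() uses internally.)
d_rm <- 1.081488
se_rm <- sqrt((1 / n + d_rm^2 / (2 * n)) * 2 * (1 - r))
d_rm + c(-1, 1) * qnorm(0.975) * se_rm
#> approx. [0.803, 1.360] (matches rm_d(method = "rm") above)
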
# Control Variance: d_b
effectsize::glass_delta(dat$t2, dat$t1, adjust = FALSE) |> print(digits = 6) # non-central distribution method
#> Glass' delta |               95% CI
#> -----------------------------------
#> 1.134785     | [0.801919, 1.463171]

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "b", adjust = FALSE) |> print(digits = 6) # normal approximation method
#> Becker's d |               95% CI
#> ---------------------------------
#> 1.134785   | [0.863465, 1.406106]
# Average Variance: d_av
effectsize::cohens_d(dat$t2, dat$t1, pooled_sd = FALSE) |> print(digits = 6)  # non-central distribution method
#> Cohen's d |               95% CI
#> --------------------------------
#> 1.082732  | [0.784408, 1.617938]

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "av", adjust = FALSE) |> print(digits = 6) # normal approximation method
#> d (av)   |               95% CI
#> -------------------------------
#> 1.082732 | [0.861387, 1.304076]
# All Variance: Just d
effectsize::cohens_d(dat$t2, dat$t1, pooled_sd = TRUE) |> print(digits = 6) # non-central distribution method
#> Cohen's d |               95% CI
#> --------------------------------
#> 1.082732  | [0.784570, 1.378496]

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "d", adjust = FALSE) |> print(digits = 6) # normal approximation method
#> Cohen's d |               95% CI
#> --------------------------------
#> 1.082732  | [0.835758, 1.329706]

I would like confidence intervals to be calculated with the same method consistently across the package.

@mattansb (Member) commented Feb 7, 2024

Regarding $d_z$ and $d_{rm}$ - those differences in the CIs are rather small, and are the result of the different methods (ncp vs. normal approximation). I don't think we will be able to have all functions across the package consistently use the same CI method - some CI methods are better defined for some effect sizes than for others.


Regarding the supposedly larger deviations between $d_b$ vs. Glass's $\Delta$, $d_{av}$ vs. unpooled-SD d, and "just d" vs. d: it would be wrong for the former CIs to match the latter, since the repeated-measures variants account for the fact that observations are nested within persons, and so their CIs should be narrower (we are more certain about the true value of the effect size).
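
For intuition: with positively correlated measurements, the variance of the mean difference shrinks, since $\mathrm{Var}(\bar{X}_2 - \bar{X}_1) = (s_1^2 + s_2^2 - 2 r s_1 s_2)/n$ in the paired design versus $(s_1^2 + s_2^2)/n$ when the nesting is ignored. A minimal sketch using the `dat` generated in the original post:

se_paired   <- sd(dat$diff) / sqrt(nrow(dat))                # accounts for nesting
se_unpaired <- sqrt((var(dat$t1) + var(dat$t2)) / nrow(dat)) # ignores nesting
c(se_paired, se_unpaired)  # the paired SE is smaller whenever cor(t1, t2) > 0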

Below, I demonstrate this with a bootstrap simulation:

library(effectsize)

# Bootstrap within ----------------------

sigma <- matrix(c(1, 0.5, 0.5, 1), byrow = TRUE, ncol = 2)  # variance-covariance matrix
mu <- c(1, 2)  # population mean vector
n <- 100       # sample size
set.seed(123)  # fix the random seed
dat_wide <- data.frame(mvtnorm::rmvnorm(n = n, mean = mu, sigma = sigma)) # draws from a multivariate normal distribution
colnames(dat_wide) <- c("t1", "t2")  # naming variables
dat_wide$diff <- dat_wide$t2 - dat_wide$t1     # difference score

get_rm_d <- function(rsplit, method) {
  d <- as.data.frame(rsplit)
  out <- rm_d(Pair(t1, t2) ~ 1, data = d, ci = NULL, adjust = FALSE, method = method)
  out[[1]]
}

boot_wide <- rsample::bootstraps(dat_wide, times = 200)
boot_wide$Beckers_d <- sapply(boot_wide$splits, get_rm_d, method = "b") 
boot_wide$d_av <- sapply(boot_wide$splits, get_rm_d, method = "av") 
boot_wide$just_d <- sapply(boot_wide$splits, get_rm_d, method = "d") 


# Bootstrap between -------------------

dat_long <- datawizard::data_to_long(dat_wide[-3], cols = c("t1", "t2"), names_to = "time")

get_glass_delta <- function(rsplit) {
  d <- as.data.frame(rsplit)
  out <- glass_delta(value ~ time, data = d, ci = NULL)
  out[[1]]
}

get_d_unpooled <- function(rsplit) {
  d <- as.data.frame(rsplit)
  out <- cohens_d(value ~ time, data = d, pooled_sd = FALSE, ci = NULL)
  out[[1]]
}

get_just_d_between <- function(rsplit) {
  d <- as.data.frame(rsplit)
  out <- cohens_d(value ~ time, data = d, ci = NULL)
  out[[1]]
}


boot_long <- rsample::bootstraps(dat_long, times = 200)
boot_long$Delta <- sapply(boot_long$splits, get_glass_delta) 
boot_long$d_unpooled <- sapply(boot_long$splits, get_d_unpooled) 
boot_long$just_d <- sapply(boot_long$splits, get_just_d_between) 

# Compare -------------------

library(ggplot2)
library(ggdist)
library(patchwork)

p_within <- boot_wide[3:5] |> 
  datawizard::data_to_long(names_to = "d_type") |> 
  datawizard::data_modify(
    d_type = factor(d_type, levels = c("Beckers_d", "d_av", "just_d"))
  ) |> 
  ggplot(aes(value, d_type)) + 
  stat_slab(aes(fill_ramp = after_stat(level), fill = d_type),
            .width = c(0.5, 0.95, 1), point_interval = "mean_qi",
            color = "grey") + 
  stat_spike(at = c(-1.30, -0.77), data = \(data) subset(data, d_type == "Beckers_d")) + 
  stat_spike(at = c(-1.30, -0.86), data = \(data) subset(data, d_type == "d_av")) + 
  stat_spike(at = c(-1.33, -0.84), data = \(data) subset(data, d_type == "just_d")) + 
  scale_thickness_shared() + 
  theme_ggdist()

p_between <- boot_long[3:5] |> 
  datawizard::data_to_long(names_to = "d_type") |> 
  datawizard::data_modify(
    d_type = factor(d_type, levels = c("Delta", "d_unpooled", "just_d"))
  ) |> 
  ggplot(aes(value, d_type)) + 
  stat_slab(aes(fill_ramp = after_stat(level), fill = d_type),
            .width = c(0.5, 0.95, 1), point_interval = "mean_qi",
            color = "grey") + 
  stat_spike(at = c(-1.33, -0.73), data = \(data) subset(data, d_type == "Delta")) + 
  stat_spike(at = c(-1.38, -0.78), data = \(data) subset(data, d_type == "d_unpooled")) + 
  stat_spike(at = c(-1.38, -0.78), data = \(data) subset(data, d_type == "just_d")) + 
  scale_thickness_shared() + 
  theme_ggdist()


wrap_plots(p_within, p_between, 
           ncol = 1, 
           guides = "collect") &
  coord_cartesian(xlim = c(-1.5, -0.6)) &
  scale_fill_hue("Type", labels = c("Baseline", "Average SD", "Just d")) &
  labs(y = NULL)

[image: bootstrapped sampling distributions (slabs) of the effect sizes, with the {effectsize} CIs marked as black spikes; top panel = within-subject bootstrap, bottom panel = between-subject bootstrap]

The slabs are the bootstrapped sampling distributions of the effect sizes (top: bootstrap accounting for nesting; bottom: bootstrap not accounting for nesting). The black spikes mark the CIs returned by the various {effectsize} functions.

We can see:

  1. The sampling distributions for the rm_d() variants are narrower than those for cohens_d() / glass_delta(),
  2. The bootstrapped 95% CIs match the CIs estimated by the various {effectsize} functions.

@Daiki-Nakamura-git (Author)

Thanks for your reply.

My point about d_b, d_av, and "just d" was incorrect, as your simulation results show. I did not take the correlations into account.

I would prefer to give users a choice of which CI method to use for d_z and d_rm, but I can respect your idea of using different methods for different functions.
