Use correction for small-sample bias in all Chisq effect size #588

Open
3 of 6 tasks
mattansb opened this issue Apr 14, 2023 · 0 comments
Assignees
Labels
Discussion 🦜 Talking about our ~feelings~ stats enhancement 🔥 New feature or request


For $\phi$, the small-sample bias corrected estimate is:

$$ \widetilde{\phi} = \sqrt{\phi^2 - \frac{df}{N-1}} $$

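This comes from the non-central $\chi^2$ distribution, where $E[\hat{\chi}^2] = df + \phi^2 \times N$, which implies $E[\hat{\phi}^2] = \phi^2 + df / N$. (The correction above uses $N - 1$ rather than $N$ in the denominator, as in Bergsma (2013).)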
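
As a rough illustration of the correction, here is a minimal base-R sketch. The helper name `phi_adjusted()` and the example table are hypothetical and not part of effectsize; truncating negative values at 0 is an assumption made here so that the square root is always defined.

```r
# Hypothetical helper (not part of effectsize): raw and bias-adjusted phi
phi_adjusted <- function(tab) {
  # Pearson chi-squared without continuity correction
  res <- suppressWarnings(chisq.test(tab, correct = FALSE))
  chisq <- unname(res$statistic)
  df <- unname(res$parameter)
  N <- sum(tab)

  phi2 <- chisq / N
  # Assumption: truncate at 0 so sqrt() is defined in small samples
  phi2_adj <- max(0, phi2 - df / (N - 1))

  c(phi = sqrt(phi2), phi_adj = sqrt(phi2_adj))
}

# Made-up 2x2 table, for illustration only
tab <- matrix(c(30, 10, 15, 25), nrow = 2)
out <- phi_adjusted(tab)
out
```

The adjusted value is never larger than the raw one, and the gap shrinks as N grows.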

This is used in effectsize for:

  • phi(adjust = TRUE)
  • cramers_v(adjust = TRUE)
  • tschuprows_t(adjust = TRUE)

(The latter two also have a weird scaling factor from Bergsma (2013).)

This correction can be applied to all $\phi$-like effect sizes:

  • cohens_w() - makes the most sense, as it applies the same transformation to $\chi^2$ as $\phi$ does.
  • pearsons_c() - can be seen as a transformed Cohen's w ( $C = \sqrt{W^2 / (W^2 + 1)}$ ), so using an adjusted w would "adjust" C as well.
  • fei() - same reasoning, although the additional scaling factor ( $1/min(p_E) - 1$ ) might have to be adjusted in the same way as the scaling factors of V and T. (See next section.)
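
A sketch of what this could look like for w and C in a goodness-of-fit setting. This is an illustration under assumptions, not the effectsize implementation: the counts are made up, and truncation at 0 is assumed.

```r
# Made-up observed counts and expected probabilities
O <- c(20, 30, 50)
p <- c(0.3, 0.3, 0.4)

res <- chisq.test(O, p = p, correct = FALSE)
chisq <- unname(res$statistic)
df <- unname(res$parameter)
N <- sum(O)

# Cohen's w: unadjusted and bias-adjusted (assumption: truncate at 0)
w2 <- chisq / N
w2_adj <- max(0, w2 - df / (N - 1))

# Pearson's C as a transformation of w: C = sqrt(w^2 / (w^2 + 1))
C <- sqrt(w2 / (w2 + 1))
C_adj <- sqrt(w2_adj / (w2_adj + 1))

round(c(w = sqrt(w2), w_adj = sqrt(w2_adj), C = C, C_adj = C_adj), 3)
```

Because C is a monotone function of w, shrinking w shrinks C too.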

Some of my thoughts...

Bergsma (2013) suggested changing the scaling factors of V and T in such a way that when (the true) $T=1$, RMSE would be 0 because (regardless of sample size) the estimated T would also be 1.

To achieve this with פ:

$$ \widetilde{פ} = \sqrt{\frac{\widetilde{\phi^2}}{\frac{1}{min(p_E)} - 1 - \frac{df}{N-1}}} $$

I'm not sure this is the way to go, because it also means that a sample in which פ=1 will produce an estimate of 1, even when the sample size is arbitrarily small. For example:

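O <- c(2, 0)        # observed counts (N = 2)
E <- c(0.35, 0.65)  # expected probabilities

# goodness-of-fit test, no continuity correction
res <- chisq.test(O, p = E, correct = FALSE)

chisq <- unname(res$statistic)
df <- unname(res$parameter)
N <- sum(O)

# bias-adjusted phi^2
phi2_adj <- chisq / N - df / (N - 1)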

# adjusted Fei
sqrt(phi2_adj / 
       (1 / min(E) - 1 - df / (N - 1)))
#> [1] 1

# unadjusted Fei
effectsize::fei(O, p = E, ci = NULL)
#> Fei 
#> ----
#> 1.00
#> 
#> - Adjusted for uniform expected probabilities.

This is also true for T (by design):

mat <- diag(2)
mat[1,1] <- 2
mat
#>      [,1] [,2]
#> [1,]    2    0
#> [2,]    0    1

effectsize::tschuprows_t(mat, ci = NULL)
#> Tschuprow's T (adj.)
#> --------------------
#> 1.00
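
The same value can be reproduced by hand. The sketch below assumes the Bergsma (2013) correction as I understand it, where each table dimension k is replaced by k - (k - 1)^2 / (N - 1); it is not copied from the effectsize source.

```r
mat <- matrix(c(2, 0, 0, 1), nrow = 2)

# The chi-squared approximation warning is expected with counts this small
res <- suppressWarnings(chisq.test(mat, correct = FALSE))
chisq <- unname(res$statistic)
df <- unname(res$parameter)
N <- sum(mat)
r <- nrow(mat)
k <- ncol(mat)

# bias-adjusted phi^2 (assumption: truncate at 0)
phi2_adj <- max(0, chisq / N - df / (N - 1))

# bias-adjusted table dimensions (assumed Bergsma-style correction)
r_adj <- r - (r - 1)^2 / (N - 1)
k_adj <- k - (k - 1)^2 / (N - 1)

T_adj <- sqrt(phi2_adj / sqrt((r_adj - 1) * (k_adj - 1)))
T_adj
```

Here the adjusted $\phi^2$ and the adjusted scaling factor both equal 0.5, so the ratio is exactly 1 even though N is only 3.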

From what I can see, small-sample bias adjustments almost always shrink the estimate, even when the sample estimate is perfect (e.g., $R^2_{adj}$, $\omega^2$, $\epsilon^2$). So I think having:

$$ \widetilde{פ} = \sqrt{\frac{\widetilde{\phi^2}}{\frac{1}{min(p_E)} - 1}} $$

(which uses the regular scaling factor) makes the most sense to me. It would also be consistent with w in the uniform-binary case, though inconsistent with the adjusted V and T.
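
To make the difference between the two scaling options concrete, here is a minimal sketch. `fei_adjusted()` and its `bergsma_scaling` argument are hypothetical names used only for this comparison, and truncation at 0 is assumed.

```r
# Hypothetical helper: adjusted Fei under either scaling factor
fei_adjusted <- function(O, p, bergsma_scaling = FALSE) {
  res <- suppressWarnings(chisq.test(O, p = p, correct = FALSE))
  chisq <- unname(res$statistic)
  df <- unname(res$parameter)
  N <- sum(O)

  # bias-adjusted phi^2 (assumption: truncate at 0)
  phi2_adj <- max(0, chisq / N - df / (N - 1))

  # regular scaling factor, optionally shrunk in the same way as phi^2
  scale <- 1 / min(p) - 1
  if (bergsma_scaling) scale <- scale - df / (N - 1)

  sqrt(phi2_adj / scale)
}

O <- c(2, 0)
E <- c(0.35, 0.65)

fei_bergsma <- fei_adjusted(O, E, bergsma_scaling = TRUE)   # stays at 1
fei_regular <- fei_adjusted(O, E, bergsma_scaling = FALSE)  # about 0.68
c(bergsma = fei_bergsma, regular = fei_regular)
```

With the regular scaling factor, the N = 2 sample from the earlier example is shrunk from 1 to roughly 0.68 rather than left at 1.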

@mattansb mattansb self-assigned this Apr 14, 2023
@mattansb mattansb added enhancement 🔥 New feature or request Discussion 🦜 Talking about our ~feelings~ stats labels Apr 14, 2023