
Discrepancy in t_to_d #212

Open
rvlenth opened this issue Nov 28, 2020 · 18 comments
Labels
Discussion 🦜 Talking about our ~feelings~ stats

Comments

@rvlenth

rvlenth commented Nov 28, 2020

Describe the bug
I was doing a comparison between the results of emmeans::eff_size() and effectsize::t_to_d(), and came up with very different results.

To Reproduce

require(effectsize)
#> Loading required package: effectsize
require(emmeans)
#> Loading required package: emmeans

warp.lm = lm(breaks ~ tension, data = warpbreaks)
emm = emmeans(warp.lm, "tension")

### Using emmeans::eff_size
eff_size(emm, sigma = sigma(warp.lm), edf = df.residual(warp.lm))

#>  contrast effect.size    SE df lower.CL upper.CL
#>  L - M          0.842 0.344 51    0.152     1.53
#>  L - H          1.239 0.355 51    0.526     1.95
#>  M - H          0.397 0.336 51   -0.276     1.07
#> 
#> sigma used for effect sizes: 11.88 
#> Confidence level used: 0.95

### Using t_to_d
t = summary(pairs(emm))$t.ratio
t_to_d(t, df = df.residual(warp.lm))

#>    d |        95% CI
#> --------------------
#> 0.71 | [ 0.14, 1.27]
#> 1.04 | [ 0.45, 1.62]
#> 0.33 | [-0.22, 0.89]

Note that the estimated effect sizes are somewhat smaller than those yielded by eff_size().

My own equivalent function

  • Since d = delta / sigma (where delta is the difference of means), and t = delta / (sigma * sqrt(2 / n)), we have that d = sqrt(2 / n) * t.
  • Meanwhile, df = k * (n - 1) where k is the number of groups, so that n = 1 + df / k.
  • Finally, to get k, the number of t ratios is k * (k - 1) / 2, so k = (1 + sqrt(1 + 8 * L)) / 2, where L is the number of t ratios.

Hence, my own version of t_to_d(), implementing these steps backwards:

### My own equivalent (just the point estimates)
rvl_t2d = function(t, df) {
    k = (1 + sqrt(1 + 8 * length(t))) / 2
    n = 1 + df / k
    sqrt(2 / n) * t
}

rvl_t2d(t, df.residual(warp.lm))

#> [1] 0.8417098 1.2391839 0.3974741

These results agree with the emmeans::eff_size() results.

Additional comment

My rvl_t2d() function depends on our having a balanced experiment with independent samples. Any such function that depends only on t ratios and df would require the same assumption (or something equally restrictive). The function emmeans::eff_size() does not require such a simplifying assumption, as it actually estimates each pairwise difference and divides it by the estimate of sigma. That's one reason I am quite sure that its results are correct. But I would think that effectsize::t_to_d() should yield the same results when those restrictive assumptions are satisfied, as they are in this illustration. I am not well-versed in the effect-size literature, so I cannot comment with any authority on the formulas given in the documentation for t_to_d().

My system

R.Version()
#> $platform
#> [1] "x86_64-w64-mingw32"
#> 
#> $arch
#> [1] "x86_64"
#> 
#> $os
#> [1] "mingw32"
#> 
#> $system
#> [1] "x86_64, mingw32"
#> 
#> $status
#> [1] ""
#> 
#> $major
#> [1] "4"
#> 
#> $minor
#> [1] "0.3"
#> 
#> $year
#> [1] "2020"
#> 
#> $month
#> [1] "10"
#> 
#> $day
#> [1] "10"
#> 
#> $`svn rev`
#> [1] "79318"
#> 
#> $language
#> [1] "R"
#> 
#> $version.string
#> [1] "R version 4.0.3 (2020-10-10)"
#> 
#> $nickname
#> [1] "Bunny-Wunnies Freak Out"

packageVersion("effectsize")
#> [1] '0.4.0'

packageVersion("emmeans")
#> [1] '1.5.2.10003'

Created on 2020-11-28 by the reprex package (v0.3.0)

@rvlenth
Author

rvlenth commented Nov 28, 2020

Looking at the documentation, I worked out that for the case of k = 2 (2 groups), we have

     2 * t        delta          n
    ---------  =  ----- x sqrt(-----)
    sqrt(df)        s          n - 1

which is approximately equal to Cohen's d. So I think the error is in not accounting for the number of groups.
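The identity above can be checked numerically. A minimal base-R sketch (simulated data, not from the reprex above), for k = 2 groups of equal size n:

```r
# Check: for two equal groups of size n, 2 * t / sqrt(df) equals
# (delta / s) * sqrt(n / (n - 1)), where s is the pooled SD and df = 2n - 2.
set.seed(42)
n <- 20
x <- rnorm(n)
y <- rnorm(n, mean = 0.5)
t_stat <- unname(t.test(x, y, var.equal = TRUE)$statistic)
s_pooled <- sqrt(((n - 1) * var(x) + (n - 1) * var(y)) / (2 * n - 2))
delta <- mean(x) - mean(y)
lhs <- 2 * t_stat / sqrt(2 * n - 2)
rhs <- (delta / s_pooled) * sqrt(n / (n - 1))
all.equal(lhs, rhs)
#> [1] TRUE
```

The sqrt(n / (n - 1)) factor is what makes 2 * t / sqrt(df) only approximately equal to Cohen's d even in the two-group case.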

@mattansb
Member

@rvlenth Indeed the form used here is d = 2 * t / sqrt(df), based on e.g. Rosenthal (1994) (should perhaps update the reference in the docs), by which:

[image: derivation of d = 2 * t / sqrt(df)]

This formula is used for meta-analysis purposes and is also used in some fields as an approximation of Cohen's d in situations where it is not straightforward to compute directly, such as for LMMs (I should make this clearer in the docs / vignette, along with the assumption of equal-sized groups).

I will also add a reference to emmeans::eff_size() and your comparisons vignette in the docs / vignette.

Perhaps I can add a k_adjust argument, for users to supply the number of original groups to use for the adjustment, which, if set, would implement your rvl_t2d()? I'm surprised I haven't come across these adjustments before. Perhaps a silly question, but do you know of any reference for this adjustment method?


Rosenthal, R. (1994) Parametric measures of effect size. In H. Cooper and L.V. Hedges (Eds.). The handbook of research synthesis. New York: Russell Sage Foundation.

mattansb added a commit that referenced this issue Nov 29, 2020
@rvlenth
Author

rvlenth commented Nov 29, 2020 via email

@mattansb
Member

> If I get a chance, I'll look at the Rosenthal reference. I'd want to make sure it doesn't say that that equation holds only when there are two groups.

It makes sense to me that you are right, even if this isn't explicitly stated, as Cohen's d is usually framed as a difference between two means.

I think I can re-use your example code without too many compatibility issues.

require(effectsize)
#> Loading required package: effectsize
require(emmeans)
#> Loading required package: emmeans
### t2d_rvl()
t2d_rvl <- function(t, df_error, paired = FALSE, ci = 0.95, k = 2) {
  .get_ncp_t <- effectsize:::.get_ncp_t
  
  if (paired) {
    res <- data.frame(d = t / sqrt(df_error + 1))
    
    if (is.numeric(ci)) {
      stopifnot(length(ci) == 1, ci < 1, ci > 0)
      res$CI <- ci
      
      ts <- t(mapply(.get_ncp_t,
                     t, df_error, ci))
      
      res$CI_low <- ts[, 1] / sqrt(df_error + 1)
      res$CI_high <- ts[, 2] / sqrt(df_error + 1)
    }
  } else {
    n <- 1 + df_error / k
    
    res <- data.frame(d = sqrt(2 / n) * t)
    
    if (is.numeric(ci)) {
      stopifnot(length(ci) == 1, ci < 1, ci > 0)
      res$CI <- ci
      
      ts <- t(mapply(.get_ncp_t,
                     t, df_error, ci))
      
      res$CI_low <- sqrt(2 / n) * ts[, 1]
      res$CI_high <- sqrt(2 / n) * ts[, 2]
    }
  }
  
  class(res) <- c("effectsize_table", "see_effectsize_table", class(res))
  res
}
warp.lm = lm(breaks ~ tension, data = warpbreaks)
emm = emmeans(warp.lm, "tension")

eff_size(emm, sigma = sigma(warp.lm), edf = df.residual(warp.lm))
#>  contrast effect.size    SE df lower.CL upper.CL
#>  L - M          0.842 0.344 51    0.152     1.53
#>  L - H          1.239 0.355 51    0.526     1.95
#>  M - H          0.397 0.336 51   -0.276     1.07
#> 
#> sigma used for effect sizes: 11.88 
#> Confidence level used: 0.95

s <- summary(pairs(emm))
t2d_rvl(s$t.ratio, s$df, k = 3)
#>    d |        95% CI
#> --------------------
#> 0.84 | [ 0.16, 1.51]
#> 1.24 | [ 0.54, 1.93]
#> 0.40 | [-0.26, 1.05]

Created on 2020-11-29 by the reprex package (v0.3.0)

I do have a question -- what about non-pairwise contrasts? Any idea how one might use these t-values to get d-like effect sizes?

Example:

contr <- contrast(emm, list(c = c(2,-1,-1)/2))
 
eff_size(contr, sigma = sigma(warp.lm), edf = df.residual(warp.lm), method = "identity")
#>  contrast effect.size    SE df lower.CL upper.CL
#>  c               1.04 0.307 51    0.425     1.66
#> 
#> sigma used for effect sizes: 11.88 
#> Confidence level used: 0.95 
 
s <- summary(contr)
 
# This?
t2d_rvl(s$t.ratio, s$df, k = 2)
#>    d |       95% CI
#> -------------------
#> 0.99 | [0.41, 1.56]
 
# Or this?
t2d_rvl(s$t.ratio, s$df, k = 3)
#>    d |       95% CI
#> -------------------
#> 1.20 | [0.50, 1.89]

@rvlenth
Author

rvlenth commented Nov 30, 2020

It does make sense to consider a contrast other than a pairwise one, with the interpretation that d is that contrast expressed in SD units. However, I don't think either of your answers is correct, because the formulas are based on the SE for a pairwise comparison. The "right answer" is:

> eff_size(contr, sigma = sigma(warp.lm), edf = 51, method = "identity")
 contrast effect.size    SE df lower.CL upper.CL
 c               1.04 0.307 51    0.425     1.66

sigma used for effect sizes: 11.88 
Confidence level used: 0.95 

@mattansb
Member

mattansb commented Dec 1, 2020

For the time being, I have updated the documentation and vignette.
Will get back to this at some point in the near future - thanks Russ!

@mattansb
Member

@bwiernik might have something to contribute here (thoughts / ideas / solutions)?

@bwiernik
Contributor

Can you give me a tl;dr on what the issue is?

@mattansb
Member

The t_to_d() conversion (for a conditional d) seems appropriate only for 2-level factors (with equal sample sizes, but that's kind of a given, I think).

Then Russell and I had a back-and-forth discussing whether the function should be expanded to factors with 3+ levels, and how (to also account for non-pairwise contrasts, etc.).

@bwiernik
Contributor

Ah, here are methods for d values for arbitrary contrasts https://psycnet.apa.org/journals/met/13/2/99/ and https://journals.sagepub.com/doi/abs/10.1111/1467-9280.00287

And for contrasts of medians https://psycnet.apa.org/doiLanding?doi=10.1037/1082-989X.7.3.370 and https://onlinelibrary.wiley.com/doi/abs/10.1111/bmsp.12171

Reasonable approaches here would be to (1) compute all pairwise contrast d values, (2) compute d contrasts versus the reference level of a factor, (3) compute d contrasts versus the grand mean, or (4) compute d contrasts for a user-specified contrast or correlation for ordered factors

@bwiernik
Contributor

Of those options, I suggest (2) by default for non-ordered factors and (4) correlations for ordered factors. We can add the option for a user specified contrast matrix ala emmeans. We could also rely on the contrasts set for the variable, but most users don't work with that much.
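For a single-factor lm like the warpbreaks example, option (2) can be sketched in a few lines of base R. (A sketch only: it assumes sigma(model) is the appropriate standardizer, which holds here because there are no covariates.)

```r
# d contrasts versus the reference level (tension = "L"), standardized
# by the residual SD of the single-factor model.
fit <- lm(breaks ~ tension, data = warpbreaks)
group_means <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)
d_vs_ref <- (group_means[-1] - group_means[1]) / sigma(fit)
round(d_vs_ref, 3)
#>      M      H 
#> -0.842 -1.239
```

These match (up to sign) the L - M and L - H values from eff_size() earlier in the thread.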

@rvlenth
Author

rvlenth commented Jun 21, 2021

If you give somebody an effect size, you are giving them something that is in SD units. That means it is vitally important to understand exactly what those units are -- especially for the person who constructs the effect size and reports it. From that perspective, a function like t_to_d() is something of a disservice -- even if it produces the correct result -- because it lets the user bypass the need to even think about those SD units.

If you have two independent samples of equal size and you think that a pooled t is appropriate, then t_to_d() gives you the value of d. But note all the qualifiers: two, independent, equal, pooled, t. If the samples are not independent, or if they are of unequal size, you are already out of luck. If the pooled t is not appropriate, it means you don't have equal population SDs, so now you are not only out of luck, but the whole concept of standardized effect size is ambiguous, if not completely out the window. And t is a qualifier because technically it requires normality, albeit there is some robustness there when the sample sizes are equal.

If you have more than two samples, then you can correct for that, but there are still all the other issues I just mentioned.

Anyone who really knows me knows I am not a strong proponent of standardized effect sizes, and I even have publications advising strongly against using them as target values in sample-size calculations. Accordingly, the function emmeans::eff_size() is pretty rudimentary. It uses estimates of pairwise differences from a fitted model, but it provides no default for the sigma argument or its degrees of freedom. This makes it clumsy to use from the perspective of a user who wants a quick answer. But from the above perspective, I'm pretty proud of that clunkiness, because it forces the user to think about what SD units should be used and where they come from.

Looking back at this thread, I wish I had said this earlier, instead of just addressing computational and technical concerns.

@bwiernik
Contributor

bwiernik commented Jun 21, 2021

I'm also not a big fan of standardized effect sizes.

That said, if a function exists that provides them, I think the possible choices users have for standardizing them, and their ramifications, should be made clear. From that perspective, I'm not sure the behavior of emmeans::eff_size() is the best. Because it doesn't have a default value for sigma, for example, many users will tend to look to the examples for where to get the sigma value. The first example shows using the sigma() function on the fitted model. That's a reasonable value for an lm with only the one group factor, but probably isn't reasonable if there are covariates in the model (cf. the recent discussion in #351). My preference is for providing clear sets of choices to users and then doing the appropriate sets of calculations based on those choices.

@rvlenth
Author

rvlenth commented Jun 22, 2021

@bwiernik -- Easy to say, but seldom easy to do. The documentation for eff_size() says (among other things):

For models having a single random effect, such as those fitted using lm; in that case, the stats::sigma
and stats::df.residual functions may be useful for specifying sigma and edf.

This pretty much describes the only situation where I could possibly present a usable choice, based on information that can be extracted from a model object.

It goes on to say:

For models with more than one random effect, sigma may be based on some combination of the random-effect variances.

Do you recommend that I take out the first example where I use the sigma() function because a user may not have read the documentation? What about the subsequent examples, where I do show something different? If I do somehow provide clear choices and an explanation of the ramifications of each choice, why would I think they'd read that, given that they didn't read the documentation?

What eff_size() does do that most other software does not is to take into account the error in estimating the SD when it constructs CIs for effect sizes. People should look at those intervals. Often, they'll find out they really have no idea what the actual effect size is.

@mattansb
Member

Originally, I came across the t-to-d conversion in a paper utilizing HLM, where it was used as a shortcut to obtain <some type of signal-to-noise ratio>. I agree that raw effect sizes are better, or at the very least standardizing by the correct / most relevant SD. However, given that (1) many people are not proficiently trained or experienced in such practices (something I do emphasize in my own teaching), and (2) readers (and often journals) expect these standardized effect sizes, we're just trying to strike the best balance here.

I think making the little t_to_d() function have a million options is not the easystats way to go (for that there is @rvlenth's S tier emmeans, which we do reference in the docs as the best option, or other meta-analysis packages).

So unless there is an easy "fix" to make t_to_d() less wrong, biased, or misused (to some reasonable degree -- I don't expect 100% coverage), I suggest better documentation instead, either in the function docs or in the appropriate vignette.

@DominiqueMakowski
Member

🚨 ALERT 🚨

[This is an automated message]

Someone has captured Mattan and is impersonating him. Fortunately, our defense algorithms managed to uncover the fraud. Probable reason yielded: "never would the real Mattan advocate for better documentation and easy usage over user's mandatory betterment"

@mattansb
Member

🤣🤣🤣🤣
The better documentation would lead to the user doing better!

@bwiernik
Contributor

I definitely agree that standardized effects rarely make any sense in mixed-effects models -- no arguments from me there. My concern is mostly about single-level models.

For single-level models, sigma(model) is only really a reasonable choice if the model includes only a single categorical predictor with no covariates. Otherwise, the scale of the effect size will depend on the covariates included in the model.

Within the domain of single level generalized linear models, the meta-analysis literature has explored these issues pretty extensively and given guidelines that conform to what people tend to expect a "Cohen's d" value means. (e.g., See the formulas on page 229 and 230 here:
https://books.google.be/books?hl=en&lr=&id=LUGd6B9eyc4C&oi=fnd&pg=PA229#v=onepage&q&f=false)

Currently, t_to_d() is only correct for a simple two-group t-test with equal-sized groups. If the groups vary in size, if there are more groups in the model, or if there are covariates, both the d estimate and its confidence interval will be wrong.

The way to make t_to_d() fit expectations would be to use the approach here: #351

That is, arguments should be added for n1, n2, R^2 for the covariates, and q (the number of covariates), with the appropriate conversion made based on those elements.
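A hedged sketch of what that signature could look like (the function name and arguments are hypothetical, not effectsize's API; the formulas follow the standard meta-analysis conversions, e.g. Borenstein et al., with q entering only small-sample corrections not shown here):

```r
# Hypothetical t-to-d conversion allowing unequal groups and covariates.
# sqrt(1/n1 + 1/n2) generalizes sqrt(2/n) to unequal group sizes;
# sqrt(1 - R2) rescales an ANCOVA t back to the unadjusted within-group
# SD, where R2 is the outcome variance explained by the covariates.
t_to_d_adj <- function(t, n1, n2, R2 = 0) {
  t * sqrt(1 / n1 + 1 / n2) * sqrt(1 - R2)
}

# With equal groups and no covariates this reduces to sqrt(2/n) * t:
t_to_d_adj(2.53, 18, 18)
#> [1] 0.8433333
```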
