Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multilevel correlation changes when more correlations are computed #290

Open
mattansb opened this issue May 29, 2023 · 6 comments
Open

Multilevel correlation changes when more correlations are computed #290

mattansb opened this issue May 29, 2023 · 6 comments

Comments

@mattansb
Copy link
Member

library(correlation)
#> Warning: package 'correlation' was built under R version 4.2.3

data("mtcars")

mtcars$gear <- factor(mtcars$gear)

A multilevel correlation:

correlation(mtcars, select = c("mpg", "wt", "gear"), multilevel = TRUE)
#> Parameter1 | Parameter2 |     r |             CI | t(30) |         p
#> --------------------------------------------------------------------
#> mpg        |         wt | -0.79 | [-0.90, -0.62] | -7.16 | < .001***
#> 
#> Observations: 32

The correlation changes when there are more variables included - why?

correlation(mtcars, select = c("mpg", "wt", "gear", "drat"), multilevel = TRUE)
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> Parameter1 | Parameter2 |     r |             CI | t(30) |         p
#> --------------------------------------------------------------------
#> mpg        |         wt | -0.78 | [-0.89, -0.59] | -6.79 | < .001***
#> mpg        |       drat |  0.34 | [-0.01,  0.62] |  1.98 | 0.057    
#> wt         |       drat | -0.40 | [-0.66, -0.06] | -2.37 | 0.049*   
#> 
#> Observations: 32

(A client sent me a more dramatic change, where the correlation changes its sign!)

Created on 2023-05-29 with reprex v2.0.2

@mattansb
Copy link
Member Author

mattansb commented May 29, 2023

Looking at the code, it seems like when multilevel = TRUE each variable is adjusted by ALL other variables, not just the grouping factor 😲!

@DominiqueMakowski
Copy link
Member

We really need to move towards a more explicit approach where people should input the vector of variables they want to adjust for as @bwiernik was mentioning, it's probably just adding an ifelse statement somewhere where multilevel=TRUE queries the list of variables and then passes them to the adjustment.

We can preserve back-compatibility with when TRUE, we do the same as we do now but we throw a message like "adjusting for all variables: x, y, z"

@mattansb
Copy link
Member Author

I don't think we should reserve backwards compatibility - I think it should be treated as a bug / unintended behavior.

@DominiqueMakowski
Copy link
Member

Fair enough, then at least deprecate first by warning "please replace TRUE by the list of variables you wish to account for"

@mattansb
Copy link
Member Author

This should be applied to partial= as well.

@strengejacke
Copy link
Member

Yeah, adjustment varies by number of predictors:

data("mtcars")
mtcars$gear <- factor(mtcars$gear)

datawizard::data_adjust(mtcars[c("mpg", "gear")], multilevel = TRUE) |> head()
#>         mpg gear
#> 1 -3.156152    4
#> 2 -3.156152    4
#> 3 -1.356152    4
#> 4  4.939590    3
#> 5  2.239590    3
#> 6  1.639590    3
datawizard::data_adjust(mtcars[c("mpg", "wt", "gear")], multilevel = TRUE) |> head()
#>          mpg          wt gear
#> 1 -3.0439878 -0.45459927    4
#> 2 -1.7575107 -0.19959927    4
#> 3 -2.7574903 -0.51481964    4
#> 4  1.7587607  0.07354550    3
#> 5  0.1938875 -0.06112396    3
#> 6 -0.3052123 -0.12105050    3
datawizard::data_adjust(mtcars[c("mpg", "wt", "drat", "gear")], multilevel = TRUE) |> head()
#>          mpg          wt        drat gear
#> 1 -2.9517840 -0.41184993 -0.10742447    4
#> 2 -1.7261365 -0.15684993 -0.06914768    4
#> 3 -2.5462245 -0.52011589 -0.21349382    4
#> 4  2.0462615 -0.05887657 -0.20395587    3
#> 5  0.3612184 -0.12350301 -0.08362549    3
#> 6  0.2278300 -0.33214543 -0.46694411    3

Created on 2023-05-29 with reprex v2.0.2

Should we change following code:

    if (isTRUE(partial)) {
      data[[x]] <- datawizard::adjust(data[names(data) != y], multilevel = multilevel, bayesian = partial_bayesian)[[x]]
      data[[y]] <- datawizard::adjust(data[names(data) != x], multilevel = multilevel, bayesian = partial_bayesian)[[y]]
    }

so that it only takes x and factors? Like

adjust_vars <- c(x, names(data[vapply(data, is.factor, TRUE)]))
data[[x]] <- datawizard::adjust(data[adjust_vars], multilevel = multilevel, bayesian = partial_bayesian)[[x]]
adjust_vars <- c(y, names(data[vapply(data, is.factor, TRUE)]))
data[[y]] <- datawizard::adjust(data[adjust_vars], multilevel = multilevel, bayesian = partial_bayesian)[[y]]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants