Clarify multilevel argument #253

Open
bwiernik opened this issue Aug 10, 2022 · 5 comments
Labels
docs 📚 (Something to be addressed in docs and/or vignettes); enhancement 💥 (Implemented features can be improved or revised)

@bwiernik
Contributor

I find the multilevel argument name confusing, and there have been several issues from users lately that have expressed similar confusion.

Based on the name, I would expect a decomposition of the correlation matrix into between-groups and within-groups components, similar to psych::statsBy(). The between-groups component is correlations among group means, the within-groups component is the pooled within-group correlation matrix (computed as the correlations among group-mean-centered variables). This is what is typically meant in my experience (at least in psychology circles) by phrases like "multilevel factor analysis", "multilevel SEM", or "multilevel correlations".

The multilevel argument computes what is effectively the within-groups component described above, but estimated using random effects (random intercepts for group) rather than fixed effects (group-mean-centering or including groups as dummy-coded variables). Both fixed and random specifications of this adjustment are "multilevel" in the sense that they are estimating average within-group correlations, but we currently do not report the between component of the multilevel correlations in either specification.
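To make the two components concrete, here is a minimal, language-neutral sketch of the fixed-effects version of the decomposition described above: within-group correlation from group-mean-centered values, between-group correlation from the group means. It is written in plain Python for illustration only; the function names `pearson` and `within_between` are hypothetical, the between component is unweighted (each group counts once), and none of this is the package's or `psych::statsBy()`'s actual implementation.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_between(x, y, group):
    """Return (r_within, r_between) using group-mean centering (fixed effects)."""
    rows = defaultdict(list)
    for xi, yi, g in zip(x, y, group):
        rows[g].append((xi, yi))

    # Per-group means of x and y.
    means = {
        g: (sum(a for a, _ in v) / len(v), sum(b for _, b in v) / len(v))
        for g, v in rows.items()
    }

    # Within: correlate deviations from each observation's own group mean.
    xw = [xi - means[g][0] for xi, g in zip(x, group)]
    yw = [yi - means[g][1] for yi, g in zip(y, group)]

    # Between: correlate the group means (assumption: unweighted, one point per group).
    xb = [m[0] for m in means.values()]
    yb = [m[1] for m in means.values()]

    return pearson(xw, yw), pearson(xb, yb)


# Toy data: y falls as x rises inside every group, but group means rise together.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [3, 2, 1, 6, 5, 4, 9, 8, 7]
group = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
r_within, r_between = within_between(x, y, group)
```

In this toy data the two components have opposite signs, which is exactly the information a single pooled or adjusted correlation cannot show.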

I think it would be clearer for the argument to be named something like random_factors. This would make it clearer to me that what this argument is switching is how factors are partialed out.

Estimating correct point estimates/df/p/CIs for both within-group and between-group correlations is easy for fixed factor controls (known analytic solutions).

For random factor controls, we can get reasonable point estimates/df/p/CIs for the within-correlation using our current estimation approach plus some choice of profile likelihood or DoF approximation, or, I'd argue, we can get close enough by just using the fixed effects df. For between-correlations, we can either (1) pivot to a long format, fit a model with 0 + name + (0 + name | id), take the correlation from there, and use profile likelihood for the CI, or (2) use our current estimation approach, estimate random effects for persons, and then compute the correlations among those post hoc, using the fixed effects df. The second option is probably close enough.

@IndrajeetPatil added the docs 📚 and enhancement 💥 labels on Aug 10, 2022
@DominiqueMakowski
Member

DominiqueMakowski commented Aug 11, 2022

> I think it would be clearer for the argument to be named something like random_factors. This would make it clearer to me that what this argument is switching is how factors are partialed out.

Yeah, that sounds good to me. Though we should do a soft deprecation first, with a warning, and leave it in place for some time (as this is probably quite a popular feature of the package).

Interestingly, I had the same confusion about multilevel factor/SEM analysis. For me, and in my field, "multilevel" is used as a synonym for mixed models (random factors models). One day I wanted to have RE in my SEM and FA, so I looked for it and was thrilled when I saw multilevel FAs... followed by disappointment when I understood it was "just" a stratified analysis. So I can understand how users coming from the opposite direction would have the same confusion...

So yeah, making things more explicit is good. We should probably think about overhauling the whole factor treatment. We could have multiple arguments like factors_ignored=NULL (by default, gets filled with all the factors of the df), factors_fixed (the previous include_factors), factors_random (for factors to be treated as random), and then some argument to also include random slopes (easystats/datawizard#203).
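As a rough illustration of the soft deprecation suggested above, the sketch below shows one way an old argument name could keep working while emitting a warning. It is plain Python used purely to show the pattern; the helper `resolve_random_factors` is hypothetical, and the assumption that `random_factors` would be a simple on/off replacement for `multilevel` is mine, not something settled in this thread.

```python
import warnings


def resolve_random_factors(random_factors=None, multilevel=None):
    """Map the old `multilevel` flag onto the proposed `random_factors` flag.

    Hypothetical helper: assumes both arguments are booleans (or None when unset).
    """
    if multilevel is not None:
        # Soft deprecation: keep honoring the old name, but tell the user.
        warnings.warn(
            "`multilevel` is deprecated; use `random_factors` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The new name wins if both were supplied explicitly.
        if random_factors is None:
            random_factors = multilevel
    return bool(random_factors)
```

Letting the new name take precedence when both are given keeps old scripts running unchanged while making the migration path unambiguous.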

@bwiernik
Contributor Author

I'm thinking of shifting to an explicit declaration of which variables should be partialed or semipartialed, which would make a lot of the arguments easier to manage together.

@Pascal-Kueng

Is this still relevant?
As you mentioned, I also believe the current implementation may be a bit confusing for people working with multilevel data who expect a decomposition into within- and between-group components, similar to what psych::statsBy() provides.
To address this, I have been working on some code that does this (I don't like the output of psych::statsBy() that much and didn't understand how it arrives at its estimates) and was wondering if it would be helpful.

My script basically centres variables within and between clusters (similar to bmlm::isolate) and calculates both the within- and between-cluster correlations. Confidence intervals and p-values can be calculated and adjusted using Fisher's z-transformation.
With my implementation I get pretty much identical results to psych::statsBy(), and I recover the within- and between-cluster correlations that I specified in a simulation I wrote.
Do you think this would fit in somehow with the "multilevel" argument? Or perhaps this could be an additional feature?
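For readers unfamiliar with the Fisher z approach mentioned above, here is a minimal sketch of the standard large-sample interval and two-sided p-value for a single correlation. It is plain Python, not the script described in this comment; the function name `fisher_z_inference` is hypothetical, and which `n` to pass for the within and between components (for example, observations adjusted for the number of clusters versus the number of clusters) is left to the caller as an assumption.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_z_inference(r, n, level=0.95):
    """Confidence interval and two-sided p-value for r via Fisher's z.

    Assumes -1 < r < 1 and an effective sample size n > 3 chosen by the caller.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3 for the Fisher z standard error")
    z = atanh(r)                      # Fisher transform of r
    se = 1 / sqrt(n - 3)              # approximate standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z) / se))
    return {
        "ci_low": tanh(z - crit * se),   # back-transform to the r scale
        "ci_high": tanh(z + crit * se),
        "p": p_value,
    }


result = fisher_z_inference(0.0, 103)
```

The interval is symmetric on the z scale and becomes asymmetric around r after back-transformation whenever r is not zero.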

@DominiqueMakowski
Member

I could easily see this as a feature. I think it would be nice as either a different correlation method or a separate function because, if I understand correctly, it has a different output: it returns two correlation indices (between and within). Is that correct?

> I'm thinking of shifting to an explicit declaration of which variables should be partialed or semipartialed,

I agree with that. Moving forward, we would probably need to rethink how to make the API more explicit, more flexible, and less confusing.

@Pascal-Kueng

Pascal-Kueng commented Apr 21, 2023

Yes that's exactly right. For example, it could return one correlation matrix (and table with other statistics) for the within- correlations and a second separate correlation matrix for the between- correlations. I also think a nice implementation of the summary() method would be a correlation table with the within- correlations above the diagonal and the between- correlations below the diagonal. I think I could provide a class and some functions that would achieve all this.
As the structure is different than for all other correlations, perhaps a standalone class could be better?
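The combined summary layout described above (within-cluster correlations above the diagonal, between-cluster correlations below) could be assembled roughly as follows. This is a plain-Python sketch with a hypothetical function name, `combine_within_between`, and it assumes both inputs are square matrices with the same variable order; it does not reflect any existing summary() method.

```python
def combine_within_between(within, between):
    """Merge two k x k correlation matrices into one display table.

    Upper triangle: within-cluster correlations.
    Lower triangle: between-cluster correlations.
    Diagonal: 1.0.
    """
    k = len(within)
    if len(between) != k:
        raise ValueError("within and between must have the same dimensions")
    table = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i < j:
                table[i][j] = within[i][j]
            elif i > j:
                table[i][j] = between[i][j]
    return table


# Made-up numbers purely to show the layout.
within = [[1.0, -0.3], [-0.3, 1.0]]
between = [[1.0, 0.6], [0.6, 1.0]]
combined = combine_within_between(within, between)
```

Because the two triangles come from different matrices, the result is no longer symmetric, which supports the point that a standalone class may be a better fit than reusing the regular correlation output.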
