Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve documentation around variable reordering or fix bug for user-specified weight matrix #260

Open
mhunter1 opened this issue Jan 30, 2023 · 5 comments

Comments

@mhunter1
Copy link

When you write a lavaan model with a user-specified covariance matrix and a user-specific weight matrix, lavaan rearranges the order of the variables in its covariance matrix based on the order of the variables in the syntax, but does not rearrange the order of the variables in the weight matrix.

So, a user could write the same model on the same data, but get different and incorrect results because the models differed in the order of the variables in the syntax. Below is an example based on the GenomicSEM package.

I propose that lavaan either (1) document this behavior is big, bold writing somewhere related to user-specified weight matrices, or (2) re-arrange the user-specified weight matrix in the same order as the rearranged covariance matrix.

#------------------------------------------------------------------------------
# Author: Michael D. Hunter
# Date: 2023-01-26
# Filename: ReorderGenomicSEM_lavaan.R
# Purpose: Demonstrate that lavaan rearranges the variables in the covariance
#  matrix depending on the model specification, but does not reorder the
#  user-specified weight matrix.
#  This *might* be a problem for people using GenomicSEM.
#------------------------------------------------------------------------------


#------------------------------------------------------------------------------
# Set working directory, load packages, load data created by GenomicSEM

setwd('~/../Downloads/') # Or where-ever you have the 'all_DSM_v2.RData' file

library(lavaan)
# devtools::install_github("GenomicSEM/GenomicSEM")
library(GenomicSEM)

# Data and weight matrix created by GenomicSEM
load("./all_DSM_v4.RData")


#------------------------------------------------------------------------------
# Label covariance matrix and weight matrix
# Also, rearrange these as needed by OpenMx vs lavaan

S <- LDSCoutput$S
S <- as.matrix(Matrix::nearPD(S)$mat)
colnames(S) <- c("MDD","BP","SZ","ADHD","PTSD","ASD","TS","ANX","ED","OCD","AD","CAN")
rownames(S) <- colnames(S)

V <- LDSCoutput$V
rownames(V) <- c(
	"S1,1","S2,1","S3,1","S4,1","S5,1","S6,1","S7,1","S8,1","S9,1","S10,1","S11,1","S12,1",
	"S2,2","S3,2","S4,2","S5,2","S6,2","S7,2","S8,2","S9,2","S10,2","S11,2","S12,2",
	"S3,3","S4,3","S5,3","S6,3","S7,3","S8,3","S9,3","S10,3","S11,3","S12,3",
	"S4,4","S5,4","S6,4","S7,4","S8,4","S9,4","S10,4","S11,4","S12,4",
	"S5,5","S6,5","S7,5","S8,5","S9,5","S10,5","S11,5","S12,5",
	"S6,6","S7,6","S8,6","S9,6","S10,6","S11,6","S12,6",
	"S7,7","S8,7","S9,7","S10,7","S11,7","S12,7",
	"S8,8","S9,8","S10,8","S11,8","S12,8",
	"S9,9","S10,9","S11,9","S12,9",
	"S10,10","S11,10","S12,10",
	"S11,11","S12,11",
	"S12,12"
)
colnames(V) <- rownames(V)

vn <- rownames(V)
W <- diag(1/diag(V))

# Add these names directly to LDSC object
LDSCNamedOutput <- LDSCoutput
dimnames(LDSCNamedOutput$S) <- list(colnames(S), colnames(S))


#------------------------------------------------------------------------------
# Specify lavaan Model

lresid <- 'INT ~~ EXT
INT ~~ P
EXT ~~ P
MDD ~~ MDD
BP ~~ BP
SZ ~~ SZ
ADHD ~~ ADHD
PTSD ~~ PTSD
ASD ~~ ASD
TS ~~ TS
ANX ~~ ANX
ED ~~ ED
OCD ~~ OCD
AD ~~ AD
CAN ~~ CAN'

# Naive syntax
sntx <- paste('
INT =~ MDD + TS + PTSD + ANX + ED + OCD
P =~ BP + SZ + ASD + TS
EXT =~ ADHD + TS + AD + CAN', lresid, sep='\n')

# Syntax that tricks lavaan into keeping the variable order correct
sntx2 <- paste('
INT =~ MDD
P =~ BP
P =~ SZ
EXT =~ ADHD
INT =~ PTSD
P =~ ASD
INT =~ TS
INT =~ ANX
INT =~ ED
INT =~ OCD
EXT =~ AD
EXT =~ CAN
EXT =~ TS
P =~ TS', lresid, sep='\n')

lavmod <- lavaan(
	model=sntx,
	sample.cov=S,
	sample.nobs=2, # What GenomicSEM passes internally to lavaan.
	NACOV=V,
	WLS.V=W,
	std.lv=TRUE,
	check.gradient=F,
	estimator="WLSMV"
)


lavmod2 <- lavaan(
	model=sntx2,
	sample.cov=S,
	sample.nobs=2, # What GenomicSEM passes internally to lavaan.
	NACOV=V,
	WLS.V=W,
	std.lv=TRUE,
	check.gradient=F,
	estimator="WLSMV"
)


#------------------------------------------------------------------------------
# GenomicSEM with either syntax
gem2 <- usermodel(LDSCNamedOutput, estimation = "DWLS", model = sntx2, CFIcalc = TRUE, std.lv = TRUE, imp_cov = FALSE)
gem <- usermodel(LDSCNamedOutput, estimation = "DWLS", model = sntx, CFIcalc = TRUE, std.lv = TRUE, imp_cov = FALSE)


#------------------------------------------------------------------------------
gem$modelfit
gem2$modelfit
# Same model fit regardless of syntax

lavmod@optim$fx
lavmod2@optim$fx
# Different model fit depending on syntax

# lavaan free parameter estimates are different too
# but lavmod2 is the same as GenomicSEM
gemNames <- paste0(gem$results$lhs, gem$results$op, gem$results$rhs)
gemEst <- gem$results$Unstand_Est
names(gemEst) <- gemNames
gemEst <- gemEst[1:length(coef(lavmod))]
cbind(lavmodEst=coef(lavmod),
    lavmod2Est=coef(lavmod2)[names(coef(lavmod))],
    gemEst=gemEst[names(coef(lavmod))])



#------------------------------------------------------------------------------
# The Cause

# Sample statistics are in different order depending on syntax
round(lavmod@SampleStats@cov[[1]], 3)
round(lavmod2@SampleStats@cov[[1]], 3) # this order matches S
round(S, 3)

# The sample statistics are rearranged depending on the syntax ordering of variables
#  but not the weight matrix
# This behavior could lead users to specify incorrect models and
#  interpret incorrect results

Here's the RData file wrapped in a Zip: all_DSM_v4.zip

@TDJorgensen
Copy link
Contributor

Thanks for noticing this @mhunter1

@yrosseel
This will affect the lavaan.mi and lavaan.srm packages too, in cases when passing a lavMoments object to lavaan(). Currently, users can manually specify the order of groups using the group.label= argument, overwriting the default behavior based on the order observed in data=. Would it be feasible to add an analogous var.names= argument for users to specify the order of lavNames(), overwriting the default behavior based on their appearance in model syntax?

FYI @mhunter1, another potential issue might arise if users fit a model to a reduced set of variables, in which case lavaan() easily extracts the relevant rows/columns of sample.cov= to which the model is fitted (likewise with sample.mean= elements). However, the corresponding sampling (co)variances of those summary statistics would not be automatically removed from user-specified WLS.V= or NACOV= arguments, causing an error because the number of dimensions would not be consistent with what lavaan expects. The onus is (currently) on the user to provide the correct weight matrix for any particular analysis.

From a design standpoint, I think @yrosseel can only set consistent expectations for summary statistics (sample.cov= and sample.mean=) and their uncertainty (WLS.V= and NACOV=). That is, if the user provides weights, they should only provide a sample.cov= with corresponding order and dimensions. This is the current case, and I hope it will be straight-forward for lavaan to allow users to keep lavNames() synchronized with the order of summary stats and their uncertainty.

In order to program an automatic reordering of weights, lavaan would still need to expect the dimensions/order of summary stats and their uncertainty to be synchronized, so that the appropriate reordering of WLS.V= or NACOV= could always be inferred from the reordering of sample.cov= (and sample.mean=). But I expect it would be unnecessarily complicated to program this, relative to the simplicity of preserving a user-specified order of lavNames().

@yrosseel
Copy link
Owner

When the WLS.V/NACOV arguments were added, the implicit assumption was that they were extracted from a fitted lavaan object, and that they would be used again for the same model using the same syntax(!), just avoiding recomputing NACOV again.

But now that these arguments are exploited for other means, clearly something needs to happen. There are two possibilities: either we enforce the data-driven order for the variable names (but this would exclude using a subset of the variables, and it is very un-lavaan), or we select+rearrange the elements in the NACOV/WLS.V matrices. The latter is a nightmare, as the NACOV/WLS.V matrices usually have no column or rownames. I will give this some thought.

@yrosseel
Copy link
Owner

yrosseel commented Feb 7, 2023

I have made two changes in the github version:

  1. an new ov.order= argument can be set to "model" or "data"; if it is "model" (the default), the internal ordering of the observed variable names is based on the model syntax (as usual); however, if set to "data", the ordering is determined by the data (or sample covariance matrix)

It turned out that the only way to do this (in a reliable way) was to add additional rows to the parameter table (using a 'da' operator; this can be seen in the @partable slot) reflecting the order. As many internal functions in lavaan are based on the model only (eg lavNames), this was the only way. Other than reordering the observed variables, this ov.order= argument is not supposed to have any other side-effects. (But more testing is needed to verify this).

  1. when the NACOV and/or WLS.V arguments are not-null, ov.order is now set to "data"; this assumes that the ordering in (for example) sample.cov is consistent with the ordering in NACOV/WLS.V, and we do not change this ordering internally (when ov.order="data"). This is now also mentioned in the documentation.

@mhunter1
Copy link
Author

mhunter1 commented Feb 9, 2023

Sounds like a reasonable solution to me!

@TDJorgensen
Copy link
Contributor

thanks @yrosseel, that should work great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants