Resolve performance regression in WLS for large sample sizes #376

mhunter1 · 2023-08-11T15:01:50Z

Brad Verhulst observed that a large-sample WLS model that used to take 1 second now takes 20 seconds.

mhunter1 · 2023-08-11T16:45:58Z

git bisect suggested these commits as candidates for the problem:

59bc1905a1c6fd40f1f878e1efe6d68cc75913ac # Almost certainly not it.
b588e2403ef3799462b0c256c4c2f072d5b9cfcb # Tried.  NOT IT.
618f6c4e89fdcc9a103a193b3c84c48bdd3db449 # NOT IT.
6e6f21cff324a1b7b7931bbe8ef5baa3ebdba11f # Tried.  NOT IT.
fc4f518e9ab2373bd400df4055edd46180f85255 # Probably it.
fb6c8bedb875898d6550f979a7745196bd7a0b59 # Prototype version of fc4f518
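
A bisect like this can be automated with `git bisect run`, which treats exit status 0 as good, 125 as skip, and any other status from 1 to 127 as bad. The sketch below is one way to turn a timing into such a status. `bisect_status`, the 5-second threshold, and the stand-in workload are all assumptions for illustration; a real script would rebuild OpenMx at each commit and time the WLS model instead.

```r
# Hypothetical helper: map an elapsed time (seconds) to a `git bisect run`
# exit status. The 5-second threshold is an assumption that sits between
# the ~1 sec (old) and ~20 sec (new) timings reported in this issue.
bisect_status <- function(elapsed_sec, threshold_sec = 5) {
  if (is.na(elapsed_sec)) return(125L)   # could not time this commit: skip it
  if (elapsed_sec > threshold_sec) 1L else 0L
}

# Stand-in workload (an assumption). Replace with the slow WLS call.
workload <- function() sum(sort(runif(1e5)))

# Time the workload; treat any error as "cannot test this commit".
elapsed <- tryCatch(
  system.time(workload())[["elapsed"]],
  error = function(e) NA_real_
)
status <- bisect_status(elapsed)

# In a real bisect script, finish with:
# quit(status = status)
```

With `quit(status = status)` enabled, the script could be passed to `git bisect run Rscript` once the good and bad endpoints are marked.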

mhunter1 · 2023-08-11T16:49:14Z

Here's the minimal working example (MWE) that shows the problem. This is excerpted from the file slowWLS.R on my machine.

#------------------------------------------------------------------------------
# Author: Michael D. Hunter
# Date: 2023-06-01
# Filename: slowWLS.R
# Purpose: Create a minimal working example that shows that WLS
#  has slowed down a lot since December 10, 2020
#------------------------------------------------------------------------------


#------------------------------------------------------------------------------
# Set working directory, load packages, load data
setwd('~/../Downloads/')

require(OpenMx)

# devtools::install_github('https://github.com/jpritikin/gwsem', ref='aa26dd0')
# Ran tests on aa26dd0 version of gwsem
# devtools::install_github('https://github.com/jpritikin/gwsem')

# Run in msys2
# pacman -Sy mingw-w64-x86_64-zstd
# pacman -Sy mingw-w64-i686-zstd
# pacman -Sy mingw-w64-x86_64-sqlite3
# pacman -Sy mingw-w64-i686-sqlite3
  

require(gwsem)


#load('slowWLS.RData')
#DATA <- mood2$data$observed
#save(DATA, file='slowWLSData.RData')
load('slowWLSData.RData')


#------------------------------------------------------------------------------
# Specify model

mood <- buildItem(phenoData = DATA, depVar  = "mood", 
                  covariates = c("PC1", "PC2", "PC3", "PC4", "PC5", "PC6","PC7", "PC8", "PC9","PC10", "Age", "Sex"),
                  fitfun = "WLS")

r <- mxRun(mood)
summary(r)$wallTime
# 21.94 sec is slow

coef(r)

# Try to tweak precision of computing summary stats
mood2 <- mxModel(mood, mxData(DATA, type='raw', fitTolerance=1e-10, gradientTolerance=1e-2))
# default fit tol = sqrt(as.numeric(mxOption(key="Optimality tolerance")))
# = 2.50998e-06
r2 <- mxRun(mood2)
summary(r2)$wallTime
coef(r2)

# Time to just compute the summary statistics
a <- Sys.time()
augd <- omxAugmentDataWithWLSSummary(mood$data, exogenous=setdiff(names(DATA), 'mood'))
b <- Sys.time()
b-a
# 20+ seconds

mhunter1 · 2023-08-11T19:25:39Z

I ran a simulation study that varied the sample size and the number of covariates separately. Both seem to have an inordinately large impact on run time in the current version compared to the older version of OpenMx. See the attached plots.

plotSlowWlsCovariateSampleSizeOld.pdf
plotSlowWlsCovariateSampleSizeNew.pdf
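
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness over a grid of sample sizes and covariate counts. It is not the script that produced the plots above. `time_grid` and `stand_in_fit` are hypothetical names, the data are simulated standard normals, and the `lm` call is only a placeholder for the real WLS fit.

```r
# Hypothetical harness: time a fitting function over a grid of sample sizes
# (n) and covariate counts (p). `fit_fn` is a placeholder for the real WLS call.
time_grid <- function(fit_fn, n_values, p_values, seed = 1) {
  set.seed(seed)
  grid <- expand.grid(n = n_values, p = p_values)
  grid$elapsed <- NA_real_
  for (i in seq_len(nrow(grid))) {
    n <- grid$n[i]
    p <- grid$p[i]
    # Simulated data: one outcome column plus p standard-normal covariates
    dat <- as.data.frame(matrix(rnorm(n * (p + 1)), nrow = n))
    names(dat) <- c("y", paste0("x", seq_len(p)))
    grid$elapsed[i] <- system.time(fit_fn(dat))[["elapsed"]]
  }
  grid
}

# Stand-in workload (an assumption, not the OpenMx WLS code path)
stand_in_fit <- function(dat) lm(y ~ ., data = dat)

timings <- time_grid(stand_in_fit,
                     n_values = c(1000, 5000),
                     p_values = c(2, 6, 12))
print(timings)
```

Running the same script under the old and new OpenMx installs, with `stand_in_fit` swapped for the WLS model, would give two tables that can be compared row by row.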

mhunter1 · 2023-08-11T19:26:40Z

The next step is to investigate the internal Newton-Raphson optimization routine. One possibility is that it is taking very small steps because the gradient needs to be scaled differently.
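
One way to look for that kind of behavior is to log the step size and gradient norm at every iteration. The sketch below is a generic, undamped Newton-Raphson loop on a toy objective, written only to show what such a trace looks like. It is not OpenMx's internal routine, and `newton_trace` and the toy functions are hypothetical.

```r
# Hypothetical diagnostic: Newton-Raphson with a per-iteration trace of the
# step norm and gradient norm. Not OpenMx's internal optimizer.
newton_trace <- function(grad, hess, start, tol = 1e-8, max_iter = 50) {
  x <- start
  step_norm <- numeric(0)
  grad_norm <- numeric(0)
  for (k in seq_len(max_iter)) {
    g <- grad(x)
    step <- solve(hess(x), g)            # full Newton step, no damping
    x <- x - step
    step_norm[k] <- sqrt(sum(step^2))
    grad_norm[k] <- sqrt(sum(g^2))
    if (step_norm[k] < tol) break        # stop once the steps are negligible
  }
  list(par = x,
       trace = data.frame(iter = seq_along(step_norm),
                          step_norm = step_norm,
                          grad_norm = grad_norm))
}

# Toy objective f(x) = sum(exp(x) - x), minimized at x = 0
toy_grad <- function(x) exp(x) - 1
toy_hess <- function(x) diag(exp(x), nrow = length(x))

res <- newton_trace(toy_grad, toy_hess, start = c(2, -1))
print(res$trace)
```

On a well-behaved problem the step norms shrink quickly toward zero. A long run of very small but non-negligible steps in a trace like this would be a sign that the iterations are stalling.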

mhunter1 added enhancement regression test fails inconsistently labels Aug 11, 2023

mhunter1 self-assigned this Aug 11, 2023
