Fix N vs N-1 issue for covariance data #321

mhunter1 · 2021-07-16T17:29:28Z

See Mike Cheung's post noting the incorrect Chi-squared value for covariance data: https://openmx.ssri.psu.edu/node/4745

It's an N vs N-1 issue: the sample covariance is computed with N-1 in the denominator rather than N. See the output of a partial script below.

We may need changes in the src/omxMLFitFunction.cpp file related to calculating saturated_out and the three calls to stan::prob::multi_normal_sufficient_log.

Output

> M2LL <- 192.0880
> Sat <- 192.0981
>
> Sigma <- mxGetExpected(fit2, 'covariance')
> SigmaInv <- solve(Sigma)
> m <- apply(df, 2, mean)
> mu <- mxGetExpected(fit2, 'means')
>
> # Replicate current OpenMx calculations
> p1 <- log(det(Sigma))*100
> p2 <- log(det(cov(df)))*100
> p3 <- sum(diag(cov(df)%*%SigmaInv))*99
> p4 <- 2*99
>
> p1 + p3
[1] 192.088
> M2LL
[1] 192.088
>
> p2 + p4
[1] 192.0981
> Sat
[1] 192.0981
>
> M2LL - Sat # should be zero, and isn't
[1] -0.0101
>
>
> # Correct OpenMx calculations for N vs N-1 sample cov
> p3a <- sum(diag(99/100*cov(df)%*%SigmaInv))*99
> # Computationally better as
> #  sum(diag(cov(df)%*%SigmaInv))*99*99/100
> p2a <- log(det(99/100*cov(df)))*100
> # Computationally better as
> #  (log(det(cov(df))) + 2*log(99/100))*100  #2=nrow(cov(df))
> # Need means too
> p5 <- sum(diag(SigmaInv %*% t(m - mu) %*% (m - mu)))*100
> # p5 is zero for saturated model
> n_M2LL <- p1 + p3a + p5
> n_Sat <- p2a + p4
>
> n_M2LL
[1] 190.088
> M2LL
[1] 192.088
> # Off by 2 = numVariables
>
> n_Sat
[1] 190.088
> Sat
[1] 192.0981
> # Off by 2*log(99/100)*100
>
>
> n_M2LL - n_Sat # Should be zero, and is
[1] 6.042092e-08
>
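
The size of the first discrepancy can also be worked out by hand. This is only a sketch under assumptions: S denotes cov(df) (N-1 denominator), p the number of variables, and the fitted covariance is taken to be \hat\Sigma = \frac{N-1}{N} S. That last point is an assumption chosen because it reproduces the numbers printed above. It is not taken from the OpenMx source.

```latex
\begin{aligned}
\text{M2LL} &= N\log|\hat\Sigma| + (N-1)\,\mathrm{tr}\!\left(S\hat\Sigma^{-1}\right)
             = N\log|S| + Np\log\tfrac{N-1}{N} + Np, \\
\text{Sat}  &= N\log|S| + (N-1)\,p, \\
\text{M2LL} - \text{Sat} &= p\left[\,N\log\tfrac{N-1}{N} + 1\,\right]
             \approx -0.0101 \quad (N = 100,\ p = 2).
\end{aligned}
```

Under these assumptions the nonzero difference depends only on N and p, not on the data, which matches the M2LL - Sat value reported in the output.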

Raw Code

M2LL <- 192.0880
Sat <- 192.0981

Sigma <- mxGetExpected(fit2, 'covariance')
SigmaInv <- solve(Sigma)
m <- apply(df, 2, mean)
mu <- mxGetExpected(fit2, 'means')

# Replicate current OpenMx calculations
p1 <- log(det(Sigma))*100
p2 <- log(det(cov(df)))*100
p3 <- sum(diag(cov(df)%*%SigmaInv))*99
p4 <- 2*99

p1 + p3
M2LL

p2 + p4
Sat

M2LL - Sat # should be zero, and isn't


# Correct OpenMx calculations for N vs N-1 sample cov
p3a <- sum(diag(99/100*cov(df)%*%SigmaInv))*99
# Computationally better as
#  sum(diag(cov(df)%*%SigmaInv))*99*99/100
p2a <- log(det(99/100*cov(df)))*100
# Computationally better as
#  (log(det(cov(df))) + 2*log(99/100))*100  #2=nrow(cov(df))
# Need means too
p5 <- sum(diag(SigmaInv %*% t(m - mu) %*% (m - mu)))*100
# p5 is zero for saturated model
n_M2LL <- p1 + p3a + p5
n_Sat <- p2a + p4

n_M2LL
M2LL
# Off by 2 = numVariables

n_Sat
Sat
# Off by 2*log(99/100)*100


n_M2LL - n_Sat # Should be zero, and is
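
Because the script above depends on fit2 and df, which are not shown, here is a self-contained sketch of the same comparison in base R. It uses simulated data and assumes the saturated model's expected covariance equals the N-denominator sample covariance. The names S_unbiased, S_ml, and the *_mixed / *_consistent variables are hypothetical and only for illustration. This is not the OpenMx implementation.

```r
# Simulated stand-in for df (assumption: 100 rows, 2 variables, as in the issue)
set.seed(321)
N <- 100
p <- 2
df <- matrix(rnorm(N * p), nrow = N, ncol = p)

S_unbiased <- cov(df)                    # cov() divides by N - 1
S_ml       <- S_unbiased * (N - 1) / N   # rescaled to an N denominator

# Assumption: a saturated model reproduces the N-denominator covariance.
# The means term (p5 above) is omitted because it is zero for a saturated model.
Sigma    <- S_ml
SigmaInv <- solve(Sigma)

# Mixed scaling, mirroring p1 + p3 and p2 + p4
m2ll_mixed <- N * log(det(Sigma)) + (N - 1) * sum(diag(S_unbiased %*% SigmaInv))
sat_mixed  <- N * log(det(S_unbiased)) + (N - 1) * p

# Consistent scaling, mirroring p1 + p3a and p2a + p4
m2ll_consistent <- N * log(det(Sigma)) + (N - 1) * sum(diag(S_ml %*% SigmaInv))
sat_consistent  <- N * log(det(S_ml)) + (N - 1) * p

diff_mixed      <- m2ll_mixed - sat_mixed            # nonzero
diff_consistent <- m2ll_consistent - sat_consistent  # zero up to rounding

print(diff_mixed)
print(diff_consistent)
```

Under these assumptions diff_mixed equals p * (N * log((N - 1) / N) + 1), about -0.0101 for N = 100 and p = 2 whatever the simulated data, while diff_consistent is zero up to floating-point error.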


tbates · 2021-11-06T09:26:04Z

Did the relevant change to src/omxMLFitFunction.cpp happen? It seems like an important and achievable fix.

RMKirkpatrick · 2023-11-03T20:24:53Z

I'm looking into fixing this bug.

mhunter1 added the bug label Jul 16, 2021

mhunter1 self-assigned this Jul 16, 2021
