
Parallelize CI calculation #380

Open
mronkko opened this issue Aug 30, 2023 · 5 comments

Comments

@mronkko

mronkko commented Aug 30, 2023

The code for calculating CIs runs very slowly, as the documentation also notes. There is potential for a dramatic speedup if the CI calculation were parallelized.

The relevant code is here:

```r
for (i in 1:rows) {
    for (j in 1:cols) {
        if (free[i, j]) {
            newName <- paste(reference, '[', i, ',', j, ']', sep = '')
            retval <- c(retval, createSimilarInterval(newName, interval))
        }
    }
}
```

Instead of using a nested loop, the function should use lapply, which could be parallelized using mclapply.
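To make the suggestion concrete, here is a minimal sketch of the nested loop above rewritten as a map over the free cells. It reuses the variable names from the snippet (rows, cols, free, reference, interval), but createSimilarInterval is stubbed with a hypothetical stand-in for illustration:

```r
library(parallel)

# Hypothetical stand-in for the real createSimilarInterval helper.
createSimilarInterval <- function(name, interval) {
  list(name = name, interval = interval)
}

# Toy inputs in place of the real model's free-parameter matrix.
free <- matrix(c(TRUE, FALSE, TRUE, TRUE), nrow = 2)
rows <- nrow(free); cols <- ncol(free)
reference <- "A"; interval <- 0.95

# Enumerate the free cells once, then map over them. mclapply distributes
# the work across mc.cores worker processes (forking; not on Windows).
cells <- which(free, arr.ind = TRUE)
retval <- mclapply(seq_len(nrow(cells)), function(k) {
  i <- cells[k, "row"]; j <- cells[k, "col"]
  newName <- paste0(reference, "[", i, ",", j, "]")
  createSimilarInterval(newName, interval)
}, mc.cores = if (.Platform$OS.type == "windows") 1L else 2L)
```

Flattening the two loops into one list of cells is what makes the parallel map natural: each free cell becomes one independent task.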

@mcneale
Contributor

mcneale commented Aug 30, 2023

Thank you very much for the suggestion! There are several places where parallel code is already used in OpenMx. Of particular note, raw-data analyses essentially divide the rows into as many chunks as there are processors, calculate the likelihoods separately for each chunk, and gather them at the end. This level of parallelism isn't as embarrassingly parallel as the CIs, but it speeds up both the initial fitting and the subsequent model fits to find the CIs. It is also placed where calculations are likely to be most demanding, much more so than, e.g., fitting to covariance matrices and means.

I think parallel CIs would have to lose this feature so as not to parallelize things that are already parallel deeper in the calculations. That might nevertheless be faster, depending on the number of CIs and the number of processors, and perhaps we could arrange for the lower-level parallelism to switch off when higher-level parallelism is in use. Thanks again; we are interested in improving both the flexibility of the code and its performance.
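One way to avoid the nested-parallelism problem would be to switch the lower-level threading off before parallelizing over CIs. A sketch, assuming OpenMx's real "Number of Threads" option (the surrounding structure is illustrative only, and the block is guarded so it only runs where OpenMx is installed):

```r
# Sketch: disable OpenMx's row-level multithreading so that worker
# processes parallelizing over CIs do not oversubscribe the cores.
if (requireNamespace("OpenMx", quietly = TRUE)) {
  library(OpenMx)
  mxOption(NULL, "Number of Threads", 1)  # lower-level parallelism off
  # ...each worker in a parallel map over CIs then runs single-threaded.
}
```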

@mronkko
Author

mronkko commented Aug 30, 2023

That makes a lot of sense. In my use case, we are analyzing covariance matrices and there was no indication that more than one core was used in the computation. Parallelization of CI calculation would be really useful because we could move the calculation to a server with 128 cores.

Depending on how parallelization is implemented in OpenMx, you might not need to change the existing code that much, because the parallel computing framework might take care of the potential problems in calling parallelized code from code that is already parallelized.

@RMKirkpatrick
Contributor

The relevant code is here

That's frontend R code. The code actually relevant to parallel computing is going to be backend C++ code.

You are correct, though, that parallelizing confidence limits (or confidence intervals, when using the Wu-Neale adjustment) would be a better use of multithreading in most cases, due to the coarser level of granularity.

In my use case, we are analyzing covariance matrices and there was no indication that more than one core was used in the computation.

Two questions. First, were you running OpenMx under Windows, or a CRAN build of OpenMx under macOS? Both of those cases lack multithreading support. Second, which optimizer were you using? SLSQP is supposed to know how to divide its computation of the gradient elements among multiple threads.
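Both questions can be answered from R with real OpenMx calls. A small sketch, guarded so it only runs where OpenMx is installed:

```r
# Check the active optimizer and the number of cores OpenMx detects.
if (requireNamespace("OpenMx", quietly = TRUE)) {
  library(OpenMx)
  print(mxOption(NULL, "Default optimizer"))  # "SLSQP" on a fresh load
  print(omxDetectCores())                     # cores available to OpenMx
}
```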

@mronkko
Author

mronkko commented Aug 30, 2023

A bit of background: I have not really used OpenMx for years myself, but this was a question from a student. He is running OpenMx indirectly through the metaSEM package. When we had a meeting today, he asked why the CI calculation was so slow. We took a look at the code to figure it out.

Now answering the questions:

First, were you running OpenMx under Windows, or a CRAN build of OpenMx under macOS? Both of those cases lack multithreading support.

The student used Windows and I use the CRAN version on mac. The 128 core server runs Linux.

Second, which optimizer were you using? SLSQP is supposed to know how to divide its computation of the gradient elements among multiple threads.

We are using whatever is the default. I believe this to be SLSQP.

@RMKirkpatrick
Contributor

A bit of background: I have not really used OpenMx for years myself, but this was a question from a student. He is running OpenMx indirectly through the metaSEM package. When we had a meeting today, he asked why the CI calculation was so slow. We took a look at the code to figure it out.

Note that OpenMx is carrying out (at least) two numerical optimizations for every confidence interval requested. So, suppose you request a confidence interval for one parameter. OpenMx would then, at minimum, do three numerical searches at runtime: one to find the maximum-likelihood estimate, one to find the lower limit of the confidence interval, and one to find the upper limit of the confidence interval.
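As a concrete illustration of that cost, here is a minimal sketch using the real OpenMx API (mxModel, mxCI, mxRun); the one-variable model itself is invented for the example, not taken from this thread, and the block is guarded so it only runs where OpenMx is installed:

```r
# Requesting one CI triggers three numerical searches at runtime:
# the MLE search, the lower-limit search, and the upper-limit search.
if (requireNamespace("OpenMx", quietly = TRUE)) {
  library(OpenMx)
  set.seed(1)
  dat <- data.frame(x = rnorm(200))
  model <- mxModel(
    "m", type = "RAM", manifestVars = "x",
    mxPath("x", arrows = 2, free = TRUE, values = 1, labels = "varx"),
    mxPath("one", to = "x", free = TRUE, values = 0, labels = "meanx"),
    mxData(dat, type = "raw"),
    mxCI("meanx")  # one CI requested for the mean of x
  )
  fit <- mxRun(model, intervals = TRUE)
  print(summary(fit)$CI)  # lower and upper limits for "meanx"
}
```

Each additional mxCI() call adds two more searches, which is why the per-CI cost dominates and why parallelizing across CIs (or limits) is attractive.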

The student used Windows and I use the CRAN version on mac. The 128 core server runs Linux.

OK, then neither you nor the student had a build of OpenMx compiled for parallel computing. However, the Linux-powered server probably is running a multithreaded OpenMx build.

We are using whatever is the default. I believe this to be SLSQP.

SLSQP is the on-load default, and is able to parallelize its computation of the objective function's gradient elements.

Again, I agree that parallelizing over confidence intervals or confidence limits, rather than over gradient elements (or over subsets of the dataset, in the raw-data case) is a better use of parallel computing. We would like to implement it sometime in the future, but it is not a high priority at present.
