Parallelize multi-target optimizers #410

Open
scatr opened this issue Sep 11, 2023 · 2 comments

scatr commented Sep 11, 2023

Hi,

I have recently been using the MIOSR optimizer now included with PySINDy, which requires a Gurobi license. It would be good if a small change could be made to this code so that multiple models can be fit in parallel. Using Gurobi from multiple processes is covered in the Gurobi documentation: https://support.gurobi.com/hc/en-us/articles/360043111231-How-do-I-use-multiprocessing-in-Python-with-Gurobi-

Each model that is opened must also be closed.
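
Roughly, the pattern looks like the following (a minimal sketch, with a toy LP standing in for the per-target problem; it assumes a gurobipy version recent enough to support `Env` and `Model` as context managers):

```python
import multiprocessing as mp

import gurobipy as gp


def solve_one(bound):
    # One environment and one model per task; the context managers guarantee
    # both are released even if the solve raises, which is the
    # "each model that is opened must also be closed" part.
    with gp.Env() as env, gp.Model(env=env) as model:
        # Toy problem standing in for one target's MIQP: minimize -x, x <= bound.
        x = model.addVar(ub=bound, obj=-1.0)
        model.optimize()
        return x.X


if __name__ == "__main__":
    # Each worker process solves one target's problem independently.
    with mp.Pool(processes=2) as pool:
        print(pool.map(solve_one, [1.0, 2.0, 3.0]))
```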

Thanks in advance,
Alasdair

@Jacob-Stevens-Haas Jacob-Stevens-Haas changed the title Include code allowing for parallel computation with MIOSR Parallelize multi-target optimizers Sep 11, 2023
@Jacob-Stevens-Haas
Collaborator

This is a great idea; it's just hard. Multiprocessing could speed up any multi-target optimizer that trains each target independently, not just MIOSR. But there are cases where multiprocessing would slow things down rather than speed them up, and I'm not yet savvy enough to know when that happens. E.g., if copying the training data for each worker causes a PC to use swap space instead of RAM, would that be slower than not copying the data and using a single process? There are also some warnings in the multiprocessing docs about garbage collection that I understand only at a basic level.

The above example uses `multiprocessing.Pool`, which I suppose would replace the loops currently used for training against multiple targets. But scikit-learn also provides `sklearn.multioutput.MultiOutputRegressor` as a way to parallelize multi-target regressions. The first step would probably be (a) comparing `MultiOutputRegressor(n_jobs=...)` against a pool on a simple model to evaluate the performance gains and (b) seeing whether we can replace the multi-target loops with `MultiOutputRegressor`.
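
For (a), I'm imagining something like the following toy comparison, with `Lasso` and random data standing in for a SINDy optimizer and a real feature library (the timing is only illustrative):

```python
from time import perf_counter

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor

# Stand-ins for a feature library matrix and per-target derivatives.
rng = np.random.default_rng(0)
theta = rng.normal(size=(5000, 40))   # library matrix
x_dot = rng.normal(size=(5000, 8))    # one column per target

# (a1) the serial loop over targets, as the optimizers do today
t0 = perf_counter()
coefs_loop = np.stack([Lasso(alpha=0.1).fit(theta, x_dot[:, i]).coef_
                       for i in range(x_dot.shape[1])])
t_loop = perf_counter() - t0

# (a2) the same per-target fits dispatched across n_jobs workers
t0 = perf_counter()
mor = MultiOutputRegressor(Lasso(alpha=0.1), n_jobs=-1).fit(theta, x_dot)
coefs_mor = np.stack([est.coef_ for est in mor.estimators_])
t_mor = perf_counter() - t0

print(t_loop, t_mor, np.allclose(coefs_loop, coefs_mor))
```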

That said, MIOSR only trains targets independently for certain settings of its target_sparsity/group_sparsity arguments, so it's not always possible to parallelize it per target.


scatr commented Oct 4, 2023

Hi Jacob,

Sorry for the slow reply, and thanks for yours. I agree that multiprocessing wouldn't always speed things up: because of the GIL, Python has to use separate processes rather than threads, which is expensive given how much data needs to be copied between them. There seems to be a comment about this in the SINDy-PI code. In most cases where I have used it, MIOSR fits quickly anyway. I also suspect that if a new process is started for every fit, Gurobi has to query the license server each time, which would probably slow everything down.

What I actually had in mind was parallelising over cross-validation, so the parallelisation sits entirely outside of SINDy and the workers only need to be spun up once rather than for every fit. However, judging from the link I sent, I'm not sure this is safe at the moment: I don't know whether the Gurobi models are closed properly when run in parallel without those additional lines. From what I can tell, MIOSR would need the extra `with gp.Env() as env, gp.Model(env=env) as model:` context managers to make sure the models are closed properly in parallel. I don't think this would impact serial performance, but I have to admit I'm not sure.
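
Concretely, something like this is what I mean (a rough sketch: the trajectory is a toy sine/cosine signal, the `alpha` grid is arbitrary, and it needs a Gurobi licence to actually run):

```python
import numpy as np
from joblib import Parallel, delayed
import pysindy as ps

# Toy trajectory standing in for real training data.
dt = 0.01
t = np.arange(0, 10, dt)
x_train = np.stack([np.sin(t), np.cos(t)], axis=-1)


def fit_one(alpha):
    # Each worker fits its own model, so Gurobi environments never cross
    # process boundaries; with the context-manager change inside MIOSR they
    # would also be closed as soon as the fit finishes.
    model = ps.SINDy(optimizer=ps.MIOSR(alpha=alpha))
    model.fit(x_train, t=dt)
    return alpha, model.score(x_train, t=dt)


# joblib spins the workers up once and reuses them across the grid.
scores = Parallel(n_jobs=2)(delayed(fit_one)(a) for a in [0.01, 0.1, 1.0])
print(scores)
```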

Cheers
