Parallelize multi-target optimizers #410

Open
scatr opened this issue Sep 11, 2023 · 2 comments

scatr commented Sep 11, 2023

Hi,

I have recently been using the MIOSR optimizer now included with PySINDy, which requires a Gurobi license. It would be good if a small change could be made to this code so that multiple models can be fit in parallel. Using Gurobi from multiple processes is covered in the Gurobi documentation: https://support.gurobi.com/hc/en-us/articles/360043111231-How-do-I-use-multiprocessing-in-Python-with-Gurobi-

Each model that is opened must also be closed.
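
Roughly, the pattern looks like the following (a minimal sketch, with a toy LP standing in for the per-target problem; it assumes a gurobipy version recent enough to support `Env` and `Model` as context managers):

```python
import multiprocessing as mp

import gurobipy as gp


def solve_one(bound):
    # One environment and one model per task; the context managers guarantee
    # both are released even if the solve raises, which is the
    # "each model that is opened must also be closed" part.
    with gp.Env() as env, gp.Model(env=env) as model:
        # Toy problem standing in for one target's MIQP: minimize -x, x <= bound.
        x = model.addVar(ub=bound, obj=-1.0)
        model.optimize()
        return x.X


if __name__ == "__main__":
    # Each worker process solves one target's problem independently.
    with mp.Pool(processes=2) as pool:
        print(pool.map(solve_one, [1.0, 2.0, 3.0]))
```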

Thanks in advance,
Alasdair

@Jacob-Stevens-Haas Jacob-Stevens-Haas changed the title Include code allowing for parallel computation with MIOSR Parallelize multi-target optimizers Sep 11, 2023
@Jacob-Stevens-Haas
Collaborator

This is a great idea; it's just hard. Multiprocessing could speed up any multi-target optimizer that trains each target independently, not just MIOSR. But there are cases where multiprocessing would slow things down rather than speed them up, and I'm not yet savvy enough to know when that happens. E.g., if copying the training data for each worker causes a PC to use swap space instead of RAM, would that be slower than not copying the data and using a single process? There are also some warnings in the multiprocessing docs about garbage collection that I understand only at a basic level.

The above example uses `multiprocessing.Pool`, which I suppose would replace the loops currently used for training against multiple targets. But scikit-learn also provides `sklearn.multioutput.MultiOutputRegressor` as a way to parallelize multi-target regressions. The first step would probably be (a) comparing `MultiOutputRegressor(n_jobs=...)` against a pool on a simple model to evaluate the performance gains and (b) seeing whether we can replace the multi-target loops with `MultiOutputRegressor`.
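
For (a), I'm imagining something like the following toy comparison, with `Lasso` and random data standing in for a SINDy optimizer and a real feature library (the timing is only illustrative):

```python
from time import perf_counter

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor

# Stand-ins for a feature library matrix and per-target derivatives.
rng = np.random.default_rng(0)
theta = rng.normal(size=(5000, 40))   # library matrix
x_dot = rng.normal(size=(5000, 8))    # one column per target

# (a1) the serial loop over targets, as the optimizers do today
t0 = perf_counter()
coefs_loop = np.stack([Lasso(alpha=0.1).fit(theta, x_dot[:, i]).coef_
                       for i in range(x_dot.shape[1])])
t_loop = perf_counter() - t0

# (a2) the same per-target fits dispatched across n_jobs workers
t0 = perf_counter()
mor = MultiOutputRegressor(Lasso(alpha=0.1), n_jobs=-1).fit(theta, x_dot)
coefs_mor = np.stack([est.coef_ for est in mor.estimators_])
t_mor = perf_counter() - t0

print(t_loop, t_mor, np.allclose(coefs_loop, coefs_mor))
```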

That said, MIOSR only trains targets independently for certain settings of its target_sparsity/group_sparsity arguments, so it's not always possible to parallelize it per target.


scatr commented Oct 4, 2023

Hi Jacob,

Sorry for the slow reply, and thanks for yours. I agree that multiprocessing wouldn't always speed things up: because of the GIL, Python has to use separate processes rather than threads, which is expensive given how much data needs to be copied between them. There seems to be a comment about this in the SINDy-PI code. In most cases where I have used it, MIOSR fits quickly anyway. I also suspect that if a new process is started for every fit, Gurobi has to query the license server each time, which would probably slow everything down.

What I actually had in mind was parallelising over cross-validation, so the parallelisation sits entirely outside of SINDy and the workers only need to be spun up once rather than for every fit. However, judging from the link I sent, I'm not sure this is safe at the moment: I don't know whether the Gurobi models are closed properly when run in parallel without those additional lines. From what I can tell, MIOSR would need the extra `with gp.Env() as env, gp.Model(env=env) as model:` context managers to make sure the models are closed properly in parallel. I don't think this would impact serial performance, but I have to admit I'm not sure.
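
Concretely, something like this is what I mean (a rough sketch: the trajectory is a toy sine/cosine signal, the `alpha` grid is arbitrary, and it needs a Gurobi licence to actually run):

```python
import numpy as np
from joblib import Parallel, delayed
import pysindy as ps

# Toy trajectory standing in for real training data.
dt = 0.01
t = np.arange(0, 10, dt)
x_train = np.stack([np.sin(t), np.cos(t)], axis=-1)


def fit_one(alpha):
    # Each worker fits its own model, so Gurobi environments never cross
    # process boundaries; with the context-manager change inside MIOSR they
    # would also be closed as soon as the fit finishes.
    model = ps.SINDy(optimizer=ps.MIOSR(alpha=alpha))
    model.fit(x_train, t=dt)
    return alpha, model.score(x_train, t=dt)


# joblib spins the workers up once and reuses them across the grid.
scores = Parallel(n_jobs=2)(delayed(fit_one)(a) for a in [0.01, 0.1, 1.0])
print(scores)
```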

Cheers
