Embed parallelization into the multi_voxel_fit decorator. #2593
base: master
Conversation
Hello @arokem, thank you for updating! Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated at 2024-05-16 04:28:10 UTC
Oh yeah - that’s a great pointer! I’ll try to incorporate the ideas you
implemented there into this PR.
On Sun, May 8, 2022 at 10:07 AM Serge Koudoro wrote:

> Thank you for starting this @arokem. Have you looked at #1418? I think some ideas can be reused here.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #2593      +/-   ##
==========================================
- Coverage   83.75%   83.72%    -0.04%
==========================================
  Files         153      153
  Lines       21343    21364      +21
  Branches     3445     3451       +6
==========================================
+ Hits        17876    17887      +11
- Misses       2611     2620       +9
- Partials      856      857       +1
```
I ran a benchmark on a beefy 24-CPU compute server with the recent commit. I get a roughly 13x speedup for fitting the fwdti model with
Does anyone understand why half the CI actions are still pending? They have been pending since Friday!

No, but I will restart them first.

Hi @arokem, it seems we have a new issue with the DIPY installation. I do not know yet what changed; the CIs are failing in all PRs.

Just rebased on top of #2595.

Does anyone understand these CI failures? I don't think they are related to the content of the PR, but I might be missing something.

Both failures are on the parallelization CIs, with a memory-leak issue. This might be due to some of the parallel packages changing environment variable flags, and those flags could have an impact on this parallelized function. All supposition; this is what comes to mind first.
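If the environment-flag hypothesis is right, one common mitigation is to pin the standard OpenMP/BLAS threading variables before the parallel backends are imported, so that per-worker threading does not oversubscribe the machine. This is a generic sketch, not something this PR does; whether these variables are the culprit here is an assumption:

```python
import os

# Limit per-worker threading to avoid oversubscription when the outer
# loop is already parallelized across processes. setdefault() leaves
# any value the user has already exported untouched.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, "1")
```

These variables only take effect if they are set before the libraries that read them are imported.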
The failing functions are using

Hey @skoudoro, I noticed that you did not pin the ray version in #2600, instead pinning only protobuf, but I am seeing this again on the CI: https://github.com/dipy/dipy/runs/6634820045?check_suite_focus=true#step:9:119, which suggests to me that I should pin ray to 0.11 for now. Does that make sense to you? I'll give it a try here.

Or possibly 1.11.1.

We're back to this failure: https://github.com/dipy/dipy/runs/6645881563?check_suite_focus=true#step:9:3751 Interestingly, I can't get this to fail locally on my machine (in an env with dask, ray and joblib installed). I also don't exactly understand how this is related to openmp. Does
```python
single_voxel_with_self = partial(single_voxel_fit, self)
n_jobs = kwargs.get("n_jobs", multiprocessing.cpu_count() - 1)
vox_per_chunk = np.max([data_to_fit.shape[0] // n_jobs, 1])
chunks = [data_to_fit[ii:ii + vox_per_chunk]
```
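The chunking logic in the diff above can be isolated into a small standalone sketch; `chunk_voxels` is a hypothetical helper, not a function in the PR:

```python
import multiprocessing

import numpy as np


def chunk_voxels(data_to_fit, n_jobs=None):
    """Split a (n_voxels, ...) array into roughly n_jobs chunks.

    Hypothetical helper mirroring the chunking logic in the diff above:
    integer division sets the chunk size, and the final chunk absorbs
    any remainder voxels.
    """
    if n_jobs is None:
        n_jobs = max(multiprocessing.cpu_count() - 1, 1)
    vox_per_chunk = max(data_to_fit.shape[0] // n_jobs, 1)
    return [data_to_fit[ii:ii + vox_per_chunk]
            for ii in range(0, data_to_fit.shape[0], vox_per_chunk)]


data = np.zeros((100, 6))
chunks = chunk_voxels(data, n_jobs=8)
print(len(chunks), chunks[0].shape[0])  # → 9 12
```

Note that with 100 voxels and 8 jobs the chunk size is 12, so 9 chunks are produced; the remainder makes the chunk count slightly exceed `n_jobs`.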
This might duplicate memory. Need to benchmark.
We can use this: https://pypi.org/project/memory-profiler/
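Besides memory-profiler, the stdlib `tracemalloc` gives a dependency-free rough check of Python-level allocations. The sketch below (with a hypothetical `make_chunks` helper) illustrates the point raised above: chunking by slicing produces views, so it should not duplicate the underlying buffer:

```python
import tracemalloc

import numpy as np


def make_chunks(data, vox_per_chunk):
    # Basic slicing returns views, so this should not copy the data buffer.
    return [data[ii:ii + vox_per_chunk]
            for ii in range(0, data.shape[0], vox_per_chunk)]


data = np.zeros((10_000, 100))  # ~8 MB array, allocated before tracing
tracemalloc.start()
chunks = make_chunks(data, 500)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Because the chunks are views, peak traced allocation stays far below
# the ~8 MB the data itself occupies.
print(len(chunks), peak < 1_000_000)  # → 20 True
```

A caveat: `tracemalloc` only sees allocations made through Python's allocator, so memory-profiler (which samples process RSS) remains the better tool for measuring what the worker processes actually consume.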
Plan to make progress here:
For now, it works with `engine in ["serial", "dask"]`:
- my laptop crashes with `ray`
- see below for the joblib issue.

I will share the timing when those two are fixed.

Thanks @arokem
```python
_parallel_fit_worker,
chunks,
func_args=[single_voxel_with_self],
**kwargs)
```
`dask` did not complain, but `joblib` fails with this: `TypeError: __init__() got an unexpected keyword argument 'vox_per_chunk'`. We need to update the `paramap` function.
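One way to avoid joblib choking on decorator-level kwargs like `vox_per_chunk` is to strip them out before forwarding the rest to the backend. This is a sketch of the idea only; `split_kwargs` and the key names are hypothetical, not part of `paramap`'s actual API:

```python
def split_kwargs(kwargs, local_keys=("vox_per_chunk", "n_jobs", "engine")):
    """Separate decorator-level options from kwargs meant for the backend.

    Hypothetical helper: pops the keys the decorator itself consumes so
    that only backend-compatible kwargs get forwarded (e.g. to joblib).
    """
    local = {k: kwargs.pop(k) for k in list(local_keys) if k in kwargs}
    return local, kwargs


opts, backend_kwargs = split_kwargs({"vox_per_chunk": 100, "verbose": 1})
print(opts, backend_kwargs)  # → {'vox_per_chunk': 100} {'verbose': 1}
```

With this split, `vox_per_chunk` would never reach joblib's `Parallel.__init__`, which is where the `TypeError` above comes from.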
The errors on the CI are really puzzling and hard to reproduce locally, and we worry it might be some funky interaction with joblib, so I am removing the multi_voxel decorator for that model; we can reinstate it in a separate PR later on if we fix everything else here.
OK - having removed the decorator from the SFM model, it seems that the only remaining error on the CI is completely unrelated. Or am I missing something?
At any rate, @asagilmore has now completed an extensive set of experiments with this PR, and we are glad to say that Ray in particular provides a substantial speedup for reconstruction models (on the order of 10x) across two different models, and across a pretty wide range of hardware setups and chunking schemes. We're writing up a report about this, and we'd be happy to have input on the results and ideas that we are developing there (the repo for that report is here: https://github.com/nrdg/2024-dipy-parallelization). With that said, I think that this code and the results we report are ready for a review, and are hopefully close to a shape where they can be merged for an upcoming release.
Looks like installing pytables on mac is (newly?) broken: https://github.com/dipy/dipy/actions/runs/8847440312/job/24295269319?pr=2593#step:6:125
Yes, I saw that with pytables. Not sure what we can do apart from reporting. The last release was 6 months ago, so I am not sure what changed last week. Maybe a release of h5py...
There was one on April 10th: https://pypi.org/project/h5py/3.11.0/ I'm trying to pin to 3.10 in cd0e653. Let's see what we learn.
Following #3202, what's the right way to catch a warning thrown when importing one of the optional dependencies? I am getting:
I believe this is emitted when importing ray. Should I explicitly ignore it in a context manager around that import?
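The context-manager approach would look roughly like this. Here the import is replaced by a stand-in function that emits a warning, and the message pattern is illustrative; the real filter would match the text of the warning ray actually emits:

```python
import warnings


def import_optional_dependency():
    # Stand-in for `import ray`; the real import emits the warning.
    warnings.warn("this API is deprecated", DeprecationWarning)
    return "ray-stand-in"


with warnings.catch_warnings():
    # Message pattern is illustrative; match the actual warning text.
    warnings.filterwarnings("ignore", message=".*deprecated.*",
                            category=DeprecationWarning)
    module = import_optional_dependency()

print(module)  # → ray-stand-in
```

The filter is restored when the `with` block exits, so warnings elsewhere in the library are unaffected. The drawback, discussed below, is that such filters end up scattered around the codebase rather than in one place.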
The "right way" would be to add the warning message to the existing filter list; see Lines 184 to 186 in 60669b8
if it is too much for the
For example, see here: https://github.com/mne-tools/mne-python/blob/main/mne/conftest.py#L131-L178 So this is something to decide together; what is your opinion, @arokem and @jhlegarreta?
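A conftest-based approach, along the lines of the MNE example linked above, registers the ignore rules once at pytest configure time. The message pattern below is an illustrative placeholder, not the actual warning emitted by ray or its dependencies:

```python
# conftest.py (sketch): register ignore rules at pytest configure time.
# Each rule uses pytest's "action:message:category" filterwarnings syntax.
_ignored_warnings = [
    "ignore:.*deprecated.*:DeprecationWarning",
]


def pytest_configure(config):
    for rule in _ignored_warnings:
        config.addinivalue_line("filterwarnings", rule)
```

Keeping all the rules in one list like this makes it easy to audit what the test suite is deliberately silencing.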
I like it better in the conftest
Hats off for this work, Ariel.

Having filtering rules both in the So probably, as Serge says, if the list of rules becomes too lengthy, it is better to keep them all in the
OK, thank you for your feedback @arokem and @jhlegarreta. I will create a PR later today to update that. Also, I think we should add a note about the warnings policy to the developer guide.
Thanks! I will wait for your PR to see how this is done. In the meantime, it doesn't look like pinning h5py helped with the pytables installation.
I've started playing around with the idea that the multi_voxel_fit decorator could use `paramap` instead of iterating over voxels. If we can make this work generally, that would be pretty cool. So far, I've only tested this with the fwdti model, and in that case the additional changes to the code are rather minimal, which gives me hope that we might be able to use this wherever we use this decorator: in csd, dsi, forecast, fwdti, gqi, ivim, mapmri, mcsd, qtdmri, and shore (!).
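The overall shape of the idea can be sketched as a decorator that flattens the volume, chunks the voxels, and maps a single-voxel fit over the chunks in parallel. This is a simplified stand-in (using a thread pool and a toy "model") rather than DIPY's actual `multi_voxel_fit`/`paramap` machinery:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial, wraps

import numpy as np


def _fit_chunk(fit_func, chunk):
    # Fit each voxel within a chunk serially; chunks run in parallel.
    return [fit_func(vox) for vox in chunk]


def multi_voxel_fit(single_voxel_fit):
    """Simplified stand-in for DIPY's decorator (thread-based only)."""
    @wraps(single_voxel_fit)
    def wrapper(data, n_jobs=2):
        # Flatten spatial dimensions so voxels form the leading axis.
        flat = data.reshape(-1, data.shape[-1])
        vox_per_chunk = max(flat.shape[0] // n_jobs, 1)
        chunks = [flat[ii:ii + vox_per_chunk]
                  for ii in range(0, flat.shape[0], vox_per_chunk)]
        with ThreadPoolExecutor(max_workers=n_jobs) as ex:
            results = list(ex.map(partial(_fit_chunk, single_voxel_fit),
                                  chunks))
        fits = [f for chunk_fits in results for f in chunk_fits]
        # Restore the original spatial shape.
        return np.array(fits).reshape(data.shape[:-1])
    return wrapper


@multi_voxel_fit
def mean_fit(voxel_data):
    # Toy "model fit": just the mean signal in the voxel.
    return voxel_data.mean()


out = mean_fit(np.ones((4, 5, 6)), n_jobs=2)
print(out.shape)  # → (4, 5)
```

The appeal of this structure is exactly what the comment describes: models decorated this way would only need a single-voxel fit function, and the chunking and backend choice live in one place.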