-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove separate default job_kwarg n_jobs
for sorters
#2712
Conversation
@@ -21,11 +21,11 @@ | |||
from .utils import SpikeSortingError, ShellScript | |||
|
|||
|
|||
default_job_kwargs = {"n_jobs": -1} | |||
default_job_kwargs = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just remove this?
in the default_params
, we can directly use the get_global_job_kwargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can. Once we decide on the global I get do both changes in a new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zm711 Let's do this:
- We keep
n_jobs=1
by default everywhere - We remove
default_job_kwargs
insorters
- We add a
global_job_kwargs_set
flag in thecore/globals.py
to check if the user called theset_global_job_kwargs
at least once - in the
fix_job_kwargs
, ifglobal_job_kwargs_set
isFalse
andn_jobs
is not in the input, we raise a warning to communicate/encourage to use more jobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I'm in the recording suite for the next couple days. If you are trying to push this with a deadline definitely just close this (or take over this one), otherwise I will do that in a couple days/if one of my experiments ends early.
I vote for @samuelgarcia @h-mayorquin @chrishalcrow what do you guys think? |
I guess the one concern with changing to a default of multiprocessing is that we may we see more issues whereas with opt-in not everyone is using those. But if we decide to do a default then I think |
I think that 0.8 cores sounds good as a default. My intuition is that we should leave one physical core free as most people multitask x D. |
@samuelgarcia now that you're back opinions on |
If this is for the sorter 0.8 sounds OK (because it is only to write this binary). But we need to have in mind that on some computing nodes you can have many many cores that would lead to an enormous ram demand. So for the main n_jobs I prefer to keep 1 so it force the user to think about it to improve speed. |
I don't like changing the |
I agree. Maybe a warning at the |
I am slightly against the warning but I can't think on a better way to let them know : / Two suggestions if we go with the warning:
|
@samuelgarcia suggested a global flag |
@zm711 I made the suggested changes. Now the |
Thanks Alessio! |
Number of jobs to use. With -1 the number of jobs is the same as number of cores | ||
* n_jobs: int | float | ||
Number of jobs to use. With -1 the number of jobs is the same as number of cores. | ||
Using a float between 0 and 1 will use that fraction of the total cores. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should also make clear that technically someone can use n_jobs > number of cores which can actually degrade performance by splitting off too many processes? I didn't actually realize this until I looked at #2817.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open an issue about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Will do after I run to a meeting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
src/spikeinterface/core/job_tools.py
Outdated
@@ -60,7 +60,7 @@ | |||
|
|||
|
|||
def fix_job_kwargs(runtime_job_kwargs): | |||
from .globals import get_global_job_kwargs | |||
from .globals import get_global_job_kwargs, global_job_kwargs_set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from .globals import get_global_job_kwargs, global_job_kwargs_set | |
from .globals import get_global_job_kwargs, is_set_global_job_kwargs_set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of the function better because it ensure that it is a global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I would love to learn the difference here between checking with the function and just importing the boolean. I don't really use global
myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Zach here. It's the same and it's super internal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it ensure that this import is not masked in between by another variable with the same name.
globals in python are not super garanty unless you put global before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
globals corss moules is evill in python having a function interface is more secure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. Thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in last commit
src/spikeinterface/core/job_tools.py
Outdated
@@ -99,6 +99,15 @@ def fix_job_kwargs(runtime_job_kwargs): | |||
|
|||
job_kwargs["n_jobs"] = max(n_jobs, 1) | |||
|
|||
if "n_jobs" not in runtime_job_kwargs and job_kwargs["n_jobs"] == 1 and not global_job_kwargs_set: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if "n_jobs" not in runtime_job_kwargs and job_kwargs["n_jobs"] == 1 and not global_job_kwargs_set: | |
if "n_jobs" not in runtime_job_kwargs and job_kwargs["n_jobs"] == 1 and not is_set_global_job_kwargs_set(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I am OK with the logic. I made a proposal. |
src/spikeinterface/core/globals.py
Outdated
def is_global_job_kwargs_set() -> bool: | ||
""" | ||
Check is the global job kwargs have been manually set. | ||
""" | ||
global global_job_kwargs_set | ||
return global_job_kwargs_set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was already existing a fiew line afte rthis one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed @samuelgarcia. But I added type hints!
if "n_jobs" not in runtime_job_kwargs and job_kwargs["n_jobs"] == 1 and not is_set_global_job_kwargs_set(): | ||
warnings.warn( | ||
"`n_jobs` is not set so parallel processing is disabled! " | ||
"To speed up computations, it is recommended to set n_jobs either " | ||
"globally (with the `spikeinterface.set_global_job_kwargs()` function) or " | ||
"locally (with the `n_jobs` argument). Use `spikeinterface.set_global_job_kwargs?` " | ||
"for more information about job_kwargs." | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
et voilà.
Fixes #2696
Right now we aren't respecting the global
n_jobs
that user sets so the only way to override use all cores is to specifically passn_jobs
to the sorter. This is unintuitive if we allow the user to set this globally except in the case of sorters where it has to be set locally.The current global default appears to be
n_jobs=1
so we discussed whether that should be changed for people who don't know to setn_jobs
so let me know if we want to do that in this PR too.