
export_to_phy() not respecting n_jobs #2817

Open
cheydrick opened this issue May 7, 2024 · 7 comments
Labels: concurrency (Related to parallel processing), exporters (Related to exporters module)

Comments

@cheydrick

The "Fitting PCA" step in the export to Phy process isn't respecting the n_jobs argument anymore.

I have set the global n_jobs parameter to 4 with:

```python
si.set_global_job_kwargs(n_jobs=4)
```

When using SpikeInterface from April 17 (commit hash 33d478a) I can see that only four cores are being used while export_to_phy() is running.
[screenshot: fittingpca_Apr17]

However, when using SpikeInterface from May 6 (commit hash 42e1f02) the "Fitting PCA" step is saturating all cores:
[screenshot: fittingpca_may6]

Thanks,
Chris
chris@plexon.com

zm711 (Collaborator) commented May 7, 2024

Maybe related to #2696...

Thanks for the report. I'll see if I can track this down.

EDIT: @alejoe91 @samuelgarcia, did PCA ever respect the number of cores set by n_jobs?

EDIT2: Looks like it does use n_jobs, though there might be a separate bug here:

```python
if isinstance(n_jobs, float):
    n_jobs = int(n_jobs * os.cpu_count())
elif n_jobs < 0:
    n_jobs = os.cpu_count() + 1 + n_jobs
job_kwargs["n_jobs"] = max(n_jobs, 1)
```

If n_jobs=4.0 is given, it would evaluate

```python
n_jobs = int(4.0 * os.cpu_count())
```

which could max out all cores.
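A minimal, self-contained sketch of that normalization logic (a stand-alone re-creation of the snippet above, not the library function itself) shows how differently an int 4 and a float 4.0 are treated:

```python
import os

def normalize_n_jobs(n_jobs):
    # Re-creation of the normalization above: a float is treated as a
    # fraction/multiple of the CPU count, a negative int counts back
    # from the total core count, and the result is floored at 1.
    if isinstance(n_jobs, float):
        n_jobs = int(n_jobs * os.cpu_count())
    elif n_jobs < 0:
        n_jobs = os.cpu_count() + 1 + n_jobs
    return max(n_jobs, 1)

print(normalize_n_jobs(4))    # int: stays 4
print(normalize_n_jobs(4.0))  # float: 4.0 * cpu_count, far more than 4 on a multi-core machine
print(normalize_n_jobs(-1))   # negative: cpu_count
```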

@cheydrick did you enter 4 as a float or as an int for your script?

zm711 added the exporters (Related to exporters module) and concurrency (Related to parallel processing) labels on May 7, 2024
cheydrick (Author) commented

@zm711 it's an int.

```python
si.set_global_job_kwargs(n_jobs=4)
```

zm711 (Collaborator) commented May 7, 2024

Could you do me a favor and run with verbose=True inside export_to_phy? There has been some recent work on verbose, so it might not behave, but it would be nice to see the maximum amount of information, specifically how many jobs are being passed to the ChunkRecordingExecutor.

zm711 (Collaborator) commented May 7, 2024

The other thing we may need to look at is this line, where we call sorting.to_multiprocessing(). It seems like a NumpySorting would be converted to a SharedMemorySorting here. Is that desired?

```python
func = _all_pc_extractor_chunk
init_func = _init_work_all_pc_extractor
init_args = (
    recording,
    sorting.to_multiprocessing(job_kwargs["n_jobs"]),
    all_pcs_args,
    waveforms_ext.nbefore,
    waveforms_ext.nafter,
    unit_channels,
    pca_model,
)
```

```python
def to_multiprocessing(self, n_jobs):
    """
    When necessary, turn the sorting object into:
      * NumpySorting when n_jobs=1
      * SharedMemorySorting when n_jobs>1

    If the sorting is already a NumpySorting, SharedMemorySorting or
    NumpyFolderSorting, the sorting itself is returned and no
    transformation is applied.

    Parameters
    ----------
    n_jobs : int
        The number of jobs.

    Returns
    -------
    sharable_sorting :
        A sorting that can be used for multiprocessing.
    """
    from .numpyextractors import NumpySorting, SharedMemorySorting
    from .sortingfolder import NumpyFolderSorting

    if n_jobs == 1:
        if isinstance(self, (NumpySorting, SharedMemorySorting, NumpyFolderSorting)):
            return self
        else:
            return NumpySorting.from_sorting(self)
    else:
        if isinstance(self, (SharedMemorySorting, NumpyFolderSorting)):
            return self
        else:
            return SharedMemorySorting.from_sorting(self)
```
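The branch that matters here can be sketched with hypothetical stand-in classes (minimal fakes, only to illustrate the dispatch, not the real extractors): with n_jobs=1 an existing NumpySorting passes through unchanged, while n_jobs>1 forces a conversion to shared memory.

```python
# Hypothetical minimal stand-ins for the real extractor classes,
# only to illustrate the n_jobs-based dispatch above.
class NumpySorting:
    @classmethod
    def from_sorting(cls, sorting):
        return cls()

class SharedMemorySorting:
    @classmethod
    def from_sorting(cls, sorting):
        return cls()

class NumpyFolderSorting:
    pass

def to_multiprocessing(sorting, n_jobs):
    # n_jobs == 1: any of the three "sharable" types passes through unchanged
    if n_jobs == 1:
        if isinstance(sorting, (NumpySorting, SharedMemorySorting, NumpyFolderSorting)):
            return sorting
        return NumpySorting.from_sorting(sorting)
    # n_jobs > 1: NumpySorting is NOT passed through; it gets converted
    if isinstance(sorting, (SharedMemorySorting, NumpyFolderSorting)):
        return sorting
    return SharedMemorySorting.from_sorting(sorting)

s = NumpySorting()
print(type(to_multiprocessing(s, 1)).__name__)  # NumpySorting (unchanged)
print(type(to_multiprocessing(s, 4)).__name__)  # SharedMemorySorting (converted)
```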

cheydrick (Author) commented

@zm711 Looks like verbose is True by default.

I explicitly set it to True anyway, but the output was the same as in my screenshots above.

zm711 (Collaborator) commented May 8, 2024

Unfortunately (and this is why verbose needs to be fixed), that verbose is just for the function itself and is not passed down to the other functions that accept verbose, so it doesn't do what I wanted. I'll try to dig a bit more by running the PCA by itself and see if I can tell what is going on.

zm711 (Collaborator) commented May 8, 2024

I think we have to wait for @samuelgarcia to comment on this one. But based on my reading, the n_jobs being fed into the ProcessPoolExecutor is just a maximum number of worker processes, so the OS can schedule those processes however it wants (for example, we could set n_jobs greater than the number of cores and the OS would just have to schedule the extra processes onto the available processors). This doesn't explain why the behavior changed with the switch to NumpySorting from SharedMemorySorting, but n_jobs cannot be absolutely guaranteed to equal the number of cores in use, since the OS has a say in this.
