CUDA + multiprocessing issue #404

Open
hugokitano opened this issue Mar 8, 2022 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@hugokitano
Contributor

Describe the bug
A cudaErrorInitializationError (initialization error) exception occurs within the multiprocessing pool when running on GPU/CUDA with two or more files. This happens in the feature_finding step but could potentially affect any point in the workflow where CuPy is used.

To Reproduce
Environment: nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Using cupy-cuda115==10.2.0.

Script: following the convention described by test_gpu_.py,

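import importlib

import alphapept.feature_finding
import alphapept.interface
import alphapept.performance
from alphapept.settings import load_settings

def main():
    global alphapept
    # Select CUDA, then reload feature_finding so it is re-imported after the mode change
    alphapept.performance.set_compilation_mode('cuda')
    alphapept.performance.set_worker_count(30)
    importlib.reload(alphapept.feature_finding)

    settings = load_settings('/home/ubuntu/apps/alphapept/test_settings.yaml')
    r = alphapept.interface.import_raw_data(settings)
    r = alphapept.interface.feature_finding(settings)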

where test_settings.yaml uses all the defaults, with two or more files listed under experiment/file_paths

Error
For three separate files

2022-03-08 18:57:55> No *.hdf file with features found for /mnt/EXP21155/EXP21155_2021ms0603X7_A.ms_data.hdf. Adding to feature finding list.
2022-03-08 18:57:55> Feature finding on /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw
2022-03-08 18:57:55> Hill extraction with centroid_tol 8 and max_gap 2
2022-03-08 18:57:55> Feature finding of file /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:55> Processing of /mnt/EXP21155/EXP21155_2021ms0603X7_A.raw for step find_features failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:55> No *.hdf file with features found for /mnt/EXP21155/EXP21155_2021ms0609X26_A.ms_data.hdf. Adding to feature finding list.
2022-03-08 18:57:56> Feature finding on /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw
2022-03-08 18:57:56> Hill extraction with centroid_tol 8 and max_gap 2
2022-03-08 18:57:56> Feature finding of file /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw failed. Exception cudaErrorInitializationError: initialization error
2022-03-08 18:57:56> Processing of /mnt/EXP21155/EXP21155_2021ms0609X26_A.raw for step find_features failed. Exception cudaErrorInitializationError: initialization error

A Solution?
After some research, I was able to find the source of the problem. Combining multiprocessing pools with CUDA is tricky: once CUDA has been initialized in the parent process (for example, by any call into the CuPy API), child processes created with the default 'fork' start method cannot use it. I'm not sure exactly where this happens in the code, but I suspect it's somewhere in the settings management. The solution I found was to call multiprocessing.set_start_method('spawn') ('forkserver' also works).

The relative speed and stability of the three start methods (fork, spawn, forkserver) are up for debate, and I'm not sure whether we can still get a performance advantage from the GPU if we cannot fork processes. I'm not an expert on multiprocessing, though.
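
A minimal sketch of the 'spawn' workaround, assuming nothing about AlphaPept's internals: process_file, spawn_context, and run_all are hypothetical names, and the worker body is a placeholder rather than the real feature-finding step. The idea is that CuPy is imported only inside the worker, so the parent process never initializes CUDA before the children start.

```python
import multiprocessing as mp


def process_file(path):
    """Placeholder worker (hypothetical); real GPU work would go here."""
    try:
        # Import CuPy lazily so CUDA is only ever touched inside the worker.
        import cupy  # noqa: F401
        backend = "cupy"
    except ImportError:
        backend = "cpu"  # CuPy not installed: fall back so the sketch still runs
    return path, backend


def spawn_context():
    """Return a context whose children start from a fresh interpreter."""
    # get_context() leaves the global start method untouched,
    # unlike multiprocessing.set_start_method().
    return mp.get_context("spawn")


def run_all(paths, workers=2):
    """Process every path in a pool of spawned workers."""
    with spawn_context().Pool(processes=workers) as pool:
        return pool.map(process_file, paths)
```

With 'spawn', child processes re-import the main module, so run_all([...]) should be called from under an if __name__ == "__main__": guard. The sketch only shows the start-method setup; it does not show whether the GPU speed-up survives for multiple files.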

I'd like to know whether you can replicate this problem and suggest a fix. Thank you.

@Jude-Zheng

Hi hugokitano!
My system is Ubuntu 20.04. I have the same problem. Have you solved it?

@straussmaximilian
Member

Hi,
I have never tested analyzing multiple files on GPU, so this could indeed be an issue, and it may well not work out of the box. Historically, the GPU part began as a way to improve performance on a single file. A workaround for this use case could be to launch multiple Docker instances, each processing a single file, and then combine the results later in another instance.

However, if anyone has good ideas to get the multiprocessing to work or wants to tackle this, I am all ears.

@straussmaximilian straussmaximilian added the help wanted Extra attention is needed label Apr 26, 2022