ONEAPI_DEVICE_SELECTOR not working under sub-process #1406

Open
harborn opened this issue Sep 19, 2023 · 4 comments

harborn commented Sep 19, 2023

My environment:

$ export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
$ sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.

[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:2] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:3] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:4] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:5] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]

I have the following code in the file test_multi_process.py:

import os
from multiprocessing import Pool
import time
import dpctl

env_var = "ONEAPI_DEVICE_SELECTOR"
backend = "level_zero"
device_type = "gpu"


def get_dev():
    dev_ids = []
    for dev in dpctl.get_devices(backend=backend, device_type=device_type):
        # device filter_string with format: "backend:device_type:relative_id"
        # dev_ids.append(int(dev.filter_string.split(":")[-1]))
        dev_ids.append(dev.filter_string)
    return dev_ids


def set_env_var(dev_id):
    env_val = f"{backend}:{dev_id}"
    print(f"[func] [{os.getpid()}] set {env_var} = {env_val}")
    os.environ[env_var] = env_val


def func(x):
    print(f"[func] [{os.getpid()}] >>>>>>>>>>>>>>>>>>>>>>")
    set_env_var(x)
    env_val = os.environ.get(env_var, None)
    print(f"[func] [{os.getpid()}] x = {x}") 
    print(f"[func] [{os.getpid()}] {env_var} = {env_val}")
    dev_ids = get_dev()
    print(f"[func] [{os.getpid()}] dev_ids = {dev_ids}")
    print(f"[func] [{os.getpid()}] <<<<<<<<<<<<<<<<<<<<<<\n") 


def main():
    env_val = os.environ.get(env_var, None)
    print(f"[main] [{os.getpid()}] {env_var} = {env_val}")
    dev_ids = get_dev()
    print(f"[main] [{os.getpid()}] dev_ids = {dev_ids}")
    with Pool(5) as p:
        p.map(func, [1, 2, 3, 4, 5])


if __name__ == '__main__':
    main()

I run test_multi_process.py:

python test_multi_process.py

and get the following logs:

[main] [122421] ONEAPI_DEVICE_SELECTOR = level_zero:gpu
[main] [122421] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']

# sub process 1
[func] [122440] >>>>>>>>>>>>>>>>>>>>>>
[func] [122440] set ONEAPI_DEVICE_SELECTOR = level_zero:1
[func] [122440] x = 1
[func] [122440] ONEAPI_DEVICE_SELECTOR = level_zero:1
[func] [122440] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122440] <<<<<<<<<<<<<<<<<<<<<<

# sub process 2
[func] [122441] >>>>>>>>>>>>>>>>>>>>>>
[func] [122441] set ONEAPI_DEVICE_SELECTOR = level_zero:2
[func] [122441] x = 2
[func] [122441] ONEAPI_DEVICE_SELECTOR = level_zero:2
[func] [122441] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122441] <<<<<<<<<<<<<<<<<<<<<<

# sub process 3
[func] [122442] >>>>>>>>>>>>>>>>>>>>>>
[func] [122442] set ONEAPI_DEVICE_SELECTOR = level_zero:3
[func] [122442] x = 3
[func] [122442] ONEAPI_DEVICE_SELECTOR = level_zero:3
[func] [122442] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122442] <<<<<<<<<<<<<<<<<<<<<<

# sub process 4
[func] [122443] >>>>>>>>>>>>>>>>>>>>>>
[func] [122443] set ONEAPI_DEVICE_SELECTOR = level_zero:4
[func] [122443] x = 4
[func] [122443] ONEAPI_DEVICE_SELECTOR = level_zero:4
[func] [122443] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122443] <<<<<<<<<<<<<<<<<<<<<<

# sub process 5
[func] [122444] >>>>>>>>>>>>>>>>>>>>>>
[func] [122444] set ONEAPI_DEVICE_SELECTOR = level_zero:5
[func] [122444] x = 5
[func] [122444] ONEAPI_DEVICE_SELECTOR = level_zero:5
[func] [122444] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122444] <<<<<<<<<<<<<<<<<<<<<<

In the main process, I want to use GPU IDs [1, 2, 3, 4, 5] and create 5 processes, each using one of those GPUs.

My questions:
I set the environment variable ONEAPI_DEVICE_SELECTOR in each sub-process to select only one GPU, but each sub-process can still see all 6 GPUs.
Do I need to reload dpcpp from the Python code?
Or can ONEAPI_DEVICE_SELECTOR not be used with dpctl in a nested case like this?
Or does ONEAPI_DEVICE_SELECTOR only work with the sycl-ls command?

@oleksandr-pavlyk
Collaborator

Please be aware that ONEAPI_DEVICE_SELECTOR string specs and filter-selector string specs are not exactly aligned, and that may be playing tricks on your experiment. See https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector for the complete specification.

The ONEAPI_DEVICE_SELECTOR requires you to specify the backend and the device type, so I'd expect

$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 sycl-ls

to show only one device, and the syntax used in your script to show all devices, because an incorrectly formed string is ignored:

$ ONEAPI_DEVICE_SELECTOR=level_zero:0 sycl-ls

@harborn
Author

harborn commented Sep 20, 2023

> Please be aware that ONEAPI_DEVICE_SELECTOR string specs and filter-selector string specs are not exactly aligned, and that may be playing tricks on your experiment. See https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector for the complete specification.
>
> The ONEAPI_DEVICE_SELECTOR requires you to specify the backend and the device type, so I'd expect
>
> $ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 sycl-ls
>
> to show only one device, and the syntax used in your script to show all devices, because an incorrectly formed string is ignored:
>
> $ ONEAPI_DEVICE_SELECTOR=level_zero:0 sycl-ls

Maybe you haven't tested this usage of ONEAPI_DEVICE_SELECTOR:

$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu:0.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.

SYCL Exception encountered: Error parsing selector string "level_zero:gpu:0"  Too many colons (:)
$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:1,3 sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu:1,3.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.

SYCL Exception encountered: Error parsing selector string "level_zero:gpu:1,3"  Too many colons (:)
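
Going by the "Too many colons" error above and the fact that "level_zero:gpu" is accepted, each selector term seems to take exactly two colon-separated fields: the backend, then either a device type or a comma-separated list of device indices. Below is a minimal sketch of a helper that builds such terms. The name make_selector is hypothetical and not part of dpctl or the SYCL runtime, and the single-colon check is only inferred from the output in this thread, so please check the specification linked earlier for the full grammar.

```python
ENV_VAR = "ONEAPI_DEVICE_SELECTOR"


def make_selector(backend, devices):
    """Build a two-field selector term such as "level_zero:1,3".

    `devices` may be a device type ("gpu"), a single index (1),
    or a list/tuple of indices ([1, 3]).
    make_selector is a hypothetical helper, not a dpctl API.
    """
    if isinstance(devices, (list, tuple)):
        devices = ",".join(str(d) for d in devices)
    term = f"{backend}:{devices}"
    # Assumption taken from the sycl-ls error in this thread:
    # a term with more than one colon is rejected by the parser.
    if term.count(":") != 1:
        raise ValueError(f"malformed selector term: {term!r}")
    return term


if __name__ == "__main__":
    print(make_selector("level_zero", "gpu"))
    print(make_selector("level_zero", [1, 3]))
```

With this helper, the three-field form "level_zero:gpu:0" raises a ValueError in Python before it ever reaches the runtime.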

@harborn
Author

harborn commented Sep 20, 2023

> Please be aware that ONEAPI_DEVICE_SELECTOR string specs and filter-selector string specs are not exactly aligned, and that may be playing tricks on your experiment. See https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector for the complete specification.
>
> The ONEAPI_DEVICE_SELECTOR requires you to specify the backend and the device type, so I'd expect
>
> $ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 sycl-ls
>
> to show only one device, and the syntax used in your script to show all devices, because an incorrectly formed string is ignored:
>
> $ ONEAPI_DEVICE_SELECTOR=level_zero:0 sycl-ls

Have you ever used the environment variable ONEAPI_DEVICE_SELECTOR to filter for or select a specific device?

@xwu99

xwu99 commented Sep 20, 2023

@oleksandr-pavlyk I faced a similar situation: when ONEAPI_DEVICE_SELECTOR is modified after import dpctl, calling get_devices again doesn't respect the modified filter.

import dpctl
import os

print("First import dpctl")  # this will print all gpu devices
os.environ["ONEAPI_DEVICE_SELECTOR"]="level_zero:gpu"
for d in dpctl.get_devices():
    d.print_device_info()

print("============")

print("Does not work in the same process")  # this will not print only the selected device
os.environ["ONEAPI_DEVICE_SELECTOR"] = "level_zero:1"
import dpctl  # no-op: the module is already imported and cached in sys.modules
for d in dpctl.get_devices():
    d.print_device_info()

print("Works on another process")
import subprocess
os.environ["ONEAPI_DEVICE_SELECTOR"]="level_zero:1"
subprocess.run(["python", "process.py"], capture_output=False)
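
For the one-GPU-per-worker case from the original report, the "works on another process" observation suggests starting each worker as a fresh interpreter with the selector already in its environment. The sketch below assumes that the runtime reads ONEAPI_DEVICE_SELECTOR once per process, which this thread hints at but does not confirm. run_worker and CHILD_CODE are hypothetical names, and the child only echoes the variable so that the example runs without dpctl installed; a real worker would import dpctl and call dpctl.get_devices() instead.

```python
import os
import subprocess
import sys

ENV_VAR = "ONEAPI_DEVICE_SELECTOR"

# Stand-in for a real worker script: it only echoes the selector it inherited.
# A real worker would import dpctl here and call dpctl.get_devices().
CHILD_CODE = "import os; print(os.environ.get('ONEAPI_DEVICE_SELECTOR'))"


def run_worker(dev_index, backend="level_zero"):
    """Start a fresh interpreter whose environment selects one device.

    run_worker is a hypothetical helper, not a dpctl API.
    """
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env[ENV_VAR] = f"{backend}:{dev_index}"
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for i in [1, 2, 3, 4, 5]:
        print(i, run_worker(i))
```

Passing env= to subprocess.run keeps the parent's own environment unchanged, so the main process can still enumerate all devices while each child is given a single one.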
