unable to create cuda shared memory handle when using multiprocessing to send multiple requests #7101
Description
I use multiprocessing to send multiple requests to the Triton clients. When I use CUDA shm, even with only 1 process, it results in an initialization error.
The CUDA shm implementation follows closely the given example, and it works perfectly when not using multiprocessing.
Triton Information
Triton Docker 22.12
To Reproduce
This is how the inference with CUDA shm is implemented.
The multiprocessing part is implemented as above.
model.warmup sends random inputs to the server. The error above happens when hitting input0_shm_handle = cudashm.create_shared_memory_region(input0_data, input0_byte_size, 0).
Expected behavior
I expected the code to run normally.
Please let me know what I'm missing here. Thanks in advance!
Update 1: The program cannot get/set device in subprocess. It dies here: prev_device = call_cuda_function(cudart.cudaGetDevice) in create_shared_memory_region.
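A possible direction, offered as a guess because the surrounding code is not shown: a cudaGetDevice failure inside a subprocess is a common symptom when the parent process has already initialized CUDA and the child is then created with the default fork start method on Linux. CUDA does not support being used in a forked child in that situation. The sketch below uses only the standard library to show the "spawn" start method. The names warmup_worker, spawn_context, and run_warmup are hypothetical, and the worker body is a placeholder for the real Triton client and CUDA shm calls.

```python
import multiprocessing as mp


def warmup_worker(worker_id):
    """Hypothetical stand-in for the per-process warmup.

    In the real program, the Triton client and every CUDA shm call
    (for example cudashm.create_shared_memory_region) would be made
    here, inside the child, so the child initializes CUDA on its own.
    """
    return worker_id


def spawn_context():
    # "spawn" starts a fresh interpreter instead of forking the parent,
    # so the child does not inherit any CUDA state from the parent.
    return mp.get_context("spawn")


def run_warmup(num_procs):
    # Run one placeholder warmup per process and collect the results.
    ctx = spawn_context()
    with ctx.Pool(processes=num_procs) as pool:
        return pool.map(warmup_worker, range(num_procs))
```

With "spawn", run_warmup(1) has to be called from under an if __name__ == "__main__": guard in a script, because each child re-imports the main module. Whether this resolves the error on Triton Docker 22.12 is not confirmed here.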