"Could not find executable nvidia-smi
" for ./configure.py --backend=CUDA
#12017
When I run this line in the tensorflow docker container, as specified in the docs:

`./configure.py --backend=CUDA`

I get:

`Could not find executable nvidia-smi`

I don't see this error if I specify `--gpus all` in the `docker run` command. I believe using that option requires the nvidia container runtime. I guess the docs need updating, but I'd like to build XLA targets (specifically the CUDA PJRT plugin) without access to GPUs, or the nvidia container runtime, so I can build it in GitHub Actions. My minimal understanding of CUDA says this should be possible.

Do I even need to run `./configure.py` for that target? I have so far been unable to get the CUDA plugin working, and wonder if this error might be the problem.
Yeah, agreed. That should be possible. I would try bypassing the configure script and directly calling Bazel in your container. I believe …
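Such a direct Bazel call might look like the sketch below. The `--config=cuda` name and the plugin target label are assumptions, not taken from the thread — check `.bazelrc` and `xla/pjrt/c/BUILD` in your checkout for the exact spellings:

```sh
# Assumed config/target names -- verify against your XLA checkout.
# This bypasses configure.py; compiling needs the CUDA toolchain but
# no GPU on the build machine.
bazel build --config=cuda //xla/pjrt/c:pjrt_c_api_gpu_plugin.so
```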
Thanks. That built, but my CUDA tests are still failing.
Can you share the log? Also, didn't you say you only wanted to build things? For running the tests you will need a GPU.
Here's the log with the PJRT_Error message. It occurs on …
Sort of. I am ultimately running it in a GPU environment, but before that, I'm building it without a GPU. I have tried several different environments and several different argument sets to …
That should work. Unfortunately, a failing cuDNN initialization can have many causes. I would recommend enabling debug logging for cuDNN. This can be done by setting a few environment variables; see https://docs.nvidia.com/deeplearning/cudnn/latest/reference/troubleshooting.html. (Note that the environment variables changed with cuDNN 9.0.0, so if you use a cuDNN version prior to 9.0.0 you need to check the docs for your version in the NVIDIA docs archive.) I can't really comment on PJRT and its options; I'm not an expert on that.
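As a concrete sketch (the variable names below match the cuDNN 9.x troubleshooting page linked above; treat them as something to verify for your version):

```sh
# cuDNN 9.0.0 and later: a single log level plus a destination.
export CUDNN_LOGLEVEL_DBG=3      # 0 = off, 3 = most verbose
export CUDNN_LOGDEST_DBG=stderr  # stdout, stderr, or a file name

# cuDNN before 9.0.0 used separate per-severity switches instead:
export CUDNN_LOGERR_DBG=1
export CUDNN_LOGWARN_DBG=1
export CUDNN_LOGINFO_DBG=1
export CUDNN_LOGDEST_DBG=stderr
```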
It's working! God, that took too long. Thanks for your help. I ran it in the same container I built it in (with some extra packages installed), so I assume my earlier failures came down to a missing package or a version conflict. I'll investigate, but seeing it working is extremely promising. I'll leave this ticket open so the docs and/or configure.py can be updated, but I may be able to take it from here.
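A starting point for that investigation, using only standard tooling (a sketch, nothing XLA-specific):

```sh
# Run in both the working build container and the failing environment,
# then diff the output.
nvidia-smi                                   # is the driver visible?
ldconfig -p | grep -E 'cudnn|cublas|cudart'  # which CUDA libs the loader sees
```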