NVIDIA Cuda Validation did not detect Nvidia for Tesla T4 15GB #771

Open

Hzzkygcs opened this issue Jan 20, 2024 · 0 comments

@Hzzkygcs
Hi.
I'm trying to run phoronix-test-suite install pytorch-1.0.1 on a Google Cloud Platform VM instance with GPU support (NVIDIA Tesla T4, 15 GB). However, I never get the option to choose between CPU and CUDA; I am always forced to use the CPU. Phoronix does not ask me which hardware to use (GPU vs CPU).

I have made sure that the pytorch-1.0.1 test profile supports NVIDIA in its test-definition.xml. After spending some time reading the Phoronix Test Suite code, I think the problem is this simple validation:

```php
if((stripos($test_args, 'NVIDIA ') !== false || stripos($test_args . ' ', 'CUDA ') !== false) && stripos(phodevi::read_property('gpu', 'model'), 'NVIDIA') === false)
{
	// Only show NVIDIA / CUDA options when running with NVIDIA hardware
	$error = 'NVIDIA CUDA support is not available.';
	return false;
}
if((stripos($test_args, 'NVIDIA ') !== false || stripos($test_args . ' ', 'CUDA ') !== false) && stripos(phodevi::read_property('gpu', 'model'), 'NVIDIA') === false)
{
	// Only show NVIDIA / CUDA options when running with NVIDIA hardware
	$error = 'NVIDIA support is not available.';
	return false;
}
```
This check does not work in my case.

I tried printing phodevi::read_property('gpu', 'model') on my VM instance, and it yields Tesla T4 15GB, which does not contain the substring NVIDIA, even though it is an NVIDIA GPU with CUDA support.
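For illustration, here is a minimal standalone reproduction of why the check fails, assuming the model string returned on my VM really is 'Tesla T4 15GB' (the $gpu_model variable below is just a stand-in for the phodevi call):

```php
<?php
// Stand-in for phodevi::read_property('gpu', 'model') on my VM (assumption).
$gpu_model = 'Tesla T4 15GB';

// Same substring test used by the validation quoted above.
if(stripos($gpu_model, 'NVIDIA') === false)
{
	// This branch is taken, so the CUDA option is hidden even though
	// the card is an NVIDIA Tesla T4.
	echo 'NVIDIA not detected in model string: ' . $gpu_model . PHP_EOL;
}
```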

Some solutions I propose for this issue are:

  • Add an alternative validation: if the substring "NVIDIA" is not found in the model string, try running nvidia-smi and check whether it fails with a "command not found" error (see the sketch after this list).
  • Add an option to disable this validation entirely (not an ideal solution, but easier to implement).
  • Rely on other GPU properties besides the "model" property.
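A rough sketch of the first idea follows. The helper name has_nvidia_gpu() and where it would be called from are my own assumptions, not existing Phoronix Test Suite APIs; it only illustrates the nvidia-smi fallback:

```php
<?php
// Hypothetical helper sketching the nvidia-smi fallback idea.
function has_nvidia_gpu()
{
	// Keep the existing model-string check as the first attempt.
	$model = phodevi::read_property('gpu', 'model');
	if(stripos($model, 'NVIDIA') !== false)
	{
		return true;
	}

	// Fallback: if nvidia-smi exists and exits successfully, assume an
	// NVIDIA GPU is present even when the model string lacks 'NVIDIA'.
	$output = array();
	$exit_code = 1;
	exec('nvidia-smi -L 2>/dev/null', $output, $exit_code);

	return $exit_code === 0 && !empty($output);
}
```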

However, there may well be better solutions than the ones I propose. Should that be the case, feel free to use whichever is best.

@Hzzkygcs Hzzkygcs changed the title Edge Case for NVIDIA Cuda Validation NVIDIA Cuda Validation did not detect Nvidia for Tesla T4 15GB Jan 20, 2024