gpu_usage @ trainer.py #61

Open
kevinroccapriore opened this issue Apr 1, 2022 · 1 comment

@kevinroccapriore

Line 357 in trainer.py,
gpu_usage = gpu_usage_map(torch.cuda.current_device())
may raise a FileNotFoundError.

A workaround is to wrap the call in a try/except block to bypass the error.
Windows may view it as an unsafe command.
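
For illustration, the try/except workaround could look roughly like the sketch below. The helper name `safe_gpu_usage` is hypothetical and not part of the project; it takes the query function as an argument so that it does not depend on how `gpu_usage_map` is implemented. Returning `None` on failure is an assumption, so the caller would need to handle that case.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def safe_gpu_usage(query_fn: Callable[[int], T], device: int) -> Optional[T]:
    """Call query_fn(device) and return None if a required file is missing.

    Hypothetical helper: query_fn stands in for gpu_usage_map.
    """
    try:
        return query_fn(device)
    except FileNotFoundError:
        # Raised, for example, when an external executable that the
        # query shells out to is not installed or not on PATH.
        return None
```

In trainer.py this would presumably be called as `safe_gpu_usage(gpu_usage_map, torch.cuda.current_device())`, with any later use of `gpu_usage` guarded against `None`.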

@ziatdinovmax added the bug label Apr 1, 2022
@ziatdinovmax self-assigned this Oct 6, 2022

@markcoletti

To follow up on this bug: gpu_usage_map (which is in utils/nn.py) relies on nvidia-smi to get this information, and nvidia-smi may not be available. That is the case on Oak Ridge National Laboratory's Frontier supercomputer, which uses AMD GPUs, not Nvidia.

Though the AMD equivalent of that call is rocm-smi, it may be best to rely on a third-party package that is vendor-agnostic and simply returns GPU usage regardless of the GPU's make. I recommend using something like Ricks-Lab GPU Utilities (https://pypi.org/project/rickslab-gpu-utils/), which can work with both AMD and Nvidia GPUs.
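
Short of adopting a third-party package, a minimal sketch of the detection step might check which vendor tool is on the PATH before querying anything. The function name `find_gpu_tool` and the `SMI_TOOLS` tuple are hypothetical. Only the standard-library `shutil.which` is used, and the lookup is injectable so the logic can be exercised on a machine with no GPU tools installed.

```python
import shutil
from typing import Callable, Optional, Sequence

# Candidate command-line tools, checked in order (assumed preference).
SMI_TOOLS: Sequence[str] = ("nvidia-smi", "rocm-smi")


def find_gpu_tool(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the name of the first GPU tool found on PATH, else None."""
    for tool in SMI_TOOLS:
        if which(tool) is not None:
            return tool
    return None
```

A caller could skip the GPU usage query entirely when this returns `None`, instead of letting the missing executable surface as a FileNotFoundError.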
