
GPU usage at 0% during training #72

Open
nub2927 opened this issue Dec 6, 2022 · 7 comments

Comments


nub2927 commented Dec 6, 2022

So I see it's taking up VRAM, but Windows 10 shows GPU usage at 0%.
I am using a 4090, CUDA V11.8.89, PyTorch 1.12.1, Python 3.9

[screenshot]

victorca25 (Owner) commented Dec 6, 2022

Hello! Can you share your configuration file? And if possible, the training log file as well

nub2927 (Author) commented Dec 6, 2022

> Hello! Can you share your configuration file? And if possible, the training log file as well

config:
https://pastebin.com/NvutbgaN

training log:
https://pastebin.com/dvdDqgtY

nub2927 (Author) commented Dec 11, 2022

OK, this may actually just have been an issue with Windows.
Running the training script from xinntao's repo for Real-ESRGAN did similar things,
and using a third-party monitor showed different results from Windows.
[screenshot]

nub2927 (Author) commented Dec 11, 2022

Just to make sure I'm not hitting a bottleneck somewhere, though:
typically how fast would training be if the GPU is working correctly on 512x512 data?
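
One way to get a rough baseline on your own machine is to time a dummy forward/backward loop on 512x512 tensors directly in PyTorch. This is only a generic throughput sanity check with a made-up conv stack, not traiNNer's actual architecture or data pipeline:

```python
# Rough GPU throughput check on 512x512 inputs (generic conv stack, not the repo's model).
import time
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
).to(device)
optimizer = torch.optim.Adam(model.parameters())

x = torch.randn(4, 3, 512, 512, device=device)  # dummy batch
torch.cuda.synchronize()
start = time.time()
for _ in range(50):
    optimizer.zero_grad()
    loss = model(x).mean()
    loss.backward()
    optimizer.step()
torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
print(f"{50 / (time.time() - start):.1f} iters/s")
```

If utilization stays near zero while this runs, the problem is in the monitoring; if it runs fast but real training is slow, the bottleneck is more likely on the CPU side (data loading/augmentation).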

Kim2091 (Contributor) commented Dec 11, 2022

You need to monitor CUDA in task manager, rather than 3D. Do this by clicking where it says 3D (the text) and selecting CUDA from the dropdown. If it's not there, disable Hardware-Accelerated GPU Scheduling in Windows settings.

[screenshot]
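
Independent of Task Manager, running `nvidia-smi` in a terminal reports the actual GPU utilization, and a quick check from inside Python can confirm that PyTorch really runs kernels on the card (a minimal sketch, nothing traiNNer-specific):

```python
# Confirm PyTorch sees the GPU and can run work on it.
import torch

print(torch.cuda.is_available())       # should print True
print(torch.cuda.get_device_name(0))   # e.g. the RTX 4090

x = torch.randn(1024, 1024, device="cuda")
y = x @ x                               # launches a matmul kernel on the GPU
torch.cuda.synchronize()                # make sure the kernel actually ran
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")
```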

victorca25 (Owner) commented

@nub2927 sorry for the late reply, but the configuration and logs look correct!
As Kim mentions, only the CUDA pipeline is used when using PyTorch models with the GPU.

I don't have numbers from a 4090, but training speed looks fast from what I can see in the logs. Any bottleneck you find would likely require changing the code to optimize the parts that are handled on the CPU (like the image pipeline), but at least things look good on the GPU side.
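
For the CPU-side image pipeline, the usual first mitigation in PyTorch is to let the DataLoader decode and augment batches in parallel worker processes. A minimal sketch with a stand-in dataset follows; in traiNNer the equivalent options live in the YAML config, so the names and values here are illustrative only:

```python
# Illustrative only: parallelizing the data pipeline with DataLoader workers.
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":  # required on Windows when num_workers > 0
    # Stand-in for the real image Dataset used by the training script.
    dataset = TensorDataset(torch.randn(64, 3, 512, 512))
    loader = DataLoader(
        dataset,
        batch_size=16,
        shuffle=True,
        num_workers=4,    # decode/augment batches in parallel processes
        pin_memory=True,  # faster host-to-GPU copies
    )
    for (batch,) in loader:
        batch = batch.cuda(non_blocking=True)
```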

NickDeBeenSAE commented

I am having a problem just starting.
