CUDA memory leak? #1230

Closed
SeparateReality opened this issue Apr 11, 2017 · 3 comments

@SeparateReality

from torch.autograd import Variable
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
cudnn.benchmark = True

import sys
print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
print('__CUDA VERSION')
from subprocess import call
call(["nvcc", "--version"])
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print('__Devices')
call(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,memory.total,memory.used,memory.free"])
print('Active CUDA Device: GPU', torch.cuda.current_device())
# print('  Try to change to Device 2 - with "torch.cuda.device(2)"')
# torch.cuda.device(2)
# print('  ! Active CUDA Device is still:', torch.cuda.current_device())
#
# print('  Try again with environment vars')
# import os
# os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"   # see issue #152
# os.environ["CUDA_VISIBLE_DEVICES"]="2"
# print('  ! Active CUDA Device is still:', torch.cuda.current_device())

import time
from torch.nn import Conv1d as Conv1d

num_runs = 10   # timed forward/backward iterations per configuration
s = 5*22050     # input sequence length in samples (5 * 22050, e.g. ~5 s at a 22.05 kHz sample rate)

print('\n')
for seqlen in [s]:
    for batch_size in [16, 32]:
        for dilation in reversed([64, 128, 256, 512]):
            m = nn.Sequential(Conv1d(32, 32, kernel_size=2, dilation=dilation),
                              Conv1d(32, 32, kernel_size=2, dilation=dilation),
                              Conv1d(32, 32, kernel_size=2, dilation=dilation),
                              Conv1d(32, 32, kernel_size=2, dilation=dilation),
                              Conv1d(32, 32, kernel_size=2, dilation=dilation)).cuda()
            input = torch.randn(batch_size, 32, seqlen).float().cuda()

            torch.cuda.synchronize()
            start = time.time()
            for j in range(num_runs):
                output = m(Variable(input, requires_grad=True))
                output.backward(output.data)
            torch.cuda.synchronize()
            mean_time = (time.time() - start) / float(num_runs)
            print('batch_size: %i\tdilation: %i\tseqlen: %i\t time %f\t runs: %i' %(batch_size, dilation, seqlen, mean_time, num_runs))

Output:

__Python VERSION: 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00) 
__pyTorch VERSION: 0.1.11+8aa1cef
__CUDA VERSION
Cuda compilation tools, release 8.0, V8.0.61
__CUDNN VERSION: 6020
__Number CUDA Devices: 4
__Devices
index, name, driver_version, memory.total [MiB], memory.used [MiB], memory.free [MiB]
0, GeForce GTX 1080 Ti, 381.09, 11158 MiB, 318 MiB, 10840 MiB
1, GeForce GTX 1080 Ti, 381.09, 11172 MiB, 11 MiB, 11161 MiB
2, GeForce GTX 1080 Ti, 381.09, 11172 MiB, 11 MiB, 11161 MiB
3, GeForce GTX 1080 Ti, 381.09, 11172 MiB, 11 MiB, 11161 MiB
Active CUDA Device: GPU 0

batch_size: 16	dilation: 512	seqlen: 110250	 time 0.204314	 runs: 10
batch_size: 16	dilation: 256	seqlen: 110250	 time 0.162138	 runs: 10
batch_size: 16	dilation: 128	seqlen: 110250	 time 0.148690	 runs: 10
batch_size: 16	dilation: 64	seqlen: 110250	 time 0.141783	 runs: 10
batch_size: 32	dilation: 512	seqlen: 110250	 time 0.279548	 runs: 10
Traceback (most recent call last):
  File "benchmark_test.py", line 48, in <module>
    output = m(Variable(input, requires_grad=True))
  File "/home/USERNAME/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/USERNAME/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 64, in forward
    input = module(input)
  File "/home/USERNAME/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/USERNAME/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 143, in forward
    self.padding, self.dilation, self.groups)
  File "/home/USERNAME/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 62, in conv1d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_ALLOC_FAILED

I was wondering why I got the CUDNN_STATUS_ALLOC_FAILED error.
After some experiments I found out that whether or not the error occurs depends on the order of the dilation list:
line 37: for dilation in reversed([64, 128, 256, 512]):
Execution without reversed completes without error.

I am not yet familiar with the whole thing. Am I missing something?

--
I gratefully adapted this code from #967.
By the way: I am also curious why I can’t change the active CUDA device (see the commented-out code above)… but I probably just need to get more into it…

@SeparateReality SeparateReality changed the title CUDA memory leak CUDA memory leak? Apr 11, 2017
@ngimel
Collaborator

ngimel commented Apr 11, 2017

Please try running with cudnn.benchmark=False. cudnn.benchmark is not doing any good for dilated convolutions anyway.
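
For reference, a minimal sketch of that suggestion applied to the reproduction script above (only the flag near the top changes):

import torch.backends.cudnn as cudnn

# Disable cuDNN's autotuner. With benchmark=True, cuDNN may try several algorithms
# and allocate extra workspace for every new (batch_size, dilation, seqlen) shape,
# which can lead to allocation failures in shape-heavy runs like this benchmark.
cudnn.benchmark = False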

@albanD
Collaborator

albanD commented Apr 11, 2017

Hi,
You should use torch.cuda.set_device(2) to change the GPU you use, or:

with torch.cuda.device(2):
    # do stuff on gpu 2
# Back on the default GPU

Also, CUDA_VISIBLE_DEVICES will only affect execution if it is set before the script starts.
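
A small sketch combining these options (GPU indices 2 and 3 are just examples, not required values):

import torch

# Option 1: change the default GPU for the rest of the process
torch.cuda.set_device(2)
print('Active CUDA Device: GPU', torch.cuda.current_device())  # -> 2

# Option 2: switch GPUs temporarily with the context manager
with torch.cuda.device(3):
    x = torch.randn(4, 4).cuda()  # allocated on GPU 3
# back on GPU 2 (the default set above) here

# Option 3: restrict visible devices before the script starts, e.g. launch it as
#   CUDA_VISIBLE_DEVICES=2 python benchmark_test.py
# so that physical device 2 appears as CUDA device 0 inside the process.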

@SeparateReality
Author

Wow, thank you for the quick and very precise answers!

@ngimel: Never thought of that... Using cudnn.benchmark=False did the trick: no matter what list size I tried, the error did not show up again.

@albanD: That did it as well! I did not use set_device() because its documentation states it is discouraged in favor of device(). Strange.
