
RuntimeError: CUDA out of memory. Tried to allocate 12.50 MiB (GPU 0; 10.92 GiB total capacity; 8.57 MiB already allocated; 9.28 GiB free; 4.68 MiB cached) #16417

Closed
EMarquer opened this issue Jan 27, 2019 · 165 comments
Labels
needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user

Comments

@EMarquer

EMarquer commented Jan 27, 2019

CUDA Out of Memory error but CUDA memory is almost empty

I am currently training a lightweight model on a very large amount of textual data (about 70 GiB of text).
For that I am using a machine on a cluster ('grele' of the grid5000 cluster network).

After 3 h of training, I am getting this very strange CUDA out of memory error message:
RuntimeError: CUDA out of memory. Tried to allocate 12.50 MiB (GPU 0; 10.92 GiB total capacity; 8.57 MiB already allocated; 9.28 GiB free; 4.68 MiB cached).
According to the message, I have the required space available, but the memory does not get allocated.

Any idea what might cause this?

For information, my preprocessing relies on torch.multiprocessing.Queue and an iterator over the lines of my source data to preprocess the data on the fly.

Full stack trace:

Traceback (most recent call last):
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/memory_profiler.py", line 1228, in <module>
    exec_with_profiler(script_filename, prof, args.backend, script_args)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/memory_profiler.py", line 1129, in exec_with_profiler
    exec(compile(f.read(), filename, 'exec'), ns, ns)
  File "run.py", line 293, in <module>
    main(args, save_folder, load_file)
  File "run.py", line 272, in main
    trainer.all_epochs()
  File "/home/emarquer/papud-bull-nn/trainer/trainer.py", line 140, in all_epochs
    self.single_epoch()
  File "/home/emarquer/papud-bull-nn/trainer/trainer.py", line 147, in single_epoch
    tracker.add(*self.single_batch(data, target))
  File "/home/emarquer/papud-bull-nn/trainer/trainer.py", line 190, in single_batch
    result = self.model(data)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emarquer/papud-bull-nn/model/model.py", line 54, in forward
    emb = self.emb(input)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/emarquer/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA out of memory. Tried to allocate 12.50 MiB (GPU 0; 10.92 GiB total capacity; 8.57 MiB already allocated; 9.28 GiB free; 4.68 MiB cached)

@OmarBazaraa

I have the same runtime error:

Traceback (most recent call last):
  File "carn\train.py", line 52, in <module>
    main(cfg)
  File "carn\train.py", line 48, in main
    solver.fit()
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\solver.py", line 95, in fit
    psnr = self.evaluate("dataset/Urban100", scale=cfg.scale, num_step=self.step)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\solver.py", line 136, in evaluate
    sr = self.refiner(lr_patch, scale).data
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\model\carn.py", line 74, in forward
    b3 = self.b3(o2)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\model\carn.py", line 30, in forward
    c3 = torch.cat([c2, b3], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 195.25 MiB (GPU 0; 4.00 GiB total capacity; 2.88 GiB already allocated; 170.14 MiB free; 2.00 MiB cached)

@yf225
Contributor

yf225 commented Jan 28, 2019

@EMarquer @OmarBazaraa Could you give a minimal repro example that we can run?

@yf225 added the needs reproduction label on Jan 28, 2019
@EMarquer
Author

I cannot reproduce the problem anymore, so I will close the issue.
The problem disappeared when I stopped storing the preprocessed data in RAM.

@OmarBazaraa, I do not think your problem is the same as mine, as:

  • I am trying to allocate 12.50 MiB, with 9.28 GiB free
  • you are trying to allocate 195.25 MiB, with 170.14 MiB free

From my previous experience with this problem, either you are not freeing the CUDA memory, or you are trying to put too much data onto the GPU.
By not freeing the CUDA memory, I mean that you potentially still hold references to CUDA tensors that you no longer use. Those references prevent the allocated memory from being freed, because the tensors cannot actually be deleted while something still refers to them.
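As an illustration of that first case, here is a minimal self-contained sketch (a toy model and random data, not the original training code): accumulating the loss tensor itself keeps a reference to the whole autograd graph and its CUDA buffers across iterations, while accumulating loss.item() does not.

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

running_loss = 0.0
for step in range(100):
    data = torch.randn(32, 128, device=device)
    target = torch.randint(0, 2, (32,), device=device)

    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()

    # running_loss += loss        # bad: keeps a reference to the graph of every batch
    running_loss += loss.item()   # good: only a Python float outlives the iteration

# empty_cache() releases cached blocks back to the driver; it does not fix leaks
# caused by live references, it only lets other processes see the memory as free.
torch.cuda.empty_cache()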

@aniketspurohit

Is there any general solution?

CUDA out of memory. Tried to allocate 196.00 MiB (GPU 0; 2.00 GiB total capacity; 359.38 MiB already allocated; 192.29 MiB free; 152.37 MiB cached)

@fmassa
Member

fmassa commented Jan 31, 2019

@aniks23 we are working on a patch that I believe will give a better experience in this case. Stay tuned!


@adrianovieira

adrianovieira commented Mar 12, 2019

I also got this message:

RuntimeError: CUDA out of memory. Tried to allocate 32.75 MiB (GPU 0; 4.93 GiB total capacity; 3.85 GiB already allocated; 29.69 MiB free; 332.48 MiB cached)

It happened when I was trying to run the fast.ai Lesson 1 (Pets) notebook, https://course.fast.ai/ (cell 31).

@treble-maker123

treble-maker123 commented Mar 14, 2019

I too am running into the same errors. My model was working earlier with the exact same setup, but now it gives this error after I modified some seemingly unrelated code.

RuntimeError: CUDA out of memory. Tried to allocate 1.34 GiB (GPU 0; 22.41 GiB total capacity; 11.42 GiB already allocated; 59.19 MiB free; 912.00 KiB cached)

@treble-maker123

I don't know if my scenario is related to the original issue, but I resolved my problem (the OOM error in the previous message went away) by breaking up the nn.Sequential layers in my model, e.g.

self.input_layer = nn.Sequential(
    nn.Conv3d(num_channels, 32, kernel_size=3, stride=1, padding=0),
    nn.BatchNorm3d(32),
    nn.ReLU()
)

output = self.input_layer(x)

to

self.input_conv = nn.Conv3d(num_channels, 32, kernel_size=3, stride=1, padding=0)
self.input_bn = nn.BatchNorm3d(32)

output = F.relu(self.input_bn(self.input_conv(x)))

My model has a lot more of these (5 more to be exact). Am I using nn.Sequential right? Or is this a bug? @yf225 @fmassa

@yasheshgaur

I am getting a similar error as well:

CUDA out of memory. Tried to allocate 196.50 MiB (GPU 0; 15.75 GiB total capacity; 7.09 GiB already allocated; 20.62 MiB free; 72.48 MiB cached)

@treble-maker123, have you been able to conclusively prove that nn.Sequential is the problem?

@ahsteven

ahsteven commented Apr 3, 2019

I am having a similar issue. I am using the PyTorch DataLoader. It says I should have over 5 GB free, but it gives 0 bytes free.

RuntimeError Traceback (most recent call last)
in
22
23 data, inputs = states_inputs
---> 24 data, inputs = Variable(data).float().to(device), Variable(inputs).float().to(device)
25 print(data.device)
26 enc_out = encoder(data)

RuntimeError: CUDA out of memory. Tried to allocate 11.00 MiB (GPU 0; 6.00 GiB total capacity; 448.58 MiB already allocated; 0 bytes free; 942.00 KiB cached)

@AlbertZhangHIT

Hi, I also got this error.

 File "xxx", line 151, in __call__
    logits = self.model(x_hat)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "unet.py", line 67, in forward
    x = up(x, blocks[-i-1])
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "unet.py", line 120, in forward
    out = self.conv_block(out)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "unet.py", line 92, in forward
    out = self.block(x)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 8.00 MiB (GPU 1; 11.78 GiB total capacity; 10.66 GiB already allocated; 1.62 MiB free; 21.86 MiB cached)

@qingyu-wang

Sadly, I met the same issue too.

RuntimeError: CUDA out of memory. Tried to allocate 1.33 GiB (GPU 1; 31.72 GiB total capacity; 5.68 GiB already allocated; 24.94 GiB free; 5.96 MiB cached)

I have trained my model on a cluster of servers, and the error unpredictably happens on one of them. This weird error also only occurs with one of my training strategies, and the only difference is that I modified the data augmentation code, making the data preprocessing more complicated than in the others. But I am not sure how to solve this problem.

@nabil2i

nabil2i commented May 10, 2019

I am also having this issue. How can I solve it? RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 4.00 GiB total capacity; 2.94 GiB already allocated; 10.22 MiB free; 18.77 MiB cached)

@williamluke4

Same issue here: RuntimeError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 11.00 GiB total capacity; 7.89 GiB already allocated; 7.74 MiB free; 478.37 MiB cached)

@williamluke4

@fmassa Do you have any more info on this?

@Sureshthommandru

#16417 (comment)

I have the same issue. Did you find a solution?
(base) F:\Suresh\st-gcn>python main1.py recognition -c config/st_gcn/ntu-xsub/train.yaml --device 0 --work_dir ./work_dir
C:\Users\cudalab10\Anaconda3\lib\site-packages\torch\cuda\__init__.py:117: UserWarning:
Found GPU0 TITAN Xp which is of cuda capability 1.1.
PyTorch no longer supports this GPU because it is too old.

warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
[05.22.19|12:02:41] Parameters:
{'base_lr': 0.1, 'ignore_weights': [], 'model': 'net.st_gcn.Model', 'eval_interval': 5, 'weight_decay': 0.0001, 'work_dir': './work_dir', 'save_interval': 10, 'model_args': {'in_channels': 3, 'dropout': 0.5, 'num_class': 60, 'edge_importance_weighting': True, 'graph_args': {'strategy': 'spatial', 'layout': 'ntu-rgb+d'}}, 'debug': False, 'pavi_log': False, 'save_result': False, 'config': 'config/st_gcn/ntu-xsub/train.yaml', 'optimizer': 'SGD', 'weights': None, 'num_epoch': 80, 'batch_size': 64, 'show_topk': [1, 5], 'test_batch_size': 64, 'step': [10, 50], 'use_gpu': True, 'phase': 'train', 'print_log': True, 'log_interval': 100, 'feeder': 'feeder.feeder.Feeder', 'start_epoch': 0, 'nesterov': True, 'device': [0], 'save_log': True, 'test_feeder_args': {'data_path': './data/NTU-RGB-D/xsub/val_data.npy', 'label_path': './data/NTU-RGB-D/xsub/val_label.pkl'}, 'train_feeder_args': {'data_path': './data/NTU-RGB-D/xsub/train_data.npy', 'debug': False, 'label_path': './data/NTU-RGB-D/xsub/train_label.pkl'}, 'num_worker': 4}

[05.22.19|12:02:41] Training epoch: 0
Traceback (most recent call last):
  File "main1.py", line 31, in <module>
    p.start()
  File "F:\Suresh\st-gcn\processor\processor.py", line 113, in start
    self.train()
  File "F:\Suresh\st-gcn\processor\recognition.py", line 91, in train
    output = self.model(data)
  File "C:\Users\cudalab10\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Suresh\st-gcn\net\st_gcn.py", line 82, in forward
    x, _ = gcn(x, self.A * importance)
  File "C:\Users\cudalab10\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Suresh\st-gcn\net\st_gcn.py", line 194, in forward
    x, A = self.gcn(x, A)
  File "C:\Users\cudalab10\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Suresh\st-gcn\net\utils\tgcn.py", line 60, in forward
    x = self.conv(x)
  File "C:\Users\cudalab10\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\cudalab10\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 1.37 GiB (GPU 0; 12.00 GiB total capacity; 8.28 GiB already allocated; 652.75 MiB free; 664.38 MiB cached)

@balcilar

balcilar commented Jun 1, 2019

This happens because a mini-batch of data does not fit in GPU memory. Just decrease the batch size. When I set batch size = 256 for the CIFAR-10 dataset I got the same error; when I set batch size = 128, it was solved.
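As a concrete illustration of that suggestion (a hypothetical CIFAR-10 setup via torchvision, not the poster's code), the only change needed is the batch_size argument of the DataLoader:

import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)

# batch_size=256 overflowed the GPU in the report above; 128 fits.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)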

@EKELE-NNOROM

Yeah, @balcilar is right. I reduced the batch size and now it works.

@magic282

I have a similar issue:

RuntimeError: CUDA out of memory. Tried to allocate 11.88 MiB (GPU 4; 15.75 GiB total capacity; 10.50 GiB already allocated; 1.88 MiB free; 3.03 GiB cached)

I am using 8 V100s to train the model. The confusing part is that there are still 3.03 GiB cached, yet they cannot be allocated for an 11.88 MiB request.


@magic282

I tried reducing the batch size and it worked. The confusing part is that the error message says the cached memory is larger than the memory to be allocated.

@pvk444

pvk444 commented Jun 30, 2019

I get the same problem with a pretrained model when I run prediction, so reducing the batch size will not work.

@fmassa
Member

fmassa commented Jun 30, 2019

If you update to the latest version of PyTorch you might see fewer errors like that.

@AzimAhmadzadeh

Can I ask why the numbers in the error don't add up?!
I (like all of you) get:
Tried to allocate 20.00 MiB (GPU 0; 1.95 GiB total capacity; 763.17 MiB already allocated; 6.31 MiB free; 28.83 MiB cached)
To me it means the following should be approximately true:
1.95 (GB total) - 20 (MiB needed) == 763.17 (MiB already used) + 6.31 (MiB free) + 28.83 (MiB cached)
But it is not. What am I getting wrong?
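For reference, the same counters can be queried from Python; a small sketch (note that the device's total memory also covers the CUDA context and any other processes on the GPU, which the error message does not itemize):

import torch

if torch.cuda.is_available():
    dev = torch.device("cuda:0")
    total = torch.cuda.get_device_properties(dev).total_memory   # bytes, whole device
    allocated = torch.cuda.memory_allocated(dev)                 # "already allocated"
    reserved = torch.cuda.memory_reserved(dev)                   # "cached"/"reserved" by the allocator
    print(f"total     {total / 2**20:10.2f} MiB")
    print(f"allocated {allocated / 2**20:10.2f} MiB")
    print(f"reserved  {reserved / 2**20:10.2f} MiB")
    # Note: total is the whole device; the CUDA context and other processes
    # also consume memory that these per-process counters do not include.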

@tongpinmo

I have also hit this problem when training a U-Net; the cached memory is enough, but it still crashes.

@MSKazemi

I have the same error...
RuntimeError: CUDA out of memory. Tried to allocate 312.00 MiB (GPU 0; 10.91 GiB total capacity; 1.07 GiB already allocated; 109.62 MiB free; 15.21 MiB cached)

@giangnguyen2412

Try reducing some size (any size that will not change the result); that will work.

@LELE-ELLA

Hello everyone,

I know this issue might have been closed some time ago; however, I want to share the solution that worked for me.

I tried the suggested solutions above, but what worked for me was a simple pre-processing step before inference: resizing all the images to a common size, lower than the original one (for me, the larger dimension --> 512x512).

I hope this can help you to overcome the error :)

Thank you very much! It worked for me!
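For reference, a minimal sketch of that resizing step, assuming torchvision transforms and a hypothetical input file (the 512x512 target is the one mentioned above):

from PIL import Image
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((512, 512)),   # cap the spatial size before the image reaches the model
    T.ToTensor(),
])

img = Image.open("example.jpg").convert("RGB")   # hypothetical input file
x = preprocess(img).unsqueeze(0)                 # tensor of shape [1, 3, 512, 512]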

@tuwonga

tuwonga commented Sep 29, 2022

try reducing size (any size that will not change the result) will work.

How can you do that?
Thank you.

@khadija23

khadija23 commented Dec 16, 2022

The best way to solve this issue is to reduce your batch size.

@bghani

bghani commented Feb 11, 2023

I realised on debugging that my memory was growing in the evaluation (validation) phase, not during training. The cause is that, unless you disable gradient tracking, autograd still builds the computation graph during the validation forward pass and holds on to the intermediate activations (for a backward pass that never happens), so they are not freed as eagerly as they would be with gradients disabled. Here's how I solved it:

I turned off gradient computation during validation: You can set torch.set_grad_enabled(False) before running the validation loop to turn off gradient computation and reduce memory usage. So, just add that at the start of your validation routine.

Do not forget to set it back to True at the start of the training routine.
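A minimal sketch of that pattern (the validate function and its arguments are illustrative names, not from a particular codebase):

import torch

def validate(model, val_loader, criterion, device):
    model.eval()
    torch.set_grad_enabled(False)    # no graph is recorded, activations are freed eagerly
    total_loss = 0.0
    for data, target in val_loader:
        data, target = data.to(device), target.to(device)
        total_loss += criterion(model(data), target).item()
    torch.set_grad_enabled(True)     # flag it back on before returning to training
    model.train()
    return total_loss / len(val_loader)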

@ChenglongMa

ChenglongMa commented Feb 15, 2023

Hi, I solved the problem using a context manager, since the volatile flag is deprecated:

with torch.no_grad():
    # Your eval code here

It works for me! @torch.no_grad() also works. Thanks so much!

@sushilkhadkaanon

Use nvidia-smi to show which PIDs are using the GPU, then use kill <pid> to free your GPU.


It works!! No need to reduce the batch size. Thanks @h-jia !

@AntonG-87

Hi, I solved the problem using a context manager, since the volatile flag is deprecated:

with torch.no_grad():
    # Your eval code here

It works for me! @torch.no_grad() also works. Thanks so much!
Could you help me: in which specific place should I write this line? I would be very grateful.

@ChenglongMa

Hi, I solved the problem using a context manager, since the volatile flag is deprecated:

with torch.no_grad():
    # Your eval code here

It works for me! @torch.no_grad() also works. Thanks so much!
Could you help me, in what specific place should I write this line? I would be very grateful

Hi @AntonG-87, you can write this line in any "test" function that does not require gradient calculations.

You can refer to the official tutorial for more details.

@pradyyadav

I was facing the problem while training. I tried reducing the batch size, and it didn't work. But I noticed that when I changed my optimizer from Adam to SGD, it worked.
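For context, Adam keeps two extra state tensors (exp_avg and exp_avg_sq) the size of every parameter, while plain SGD without momentum keeps no per-parameter state, which is presumably why the swap freed enough memory. A minimal sketch of the change (toy model, not the original code):

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # keeps exp_avg and exp_avg_sq per parameter
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)      # keeps no per-parameter state (momentum=0)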

@anzhonnian

anzhonnian commented Nov 14, 2023

I have the same runtime error:

Traceback (most recent call last):
  File "carn\train.py", line 52, in <module>
    main(cfg)
  File "carn\train.py", line 48, in main
    solver.fit()
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\solver.py", line 95, in fit
    psnr = self.evaluate("dataset/Urban100", scale=cfg.scale, num_step=self.step)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\solver.py", line 136, in evaluate
    sr = self.refiner(lr_patch, scale).data
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\model\carn.py", line 74, in forward
    b3 = self.b3(o2)
  File "C:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Omar\Desktop\CARN-pytorch\carn\model\carn.py", line 30, in forward
    c3 = torch.cat([c2, b3], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 195.25 MiB (GPU 0; 4.00 GiB total capacity; 2.88 GiB already allocated; 170.14 MiB free; 2.00 MiB cached)

You can try setting CUDA_VISIBLE_DEVICES to 1 before running:

import os
# Must be set before CUDA is initialized so that PyTorch only sees GPU 1.
os.environ["CUDA_VISIBLE_DEVICES"] = '1'

@pablospe

I was facing the problem while training, I tried reducing the batch-size, it didn't work. But I noticed while changing my optimizer from Adam to SGD, it works.

@pradyyadav Could you point out where in the code?


@pradyyadav

I was facing the problem while training, I tried reducing the batch-size, it didn't work. But I noticed while changing my optimizer from Adam to SGD, it works.

@pradyyadav Could you point out where in the code?

I had model training code, so I just experimented with different optimizers.

@Karunesh16

I think it might be related to insufficient RAM in your system. My RAM reached its full capacity before dropping drastically and showing the same error mentioned above. I use a 3060 with 12 GB VRAM and 16 GB of RAM.

@rw200854554

Try restarting your computer; it worked for me.

@OtakuOW

OtakuOW commented Jan 9, 2024

Hi, I also have the same problem and I can't solve it. I can't use AnimateDiff because it gives me an error, even though I have 32 GB of DDR4 3600 MHz RAM, an RTX 4070 OC, and a Ryzen 7 5800X CPU.


@OtakuOW

OtakuOW commented Jan 9, 2024

I tried installing Anaconda and the Nvidia drivers + CUDA toolkit, to no avail.

@ppriyank

I have a Quadro RTX 8000 (49152 MiB) GPU.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.09 GiB (GPU 0; 47.45 GiB total capacity; 1.65 GiB already allocated; 45.46 GiB free; 1.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.09 GiB (GPU 1; 47.45 GiB total capacity; 1.65 GiB already allocated; 45.46 GiB free; 1.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried everything from max_split_size_mb:32 to max_split_size_mb:65536, e.g.

export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128

Nothing works. Does anyone know the fix for this?


@DavidT9500

2024-02-03 20:39:53,872 - Inpaint Anything - ERROR - Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 3.26 GiB
Requested : 2.64 GiB
Device limit : 8.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction)
How can I solve this? The problem keeps appearing.

@JALBERTOCG

result_gpu = torch.matmul(x_gpu, y_gpu)

RuntimeError: CUDA out of memory. Tried to allocate 2.89 GiB (GPU 1; 12.00 GiB total capacity; 8.66 GiB already allocated; 2.05 GiB free; 8.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See
documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I also have the same error. Does anyone have an idea how to solve it?

@eduardo4jesus

I am having a similar issue; my batch size is already very small. But I want to highlight a detail that many people missed in this thread.

The OP says:

RuntimeError: CUDA out of memory. Tried to allocate 12.50 MiB (GPU 0; 10.92 GiB total capacity; 8.57 MiB already allocated; 9.28 GiB free; 4.68 MiB cached)

12.50 MiB attempted <<< 9.28 GiB free

Many people that commented "I have the same problem" do not have that same exact issue.

Also, what about this other part of the message:

If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Has it been addressed as part of any solution? If not, does anyone know what the point of that part of the message is?

@pablospe

pablospe commented Feb 9, 2024

In my case, this was the solution to the memory issue:
#16417 (comment)

@WangFengtu1996

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB. GPU 0 has a total capacity of 23.65 GiB of which 11.06 MiB is free. Including non-PyTorch memory, this process has 23.60 GiB memory in use. Of the allocated memory 22.90 GiB is allocated by PyTorch, and 194.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)



@ravi-aii

In my case, I just lowered the batch size from 8 to 4. It worked and the "CUDA out of memory" error was solved.

