Weird behavior with memory usage within sample.py #10

Open
GrayWasTaken opened this issue Sep 1, 2022 · 1 comment
For some reason, progen2-base uses an increasingly large amount of VRAM as I raise --num-samples. If I set --num-samples to 50 I get the following error, yet with 30, 40, or even 45 no issue occurs. I assume this is unintentional.

sampling
sampling took 36.29s
Traceback (most recent call last):
  File "sample.py", line 207, in <module>
    main()
  File "sample.py", line 193, in main
    completions = sample(device=device, model=model, tokenizer=tokenizer, context=args.context, pad_token_id=tokenizer.encode('<|pad|>').ids[0], num_return_sequences=args.num_samples, temp=args.t, top_p=args.p, max_length=args.max_length)
  File "sample.py", line 73, in sample
    tokens_batch = model.generate(input_ids, do_sample=True, temperature=temp, max_length=max_length, top_p=top_p, num_return_sequences=num_return_sequences, pad_token_id=pad_token_id)
  File "/home/.../.../progen/progen2/.venv/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/.../.../progen/progen2/.venv/lib/python3.7/site-packages/transformers/generation_utils.py", line 1210, in generate
    **model_kwargs,
  File "/home/.../.../progen/progen2/.venv/lib/python3.7/site-packages/transformers/generation_utils.py", line 1714, in sample
    output_hidden_states=output_hidden_states,
  File "/home/.../.../progen/progen2/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/.../.../progen/progen2/models/progen/modeling_progen.py", line 640, in forward
    return_dict=return_dict,
  File "/home/.../.../progen/progen2/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/.../.../progen/progen2/models/progen/modeling_progen.py", line 507, in forward
    output_attentions=output_attentions,
  File "/home/.../.../progen/progen2/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/.../.../progen/progen2/models/progen/modeling_progen.py", line 269, in forward
    output_attentions=output_attentions,
  File "/home/.../.../progen/progen2/.venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/.../.../progen/progen2/models/progen/modeling_progen.py", line 203, in forward
    value = torch.cat((past_value, value), dim=-2)
RuntimeError: CUDA out of memory. Tried to allocate 76.00 MiB (GPU 0; 14.76 GiB total capacity; 13.21 GiB already allocated; 37.75 MiB free; 13.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

For reference, this is the command I'm running:

python sample.py --model progen2-base --t 0.8 --p 90 --max-length 512 --num-samples 40 --context <232 AA sequence>
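
My guess is that the KV cache built inside model.generate grows with num_return_sequences × max_length, so peak VRAM scales roughly linearly with --num-samples, which would explain why 45 fits but 50 does not. A minimal workaround sketch, assuming the standard transformers generate API (sample_in_batches and batch_size are hypothetical names, not part of sample.py):

import torch

# Sketch: draw num_samples sequences in smaller batches so the KV cache
# only ever holds batch_size sequences at once, capping peak VRAM.
def sample_in_batches(model, input_ids, num_samples, batch_size, **gen_kwargs):
    sequences = []
    for start in range(0, num_samples, batch_size):
        n = min(batch_size, num_samples - start)
        with torch.no_grad():
            tokens = model.generate(input_ids, do_sample=True,
                                    num_return_sequences=n, **gen_kwargs)
        sequences.extend(t.cpu() for t in tokens)  # move each sample off the GPU
        torch.cuda.empty_cache()  # release cached blocks between batches
    return sequences

Total runtime should stay about the same; only the peak memory drops, since batches are generated sequentially instead of all at once.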

ullahsamee commented Feb 18, 2023

Hi!

I am getting the exact same error ("RuntimeError: CUDA out of memory"), and I am using the large model.
I have tried everything from https://discuss.pytorch.org/search?q=cuda%20out%20of%20memory, but nothing solves this annoying issue.

I also cleared the cache and tried the other suggestions from https://medium.com/@snk.nitin/how-to-solve-cuda-out-of-memory-error-850bb247cfb2, but it is not working.

The with torch.no_grad() fix from pytorch/pytorch#16417 is already present (around line 1339 of the modules), but it still does not help.
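
One more thing worth trying, since the traceback itself suggests it: lowering max_split_size_mb via the PYTORCH_CUDA_ALLOC_CONF environment variable to reduce fragmentation. A minimal sketch (the 128 MiB value is only an example, not a recommendation from the repo):

# Must be set before the first CUDA allocation, e.g. in the shell:
#   PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python sample.py ...
# or at the very top of sample.py:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"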

I think if Dr. Madani could reduce the batch size, or something along those lines, it might solve this "RuntimeError: CUDA out of memory" error.

I assume upgrading to PyTorch 2.0 in requirements.txt could solve this issue, although I have not tried it yet.

Thank you
