GPU Memory Benchmark #30

Open
pabloppp opened this issue Feb 10, 2020 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@pabloppp
Contributor

I did a few training runs of a simple Reformer module with different parameters and logged the GPU memory usage.

Of course, these values can vary depending on your machine and other factors, but I thought they might be useful as a rough visual guide:

dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 1: 452 MB

dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 8: 992 MB

dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 16: 1584 MB

dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 32: 2866 MB

dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 64: 4606 MB

dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 128: 9788 MB


dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 1: 538 MB

dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 8: 1580 MB

dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 16: 2870 MB

dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 32: 4582 MB

dim = 512, seq_len = 512, depth = 1, heads = 1, batch_size = 64: 9276 MB


dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 1: 682 MB

dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 8: 2904 MB

dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 16: 4634 MB

dim = 512, seq_len = 1024, depth = 1, heads = 1, batch_size = 32: 9310 MB


dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 1: 992 MB

dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 8: 4644 MB

dim = 512, seq_len = 2048, depth = 1, heads = 1, batch_size = 16: 9256 MB


dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 1: 1602 MB

dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 8: 8810 MB

dim = 512, seq_len = 4096, depth = 1, heads = 1, batch_size = 10: 10976 MB


dim = 512, seq_len = 8192, depth = 1, heads = 1, batch_size = 1: 2884 MB

dim = 512, seq_len = 8192, depth = 1, heads = 1, batch_size = 5: 11396 MB


dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 8: 992 MB

dim = 512, seq_len = 256, depth = 2, heads = 1, batch_size = 8: 1054 MB

dim = 512, seq_len = 256, depth = 4, heads = 1, batch_size = 8: 1142 MB

dim = 512, seq_len = 256, depth = 6, heads = 1, batch_size = 8: 1220 MB

dim = 512, seq_len = 256, depth = 12, heads = 1, batch_size = 8: 1512 MB

dim = 512, seq_len = 256, depth = 24, heads = 1, batch_size = 8: 2056 MB

dim = 512, seq_len = 256, depth = 24, heads = 1, batch_size = 16: 2680 MB


dim = 128, seq_len = 256, depth = 12, heads = 1, batch_size = 8: 566 MB

dim = 128, seq_len = 256, depth = 12, heads = 2, batch_size = 8: 576 MB

dim = 128, seq_len = 256, depth = 12, heads = 4, batch_size = 8: 616 MB

dim = 128, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 732 MB

dim = 128, seq_len = 256, depth = 12, heads = 16, batch_size = 8: 1000 MB


dim = 32, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 644 MB

dim = 64, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 670 MB

dim = 128, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 732 MB

dim = 256, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 918 MB

dim = 512, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 1516 MB

dim = 1024, seq_len = 256, depth = 12, heads = 8, batch_size = 8: 3552 MB


dim = 512, seq_len = 4096, depth = 6, heads = 8, batch_size = 8: 9672 MB

dim = 128, seq_len = 4096, depth = 12, heads = 8, batch_size = 8: 6270 MB

dim = 512, seq_len = 8192, depth = 12, heads = 8, batch_size = 1: 3628 MB

dim = 512, seq_len = 8192, depth = 12, heads = 8, batch_size = 4: 10048 MB

dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 32: 4608 MB

dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 64: 8052 MB

dim = 128, seq_len = 1024, depth = 6, heads = 4, batch_size = 80: 9990 MB
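
Here is a minimal sketch of how one of these numbers could be reproduced. It assumes the plain `Reformer` module from `reformer_pytorch`, fed `(batch, seq_len, dim)` tensors; only `dim`, `depth` and `heads` are taken from the configs above, other constructor keywords are left at their defaults and may differ across versions. Note that PyTorch's allocator peak will usually read lower than `nvidia-smi`, which also counts the CUDA context and cached blocks.

```python
import torch
from reformer_pytorch import Reformer

def peak_training_memory_mb(dim, seq_len, depth, heads, batch_size):
    torch.cuda.reset_peak_memory_stats()

    # plain Reformer block stack, fed (batch, seq_len, dim) tensors directly
    model = Reformer(dim = dim, depth = depth, heads = heads).cuda()
    x = torch.randn(batch_size, seq_len, dim).cuda()

    # forward + backward, so the activation / recomputation memory of a
    # training step is included, not just the weights
    model(x).sum().backward()

    return torch.cuda.max_memory_allocated() / 1024 ** 2

print(peak_training_memory_mb(dim = 512, seq_len = 256, depth = 1, heads = 1, batch_size = 8))
```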

pabloppp changed the title from [NOT AN ISSUE] Memory Benchmark to [NOT AN ISSUE] GPU Memory Benchmark Feb 10, 2020
@lucidrains
Owner

lucidrains commented Feb 10, 2020

You should compare it to full attention! (Just set the use_full_attn flag to True)
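
For anyone trying that comparison, the toggle looks roughly like this (a sketch; apart from `use_full_attn`, which is the flag mentioned above, the keywords are just the settings from the benchmarks):

```python
from reformer_pytorch import Reformer

# identical models, LSH attention vs. full quadratic attention
lsh_model  = Reformer(dim = 128, depth = 6, heads = 4, use_full_attn = False).cuda()
full_model = Reformer(dim = 128, depth = 6, heads = 4, use_full_attn = True).cuda()
```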

@pabloppp
Contributor Author

Not sure how accurate these results are, but when I plot the memory usage against sequence length for a model with this setup (dim = 128, depth = 6, heads = 4, batch_size = 4), I get this:

[plot: peak GPU memory vs. sequence length, Reformer (LSH) vs. full attention]

Note that for shorter sequences the full-attention transformer (use_full_attn set to True) seems to be slightly less memory-intensive, but starting at a sequence length of 4096 the Reformer does much better: the transformer's memory usage soars and I run out of memory at a sequence length of 8192 (I have 11 GB), whereas with the Reformer I can get up to a sequence length of 16384 without even filling the memory (it only uses 8 GB), so we could probably go even higher.
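
A rough sketch of how the curve above could be generated, using the same measurement idea as the earlier snippet (constructor keywords other than `use_full_attn` are assumptions and may need adjusting for your version of `reformer_pytorch`):

```python
import torch
from reformer_pytorch import Reformer

def peak_mb(seq_len, use_full_attn):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = Reformer(dim = 128, depth = 6, heads = 4, use_full_attn = use_full_attn).cuda()
    x = torch.randn(4, seq_len, 128).cuda()   # batch_size = 4, as in the plot
    model(x).sum().backward()
    return torch.cuda.max_memory_allocated() / 1024 ** 2

for seq_len in [256, 512, 1024, 2048, 4096, 8192, 16384]:
    for full in (False, True):
        label = 'full attention' if full else 'LSH'
        try:
            print(f'{seq_len:6d}  {label:15s} {peak_mb(seq_len, full):8.0f} MB')
        except RuntimeError:
            # full attention runs out of memory first at long sequence lengths
            print(f'{seq_len:6d}  {label:15s}      OOM')
```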

@lucidrains
Owner

@pabloppp yup, and the actual Transformer would probably have ceased to work at around 2048, because the reversibility is still in play even if you turn on full attention. Another hyperparameter to play around with is ff_chunks, which you can increase for further compute / memory trade-offs.
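
Roughly how that knob is set (a sketch; the feed-forward is applied position-wise, so it can be evaluated in chunks over the sequence, shrinking the peak activation footprint at the cost of speed):

```python
from reformer_pytorch import Reformer

model = Reformer(
    dim = 512,
    depth = 12,
    heads = 8,
    ff_chunks = 10  # 1 = no chunking; raise this if the feed-forward activations don't fit
)
```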

@avacaondata

I'll add the trials I've run myself here, along with the corresponding memory usage:

dim = 1024, seq_len = 8960, depth = 12, heads = 16, batch_size = 1: 8501 MiB

lucidrains changed the title from [NOT AN ISSUE] GPU Memory Benchmark to GPU Memory Benchmark Mar 6, 2020
lucidrains added the documentation label Mar 6, 2020
@jaideep11061982

Hi lucidrains,
How do I pass a parameter specifying how many encoders and how many decoders I want?
Say, 2 encoders and 2 decoders.
