GPU memory usage when doing beam search #314

Open · tanyuqian opened this issue on Jul 6, 2020 · 0 comments
Labels: question (Further information is requested) · topic: modules (Issue about built-in Texar modules)

@tanyuqian (Member) commented:
A BART model (https://arxiv.org/pdf/1910.13461.pdf) is implemented here: https://github.com/tanyuqian/texar-pytorch/tree/master/examples/bart

It passes the text classification (MNLI) and summarization (CNN/DM) tests with greedy decoding, but it runs out of GPU memory on CNN/DM with beam search on a single GTX 1080Ti (11 GB), even with batch_size=1, beam_width=2, and max_decoding_length=140.
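For reference, the failing call presumably follows texar-pytorch's standard decoder inference interface. Below is a minimal sketch, assuming the argument names from the upstream transformer example; the vocabulary size, token ids, source length, and embedder setup are placeholders, not the fork's exact configuration:

```python
import torch
import texar.torch as tx

# A minimal sketch of the decode call that runs out of memory; sizes and
# token ids are illustrative, not the exact configuration of the fork.
vocab_size, hidden_dim = 50265, 512   # hidden_dim matches the default decoder dim

word_embedder = tx.modules.WordEmbedder(
    vocab_size=vocab_size, hparams={'dim': hidden_dim})
pos_embedder = tx.modules.SinusoidsPositionEmbedder(
    position_size=1024, hparams={'dim': hidden_dim})

def embed_fn(tokens, positions):
    # Token + position embeddings, as in texar-pytorch's transformer example.
    return word_embedder(tokens) + pos_embedder(positions)

decoder = tx.modules.TransformerDecoder(
    token_pos_embedder=embed_fn,
    vocab_size=vocab_size,
    output_layer=word_embedder.embedding)  # tie output layer to embeddings

memory = torch.randn(1, 512, hidden_dim)         # [batch, src_len, dim] encoder outputs
start_tokens = torch.zeros(1, dtype=torch.long)  # <s> id assumed to be 0

outputs = decoder(
    memory=memory,
    memory_sequence_length=torch.tensor([512]),
    beam_width=2,        # the beam-search path; beam_width=1 (greedy) succeeds
    start_tokens=start_tokens,
    end_token=2,         # </s> id assumed to be 2
    max_decoding_length=140)
# With beam_width > 1 the decoder returns a dict with 'sample_id' and 'log_prob'.
```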

A script that reproduces this issue is here: https://github.com/tanyuqian/texar-pytorch/blob/master/examples/bart/bart_cnn.py (run it after downloading the CNN/DM data, following the README).

Note that this fork adds two extra hyperparameters to TransformerDecoder ('normalize_before' and 'final_layer_norm'): https://github.com/tanyuqian/texar-pytorch/blob/master/texar/torch/modules/decoders/transformer_decoders.py#L290
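For context, here is a sketch of how those two fork-only flags would appear in the decoder hparams. The flag names follow fairseq-style conventions ('normalize_before' toggling pre- vs. post-sub-layer LayerNorm, 'final_layer_norm' adding a LayerNorm after the last block); the values shown are illustrative, and upstream texar-pytorch would reject these keys:

```python
import texar.torch as tx

# Sketch of the fork-specific decoder hparams; 'normalize_before' and
# 'final_layer_norm' exist only in the linked fork, not in upstream
# texar-pytorch, so this only runs against that fork.
decoder_hparams = tx.modules.TransformerDecoder.default_hparams()
decoder_hparams.update({
    'normalize_before': False,   # post-LN (pre-LN if True)
    'final_layer_norm': False,   # add a LayerNorm after the last block if True
})
```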

@gpengzhi added the question and topic: modules labels on Oct 15, 2020