BART training time #1525
May I know how much time BART pre-training took, and on which GPU configuration? I can see in the paper it's written as 500K steps with a batch size of 8K, but I want to know the time it took. Many thanks.
The time can depend on the type and number of GPUs. We trained for around 11-12 days on 256 GPUs.
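As a rough sanity check, you can combine the paper's 500K steps with the 11-12 days quoted above to get the implied optimizer-step rate and total GPU budget. A minimal sketch, assuming the 11.5-day midpoint of that range:

```python
# Back-of-envelope estimate of BART's pre-training step rate.
# Inputs: 500K steps (from the paper) and ~11-12 days on 256 GPUs
# (from the reply above); the 11.5-day midpoint is an assumption.
SECONDS_PER_DAY = 86_400

total_steps = 500_000   # pre-training steps reported in the paper
days = 11.5             # assumed midpoint of the quoted 11-12 days
num_gpus = 256          # GPUs used, per the reply above

steps_per_second = total_steps / (days * SECONDS_PER_DAY)
gpu_days = num_gpus * days

print(f"~{steps_per_second:.2f} steps/s (about {1 / steps_per_second:.1f} s per step)")
print(f"~{gpu_days:,.0f} GPU-days total")
# -> roughly 0.5 steps/s, i.e. ~2 s per optimizer step, and ~2,944 GPU-days
```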
facebook-github-bot pushed a commit that referenced this issue on Dec 28, 2020:

Summary:

Before:
```
2020-12-23 11:46:16 | INFO | fairseq_cli.eval_lm | num. model params: 353781760
2020-12-23 11:46:21 | INFO | fairseq.data.data_utils | loaded 89663978 examples from: /private/home/sshleifer/data-bin/new_hybrid_data/train
```

After:
```
2020-12-23 11:46:16 | INFO | fairseq_cli.eval_lm | num. model params: 353,781,760
2020-12-23 11:46:21 | INFO | fairseq.data.data_utils | loaded 89,663,978 examples from: /private/home/sshleifer/data-bin/new_hybrid_data/train
```

Pull Request resolved: fairinternal/fairseq-py#1525

Test Plan: Run `fairseq-eval-lm` or `fairseq-train` and look at logs. For example:
```
export dd2=/private/home/sshleifer/data-bin/new_hybrid_data
export m=/private/home/myleott/models/public_models/LM/roberta_lm.me_fp16.bm_none.tps1024.transformer_lm_gpt2_small.share.adam.b2_0.98.eps1e-08.cl0.0.lr0.003.wu3000.dr0.1.atdr0.1.wd0.01.ms2.uf4.mu100000.s1.ngpu64/model.pt
fairseq-eval-lm $dd2 \
  --path $m \
  --sample-break-mode complete --gen-subset train \
  --tokens-per-sample 3072 --max-tokens 3072 --context-window 2560 --softmax-batch 1024 --fp16
```

Reviewed By: myleott
Differential Revision: D25693004
Pulled By: sshleifer
fbshipit-source-id: bfeb93fc6607cca2cb7a6e820f51e174d02d1f62
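The before/after logs show the change amounts to rendering large counts with thousands separators, which in Python is the `,` format spec. A minimal illustration of the idea (not the actual fairseq patch):

```python
# Render large counts with thousands separators using Python's ','
# format spec, matching the "After" log lines above. This is a sketch
# of the formatting change, not the fairseq source.
num_params = 353781760
num_examples = 89663978

print(f"num. model params: {num_params:,}")  # -> num. model params: 353,781,760
print(f"loaded {num_examples:,} examples")   # -> loaded 89,663,978 examples
```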