@zxcvqwerasdf GLM-130B was pre-trained on 400 billion tokens on a cluster of 96 NVIDIA DGX-A100 (8×40G)
GPU nodes between May 6 and July 3, 2022. The token count matters here. I believe it cost around $5 million to achieve the accuracy and results that it did. Note, though, that the model is encoder-decoder.
"When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days."
They used 2048 A100 80GB GPUs for 21 days, roughly $2.2M in compute.
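The $2.2M figure follows from the throughput quoted above. A minimal back-of-envelope sketch (the $/GPU-hour rental rate is an assumption on my part, roughly in line with cloud A100 pricing; it is not stated in the thread):

```python
# Back-of-envelope cost estimate for LLaMA-65B training.
# Inputs from the quoted paper: 380 tokens/sec/GPU, 2048 A100 80GB GPUs, 1.4T tokens.
TOKENS = 1.4e12
TOKENS_PER_SEC_PER_GPU = 380
GPUS = 2048
USD_PER_GPU_HOUR = 2.10  # ASSUMED rental rate, not from the thread

total_seconds = TOKENS / (TOKENS_PER_SEC_PER_GPU * GPUS)
days = total_seconds / 86400
gpu_hours = GPUS * total_seconds / 3600
cost_usd = gpu_hours * USD_PER_GPU_HOUR

print(f"{days:.1f} days, {gpu_hours / 1e6:.2f}M GPU-hours, ${cost_usd / 1e6:.1f}M")
# → 20.8 days, 1.02M GPU-hours, $2.1M
```

So ~21 days and ~1M GPU-hours are fixed by the paper's numbers; the dollar figure scales linearly with whatever per-GPU-hour rate you assume.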
Is there any info on what hardware you need to train a 100B model? Thanks.