@zxcvqwerasdf GLM-130B was pre-trained on 400 billion tokens on a cluster of 96 NVIDIA DGX-A100 (8×40G)
GPU nodes between May 6 and July 3, 2022. The token count matters here. I believe it cost around $5 million to achieve the accuracy and results that it did. Note, though, that the model is encoder-decoder.
"When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days."
They used 2048 A100 80GB GPUs for 21 days, roughly $2.2M in compute.
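The $2.2M figure follows from the throughput quoted above. A minimal back-of-envelope sketch (the $/GPU-hour rental rate is an assumption on my part, roughly in line with cloud A100 pricing; it is not stated in the thread):

```python
# Back-of-envelope cost estimate for LLaMA-65B training.
# Inputs from the quoted paper: 380 tokens/sec/GPU, 2048 A100 80GB GPUs, 1.4T tokens.
TOKENS = 1.4e12
TOKENS_PER_SEC_PER_GPU = 380
GPUS = 2048
USD_PER_GPU_HOUR = 2.10  # ASSUMED rental rate, not from the thread

total_seconds = TOKENS / (TOKENS_PER_SEC_PER_GPU * GPUS)
days = total_seconds / 86400
gpu_hours = GPUS * total_seconds / 3600
cost_usd = gpu_hours * USD_PER_GPU_HOUR

print(f"{days:.1f} days, {gpu_hours / 1e6:.2f}M GPU-hours, ${cost_usd / 1e6:.1f}M")
# → 20.8 days, 1.02M GPU-hours, $2.1M
```

So ~21 days and ~1M GPU-hours are fixed by the paper's numbers; the dollar figure scales linearly with whatever per-GPU-hour rate you assume.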
Is there any info on what hardware you need to train a 100B model? Thanks.