OLMoThreadError #552
Comments
@juripapay, can you give more details on the model size, batch size, GPU type (AMD/NVIDIA), and whether flash attention is enabled? I would like to know which settings give you a throughput of 9k tokens/GPU/sec.
@juripapay - is there a traceback logged after the last line you pasted?
Hi, I encountered the same problem and would appreciate some help resolving it. I tried training the OLMo 1B model.
global_train_batch_size: 2048
Traceback (most recent call last):
❓ The question
Please advise where this error might come from:
[2024-04-18 19:06:17] INFO [olmo.train:816, rank=0] [step=75/739328]
train/CrossEntropyLoss=7.417
train/Perplexity=1,664
throughput/total_tokens=314,572,800
throughput/device/tokens_per_second=9,407
throughput/device/batches_per_second=0.0022
[2024-04-18 19:10:41] CRITICAL [olmo.util:158, rank=0] Uncaught OLMoThreadError: generator thread data thread 3 failed