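1. The training script contains the formula GLOBAL_BATCH_SIZE=$((($WORLD_SIZE * $MICRO_BATCH_SIZE) / ($TP_SIZE * $PP_SIZE) * 8)). What does the final multiplication by 8 represent?
2. If I skip the script's GLOBAL_BATCH_SIZE formula and assign GLOBAL_BATCH_SIZE a fixed value instead, what is the effect of setting that value larger or smaller? Are there any rules for choosing it?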
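1. The 8 in the formula is the number of gradient accumulation steps. You can change it to any integer ≥ 1.
2. The GLOBAL_BATCH_SIZE formula is essentially [data-parallel group size DP_SIZE] x [MICRO_BATCH_SIZE] x [gradient accumulation steps], where DP_SIZE = WORLD_SIZE / (TP_SIZE * PP_SIZE). If you assign GLOBAL_BATCH_SIZE yourself, the gradient accumulation steps derived by inverting that formula must still be an integer ≥ 1; otherwise an error is raised.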
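The relationship described in the answer can be checked with a small shell sketch. The values for WORLD_SIZE, TP_SIZE, PP_SIZE and MICRO_BATCH_SIZE are illustrative assumptions, and grad_accum_steps is a hypothetical helper written for this example; neither comes from the project's training script.

```shell
# Illustrative values only (assumptions, not taken from the project's script).
WORLD_SIZE=16
TP_SIZE=2
PP_SIZE=2
MICRO_BATCH_SIZE=2

# Data-parallel group size: what remains after tensor and pipeline parallelism.
DP_SIZE=$(( WORLD_SIZE / (TP_SIZE * PP_SIZE) ))

# Hypothetical helper: print the gradient accumulation steps implied by a
# fixed GLOBAL_BATCH_SIZE ($1), or fail if they are not an integer >= 1.
grad_accum_steps() {
  samples_per_pass=$(( DP_SIZE * MICRO_BATCH_SIZE ))
  if [ "$1" -lt "$samples_per_pass" ] || [ $(( $1 % samples_per_pass )) -ne 0 ]; then
    echo "GLOBAL_BATCH_SIZE=$1 is not a positive multiple of $samples_per_pass" >&2
    return 1
  fi
  echo $(( $1 / samples_per_pass ))
}

# Same formula as in the question, with 8 gradient accumulation steps.
GLOBAL_BATCH_SIZE=$(( (WORLD_SIZE * MICRO_BATCH_SIZE) / (TP_SIZE * PP_SIZE) * 8 ))
echo "GLOBAL_BATCH_SIZE=$GLOBAL_BATCH_SIZE"   # prints GLOBAL_BATCH_SIZE=64
grad_accum_steps "$GLOBAL_BATCH_SIZE"         # prints 8
```

With these example values, a fixed GLOBAL_BATCH_SIZE of 60 would be rejected by the helper, because 60 is not a multiple of DP_SIZE x MICRO_BATCH_SIZE = 8.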