How do people speed up distributed TensorFlow training of CTR models these days? #18

Open
guotong1988 opened this issue Mar 17, 2021 · 2 comments

Comments

@guotong1988

Parameter Server architecture or All-Reduce architecture? (a rough sketch of both follows this list)
CPU or GPU?
Is there any open-source code to reference?
Does it require modifying the TensorFlow source code?
Which option gives the best performance for the cost?
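For reference, a minimal sketch of what the two architectures in the first question look like with tf.estimator plus tf.distribute. None of this is from this repo: it assumes TF 2.x-era APIs, toy placeholder model_fn/input_fn, and that each task's cluster membership comes from the TF_CONFIG environment variable.

```python
import tensorflow as tf

def model_fn(features, labels, mode):
    # Toy linear CTR model on one dense feature, only to make the sketch complete.
    logits = tf.compat.v1.layers.dense(features["x"], 1)
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.compat.v1.train.AdagradOptimizer(0.01).minimize(
        loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def input_fn():
    # Dummy in-memory data; a real job would read TFRecord files here.
    ds = tf.data.Dataset.from_tensor_slices(
        ({"x": tf.random.uniform([1024, 8])}, tf.ones([1024, 1])))
    return ds.repeat().batch(256)

# All-Reduce: every worker keeps a full replica, gradients are synchronously averaged.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
# Parameter Server: variables live on ps tasks, workers compute asynchronously.
# strategy = tf.distribute.experimental.ParameterServerStrategy()

config = tf.estimator.RunConfig(train_distribute=strategy)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
estimator.train(input_fn, max_steps=1000)
```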

@qiaoguan
Owner

qiaoguan commented Jun 3, 2021

tf estimator supports distributed training out of the box, which is probably the fastest way to experiment. Also, for CTR models, TFRecord parsing speed is the bottleneck, so with a GPU you may find utilization won't go up; if you rewrite the TFRecord parsing with a faster approach, train on GPU, and increase the batch size, training speed can improve several times over.
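A minimal sketch of the kind of input-pipeline rewrite described above, assuming TF 2.x and made-up feature names ("feature_ids", "label") and file pattern: batch the serialized records first and parse the whole batch with tf.io.parse_example (instead of parse_single_example per record), read files in parallel, and prefetch so parsing overlaps with GPU compute.

```python
import tensorflow as tf

# Hypothetical CTR feature spec; replace with the real schema.
FEATURE_SPEC = {
    "feature_ids": tf.io.VarLenFeature(tf.int64),    # sparse id features
    "label": tf.io.FixedLenFeature([], tf.float32),  # click label
}

def split_label(features):
    label = features.pop("label")
    return features, label

def input_fn(file_pattern="train/*.tfrecord", batch_size=8192):
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    ds = files.interleave(                                # read several files in parallel
        tf.data.TFRecordDataset,
        cycle_length=8,
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.batch(batch_size, drop_remainder=True)        # batch the raw serialized protos first...
    ds = ds.map(                                          # ...then parse the whole batch in one call
        lambda serialized: tf.io.parse_example(serialized, FEATURE_SPEC),
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.map(split_label)
    return ds.prefetch(tf.data.experimental.AUTOTUNE)     # overlap parsing with GPU compute
```

With the parsing cost amortized this way, raising the batch size (as suggested above) mainly shifts time onto the GPU instead of the input pipeline.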

@simonshiwt


Hi, I've also noticed that my model trains somewhat slowly and GPU utilization is extremely low. Could you list which specific methods "rewrite the TFRecord parsing with a faster approach" refers to? Thanks!
