Is there an inference example to follow for a setup with multiple PS nodes and multiple Euler shards? #329

Open
zhangyuhanjc opened this issue Apr 13, 2021 · 0 comments

zhangyuhanjc commented Apr 13, 2021

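Distributed training currently works: checkpoints are saved, and they can be loaded to resume training. When the number of user_ids is very large, the model parameters become huge and have to be spread across multiple PS nodes. In that setup I want to export user_embedding; is there an example I can refer to? I modified examples/graphsage/run_graphsage.py for distributed training and that works, but calling model_estimator.infer() directly on top of it does not seem to work.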

Here are the code, the launch commands, and the log for the infer run.

Main part of the code:

```python
tf_config = {
    'cluster': {'chief': chief_hosts, 'worker': worker_hosts, 'ps': ps_hosts},
    'task': {'type': job_name, 'index': task_index}
}
# Relabel worker 0 as the chief task
if job_name == 'worker' and task_index == 0:
    tf_config['task'] = {"index": 0, "type": "chief"}
...
...
model = ...  # the GraphSAGE model from the example
config = tf.estimator.RunConfig(log_step_count_steps=None)
model_estimator = NodeEstimator(model, params, config)

if flags_obj.run_mode == 'train':
    model_estimator.train_and_evaluate()
elif flags_obj.run_mode == 'evaluate':
    model_estimator.evaluate()
elif flags_obj.run_mode == 'infer':
    model_estimator.infer()
else:
    raise ValueError('Run mode not exist!')
```
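For reference, `tf.estimator.RunConfig` picks up the cluster layout only from the `TF_CONFIG` environment variable (a JSON string with `cluster` and `task`), and it reads it when the `RunConfig` is constructed, so the `tf_config` dict above has to be exported before `RunConfig(...)` runs. The sketch below is a minimal illustration, not Euler's own code: `build_tf_config` and `export_tf_config` are hypothetical helper names, and it assumes that when worker 0 is promoted to chief, its host is moved out of the worker list so no host is listed under two task types. (In the snippet above, if `chief_hosts` is non-empty and worker 0 is also relabeled `chief:0`, two processes would claim the same task.)

```python
import json
import os


def build_tf_config(job_name, task_index, ps_hosts, worker_hosts, chief_hosts=None):
    """Return a TF_CONFIG-style dict (hypothetical helper).

    Assumption: without a dedicated chief host, the first worker host is
    promoted to chief and removed from the worker list.
    """
    if chief_hosts:
        # A dedicated chief exists: keep every task type as given.
        cluster = {'chief': list(chief_hosts), 'worker': list(worker_hosts),
                   'ps': list(ps_hosts)}
        task = {'type': job_name, 'index': task_index}
    else:
        # Promote worker 0 to chief so each host appears exactly once.
        cluster = {'chief': [worker_hosts[0]], 'worker': list(worker_hosts[1:]),
                   'ps': list(ps_hosts)}
        if job_name == 'worker' and task_index == 0:
            task = {'type': 'chief', 'index': 0}
        elif job_name == 'worker':
            # Remaining workers shift down by one after the promotion.
            task = {'type': 'worker', 'index': task_index - 1}
        else:
            task = {'type': job_name, 'index': task_index}
    return {'cluster': cluster, 'task': task}


def export_tf_config(tf_config):
    # RunConfig parses TF_CONFIG in its constructor, so call this first.
    os.environ['TF_CONFIG'] = json.dumps(tf_config)
```

Usage: call `export_tf_config(build_tf_config(job_name, task_index, ps_hosts, worker_hosts))` at the top of the script, then construct `tf.estimator.RunConfig` as before.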
Launch commands:

```shell
python run_graphsage_distribute_new.py --job_name 'start_euler' --shard_idx 0 --shard_num ${shard_num} --data_dir ${data_dir} --zk_addr ${zk_addr} --zk_path ${zk_path}
python run_graphsage_distribute_new.py --job_name 'start_euler' --shard_idx 1 --shard_num ${shard_num} --data_dir ${data_dir} --zk_addr ${zk_addr} --zk_path ${zk_path}

python run_graphsage_distribute_new.py --job_name 'ps' --shard_num ${shard_num} --task_index 0 --ps_hosts ${ps_hosts} --worker_hosts ${worker_hosts} --chief_hosts ${chief_hosts} --zk_addr ${zk_addr} --zk_path ${zk_path} # ps
```

The problem already appears as soon as this PS process starts (there is currently only one PS node).
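With several Euler shards and several PS nodes, the commands above simply repeat with a different `--shard_idx` or `--task_index`. The sketch below generates them using only the flags shown above; `launch_commands` is a hypothetical helper, and it assumes `ps_hosts` is a comma-separated `host:port` list with one PS process per entry. Worker and chief commands are left out because their flags are not shown here.

```python
import shlex


def launch_commands(script, shard_num, ps_hosts, worker_hosts, chief_hosts,
                    data_dir, zk_addr, zk_path):
    """Build one shell command per Euler shard and per PS task (hypothetical).

    Assumption: ps_hosts is comma-separated; PS task i gets --task_index i.
    """
    cmds = []
    # One graph-service process per Euler shard.
    for shard_idx in range(shard_num):
        cmds.append(['python', script, '--job_name', 'start_euler',
                     '--shard_idx', str(shard_idx), '--shard_num', str(shard_num),
                     '--data_dir', data_dir, '--zk_addr', zk_addr,
                     '--zk_path', zk_path])
    # One parameter-server process per entry in ps_hosts.
    for task_index, _ in enumerate(ps_hosts.split(',')):
        cmds.append(['python', script, '--job_name', 'ps',
                     '--shard_num', str(shard_num), '--task_index', str(task_index),
                     '--ps_hosts', ps_hosts, '--worker_hosts', worker_hosts,
                     '--chief_hosts', chief_hosts,
                     '--zk_addr', zk_addr, '--zk_path', zk_path])
    # Quote each argument so the strings can be pasted into a shell.
    return [' '.join(shlex.quote(arg) for arg in cmd) for cmd in cmds]
```

Each returned string is meant to be run on its own machine (or its own terminal), one process per shard or PS task.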

Log screenshot: [image not preserved]
