FATE with the spark_rabbitmq backend cannot run the example job #236

Open
wsptfb opened this issue May 7, 2022 · 1 comment

Comments


wsptfb commented May 7, 2022

I deployed KubeFATE v1.8.0 with the docker-deploy method on two virtual machines. With eggroll the example workflow runs normally, but with spark and rabbitmq the example job does not run.

To switch to spark and rabbitmq, I did the following, based on the information I could find:

  1. First, set backend=spark_rabbitmq in the parties.conf file.
    [screenshot]

  2. Then, in training_template/public/fate_flow/conf/service_conf.yaml, edit the default_engines section and set
    computing: spark, federation: rabbitmq, storage: hdfs.
    [screenshot]

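  3. After these changes, I ran generate_config.sh and docker_deploy.sh all, which started all the Docker containers on both virtual machines. On the host side, I entered the client_1 container, edited fateflow/examples/upload/upload_host.json to add "storage_engine": "HDFS" at the end, and uploaded the data with flow data upload.
    [screenshot]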

  4. On the guest side, I edited upload_guest.json in the same way, adding "storage_engine": "HDFS", and uploaded the data with flow data upload. I then edited fateflow/examples/lr/test_hetero_lr_job_conf.json and added the "spark_run" and "rabbitmq_run" settings under job_parameters.
    [screenshot]

  5. Finally, I submitted the job with flow job submit.
    [screenshot]
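
The JSON edits in steps 3 and 4 can be sketched in Python as below. This is only an illustration of the kind of change described, not the exact files: the function names are hypothetical, the sample field values are placeholders, and the optional `section` argument is an assumption for conf layouts that nest these settings under a sub-key of job_parameters.

```python
import copy
import json


def add_storage_engine(upload_conf, engine="HDFS"):
    """Return a copy of an upload conf with "storage_engine" set."""
    conf = copy.deepcopy(upload_conf)
    conf["storage_engine"] = engine
    return conf


def add_engine_params(job_conf, spark_run, rabbitmq_run, section=None):
    """Return a copy of a job conf with spark_run / rabbitmq_run added.

    By default the keys go directly under "job_parameters", as described
    in step 4. Pass `section` (an assumption, not taken from the post) if
    your conf nests them one level deeper.
    """
    conf = copy.deepcopy(job_conf)
    target = conf.setdefault("job_parameters", {})
    if section is not None:
        target = target.setdefault(section, {})
    target["spark_run"] = copy.deepcopy(spark_run)
    target["rabbitmq_run"] = copy.deepcopy(rabbitmq_run)
    return conf


if __name__ == "__main__":
    # Placeholder values for illustration only, not recommended settings.
    upload = add_storage_engine({"file": "path/to/data.csv"})
    job = add_engine_params(
        {"job_parameters": {}},
        spark_run={"num-executors": 1},
        rabbitmq_run={"queue": {"durable": True}},
    )
    print(json.dumps(upload, indent=2))
    print(json.dumps(job, indent=2))
```

Both helpers copy their input, so the original conf dictionaries stay unchanged and can be compared against the edited versions.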

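The whole process produced no error messages, but after the job is submitted every f_status stays in the waiting state and the training workflow never runs. What could be causing the job not to run?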
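
To make the "everything is waiting" observation easier to report, a saved job-query response could be summarized with a small helper like the one below. This is a sketch under assumptions: it assumes the response is JSON with a "data" list whose records each carry an "f_status" field (only the f_status name comes from this post), and both function names are hypothetical.

```python
from collections import Counter


def summarize_status(query_response):
    """Count f_status values across the records in a parsed query response."""
    rows = query_response.get("data") or []
    return Counter(row.get("f_status", "unknown") for row in rows)


def all_waiting(query_response):
    """True if there is at least one record and every record is "waiting"."""
    counts = summarize_status(query_response)
    return bool(counts) and set(counts) == {"waiting"}


if __name__ == "__main__":
    # Hand-written sample shaped like the assumption above, not real output.
    sample = {"data": [{"f_status": "waiting"}, {"f_status": "waiting"}]}
    print(dict(summarize_status(sample)))
    print(all_waiting(sample))
```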

@zhihuiwan
Contributor

Are there any error messages in the job logs?
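
One rough way to check is to scan the job's log directory for error markers, as in the sketch below. The directory is passed in by the caller because the actual log location depends on the deployment; the function name, the `*.log` pattern, and the marker strings are assumptions, not FATE conventions confirmed here.

```python
from pathlib import Path


def find_error_lines(log_dir, markers=("ERROR", "Traceback")):
    """Return (file, line number, text) for each log line containing a marker."""
    hits = []
    # Walk every *.log file under log_dir, including subdirectories.
    for path in sorted(Path(log_dir).rglob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if any(marker in line for marker in markers):
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits
```

An empty result only means that none of the markers appeared in the scanned files; it does not rule out a problem logged elsewhere, for example in the container output.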
