
Using Docker to run in standalone mode, data upload failed. #457

Open · danerlt opened this issue Aug 21, 2023 · 2 comments

danerlt commented Aug 21, 2023

System information

  • Have I written custom code (yes/no): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): centos7
  • FATE Flow version (use command: python fate_flow_server.py --version): {'FATE': '1.11.2', 'FATEFlow': '1.11.1', 'FATEBoard': '1.11.1', 'EGGROLL': '2.5.1', 'CENTOS': '7.2', 'UBUNTU': '16.04', 'PYTHON': '3.8', 'MAVEN': '3.6.3', 'JDK': '8', 'SPARK': '3.4.0'}
  • Python version (use command: python --version): Python 3.8.13

Describe the current behavior

The container start command is as follows.

$ docker run -d -it \
    --name single_fate \
    --restart=always \
    -p 8080:8080 \
    -p 9380:9380 \
    federatedai/standalone_fate:1.11.2
$ docker ps |grep single_fate
cc2b7babbc26        federatedai/standalone_fate:1.11.2              "./bin/docker-entryp…"   22 minutes ago      Up 22 minutes          0.0.0.0:9380->9380/tcp, 0.0.0.0:9090->8080/tcp                                                       single_fate

I followed the Pipeline tutorial and ran the upload example, but the data upload failed on this line:

pipeline_upload.upload(drop=1)
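
For reference, the setup before that call follows the Pipeline tutorial; a minimal sketch, where the party ID, file path, and table names are illustrative rather than taken from the report:

import os
from pipeline.backend.pipeline import PipeLine

# Illustrative party ID and data location; adjust to your deployment.
guest = 9999
data_base = "/data/projects/fate"

# Build an upload-only pipeline, as in the Pipeline tutorial.
pipeline_upload = PipeLine().set_initiator(role="guest", party_id=guest).set_roles(guest=guest)
pipeline_upload.add_upload_data(
    file=os.path.join(data_base, "examples/data/breast_hetero_guest.csv"),
    table_name="breast_hetero_guest",
    namespace="experiment",
    head=1,       # first row is a header
    partition=4,  # number of storage partitions
)
pipeline_upload.upload(drop=1)  # fails with the traceback below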

The error info is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py:61, in JobInvoker.upload_data(self, submit_conf, drop)
     60 if 'retcode' not in result or result["retcode"] != 0:
---> 61     raise ValueError
     63 if "jobId" not in result:

ValueError: 

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 pipeline_upload.upload(drop=1)

File ~/miniconda3/envs/fate/lib/python3.8/site-packages/loguru/_logger.py:1251, in Logger.catch.<locals>.Catcher.__call__.<locals>.catch_wrapper(*args, **kwargs)
   1249 def catch_wrapper(*args, **kwargs):
   1250     with catcher:
-> 1251         return function(*args, **kwargs)
   1252     return default

File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/backend/pipeline.py:664, in PipeLine.upload(self, drop)
    662 upload_conf = self._construct_upload_conf(data_conf)
    663 LOGGER.debug(f"upload_conf is {json.dumps(upload_conf)}")
--> 664 self._train_job_id, detail_info = self._job_invoker.upload_data(upload_conf, int(drop))
    665 self._train_board_url = detail_info["board_url"]
    666 self._job_invoker.monitor_job_status(self._train_job_id,
    667                                      "local",
    668                                      0)

File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py:69, in JobInvoker.upload_data(self, submit_conf, drop)
     67     data = result["data"]
     68 except BaseException:
---> 69     raise ValueError("job submit failed, err msg: {}".format(result))
     70 return job_id, data

Describe the expected behavior

The data upload succeeds.

Contributing

  • Do you want to contribute a PR? (yes/no): no
  • Briefly describe your candidate solution(if contributing):

When I run the curl command inside the container:

$ curl http://127.0.0.1:9380/
{"retcode":100,"retmsg":"<NotFound '404: Not Found'>"}

When I run the curl command outside the container:

$ curl http://127.0.0.1:9380/
curl: (56) Recv failure: Connection reset by peer

The server is clearly up inside the container (it answers with a 404), but connections from outside are reset, so I guess it's a problem with the default host binding.
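
One way to confirm the bind address is to check what the FATE Flow process is listening on inside the container; a sketch, assuming ss is available in the image (netstat -lntp works the same way):

$ docker exec single_fate ss -lntp | grep 9380

If the listener shows 127.0.0.1:9380 rather than 0.0.0.0:9380, the published port cannot reach it: Docker's port mapping forwards traffic to the container's network interface, while a loopback-only listener accepts connections only from inside the container. That matches the 404 inside (the server is up) and the connection reset outside.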

danerlt (Author) commented Aug 21, 2023

After I changed the fateflow host in service_conf.yaml to 0.0.0.0, the data uploaded successfully.

The modified configuration is as follows:

fateflow:
  # you must set real ip address, 127.0.0.1 and 0.0.0.0 is not supported
  host: 0.0.0.0
  http_port: 9380
  grpc_port: 9360
  # when you have multiple fateflow server on one party,
  # we suggest using nginx for load balancing.
  nginx:
    host:
    http_port:
    grpc_port:
  # use random instance_id instead of {host}:{http_port}
  random_instance_id: false
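
For the change to take effect, FATE Flow has to be restarted inside the container. A sketch, assuming the standalone image keeps the config at /data/projects/fate/conf/service_conf.yaml and FATE Flow under /data/projects/fate/fateflow with the standard service script (paths may differ by image version):

$ docker exec -it single_fate bash
$ cd /data/projects/fate/fateflow
$ bash bin/service.sh restart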

zhihuiwan (Contributor) commented:

Thank you very much for your feedback. You are correct, and we will work on optimizing this issue in the future.
