metaflow.exception.MetaflowNotFound while the Flow exists on the S3 server (on-prem) #313

Open
gaborvecsei opened this issue Apr 28, 2022 · 0 comments


Problem

I have deployed a Metaflow service with the dev docker-compose setup and extended the environment variables with the ones needed to configure Metaflow, since as far as I saw, those are used to retrieve artefacts for the UI for the different runs.
However, it seems that Metaflow can't access the files on the S3 storage (metaflow.exception.MetaflowNotFound error).

Details

(If I don't include the METAFLOW_... env variables, I instead receive an "AWS credential error ...", which seems valid, as I have my own S3 endpoint.)

These variables are new compared to the existing setup:

      - AWS_ACCESS_KEY_ID=<ID>
      - AWS_SECRET_ACCESS_KEY=<SECRET>
      - METAFLOW_DEFAULT_METADATA="service"
      - METAFLOW_DEFAULT_DATASTORE="s3"
      - METAFLOW_DATASTORE_SYSROOT_S3="s3://testbucket/metaflow-testbucket"
      - METAFLOW_S3_ENDPOINT_URL="http://192.168.99.99"
      - METAFLOW_S3_VERIFY_CERTIFICATE=false
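
A quick sanity check I found useful is printing the effective values from inside the container; a minimal sketch (the variable names mirror the compose file above). Using repr() also makes it visible if docker-compose passed the surrounding quotes through literally, which can happen with `- KEY="value"` entries in an environment list:

    import os

    # Run inside the ui_backend_service container to see what the cache
    # workers actually inherit from docker-compose.
    for key in (
        "METAFLOW_DEFAULT_METADATA",
        "METAFLOW_DEFAULT_DATASTORE",
        "METAFLOW_DATASTORE_SYSROOT_S3",
        "METAFLOW_S3_ENDPOINT_URL",
        "METAFLOW_S3_VERIFY_CERTIFICATE",
    ):
        # repr() exposes stray quote characters in the values
        print(key, "=", repr(os.environ.get(key)))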

Now, when I run with these variables, the error is the following:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 302, in <module>
    cli(auto_envvar_prefix='MFCACHE')
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 298, in cli
    Scheduler(store, max_actions).loop()
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 199, in __init__
    maxtasksperchild=512,  # Recycle each worker once 512 tasks have been completed
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 29, in execute_action
    execute(tempdir, action_cls, request)
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 56, in execute
    invalidate_cache=req.get('invalidate_cache', False))
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 140, in execute
    results = {**existing_keys}
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/root/services/ui_backend_service/data/cache/utils.py", line 130, in streamed_errors
    get_traceback_str()
  File "/root/services/ui_backend_service/data/cache/utils.py", line 124, in streamed_errors
    yield
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 131, in execute
    task = Task(pathspec, attempt=attempt)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 947, in __init__
    super(Task, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 361, in __init__
    self._object = self._get_object(*ids)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 391, in _get_object
    raise MetaflowNotFound("%s does not exist" % self)

metaflow.exception.MetaflowNotFound: Task('HelloFlow/5/start/12', attempt=0) does not exist
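
The same lookup can be reproduced outside the cache worker with the Metaflow client; a minimal sketch, assuming the same METAFLOW_* variables from the compose file are exported in the shell where this runs ('HelloFlow/5/start/12' is the pathspec from the traceback above):

    from metaflow import Task, namespace

    # Don't restrict the lookup to the current user's namespace.
    namespace(None)

    # Same call the cache worker makes in get_log_file_action.py;
    # raises MetaflowNotFound if the metadata lookup fails.
    task = Task("HelloFlow/5/start/12", attempt=0)
    print(task.created_at)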

If I check my S3 with s3cmd, I can see that a directory exists at this path.
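
Roughly the same check done with boto3, in case it helps reproduce (a sketch assuming the datastore layout <sysroot>/<flow>/<run>/<step>/<task>/; the endpoint, bucket, and prefix are taken from the compose file and the traceback above):

    import boto3

    # Credentials come from the same AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    # env vars; verify=False mirrors METAFLOW_S3_VERIFY_CERTIFICATE=false.
    s3 = boto3.client("s3", endpoint_url="http://192.168.99.99", verify=False)

    resp = s3.list_objects_v2(
        Bucket="testbucket",
        Prefix="metaflow-testbucket/HelloFlow/5/start/12/",
    )
    for obj in resp.get("Contents", []):
        print(obj["Key"])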

When I run the flow, the files are stored perfectly and I did not notice any problems; I can also see the run on the UI.

(I understand that Metaflow was not primarily created for on-prem usage, but it would be a blast to use it without AWS. I would be grateful for an on-prem setup guide.)
