This repository has been archived by the owner on May 21, 2024. It is now read-only.

prototype python_pachyderm.run_like_a_pipeline #320

Open
albscui opened this issue Aug 9, 2021 · 2 comments

Comments


albscui commented Aug 9, 2021

An example usage:

--- cell ---
def pl_body():
  with open("/pfs/big", "r") as f:
    ...  # do stuff with f

run_like_a_pipeline(
  datums=["big:/", "config:/cfgfile1.txt"],
  code=pl_body)
--- output ---
output is in '/data/a1b2c3'
--- cell ---
matplotlib.plot("/data/a1b2c3")  # pseudocode: plot the results written to the output dir
--- output ---
<graph>
---

msteffen commented Aug 25, 2021

Our next goal for this prototype is to get <User> to use it for debugging failed datums. They specifically mentioned debugging failed datums as a sticking point, and hopefully this will significantly reduce their iteration time when doing it.


msteffen commented Sep 23, 2021

Following up on our conversation about this yesterday:

  1. run_like_a_pipeline should, at a minimum, let you specify a (pipeline, datum) pair, download the files in that datum, and mount them into a container running locally. In that local container, /pfs/out should be a bind-mounted tmp dir where you can see the output from processing that datum.
  2. If users specify code=pl_body, then the container in (1) is a bog-standard Python container, and the command becomes, essentially, python -c <function body>, a la Kubeflow function-based components. Otherwise, we could use the pipeline image, or maybe allow users to specify their own Python image and run their code in that.
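
To make the two points above concrete, here is a rough sketch of how the local container invocation might be assembled. It is not the prototype's actual implementation: the helper names (function_to_script, build_docker_command), the default image "python:3.9", and the idea that the datum files have already been downloaded into local directories are all assumptions for illustration. It only builds the docker run argument list and does not talk to Pachyderm or start a container.

```python
import inspect
import textwrap
from pathlib import Path


def function_to_script(func):
    """Turn a zero-argument function into source text for `python -c`.

    Hypothetical helper: grabs the function's source and appends a call to it,
    similar in spirit to Kubeflow function-based components.
    """
    source = textwrap.dedent(inspect.getsource(func))
    return f"{source}\n{func.__name__}()\n"


def build_docker_command(inputs, out_dir, image="python:3.9", script=None):
    """Build (but do not run) a `docker run` argument list.

    inputs: maps an input name (e.g. "big") to a local directory that is
            assumed to already hold the downloaded files for that datum.
    out_dir: local directory to bind-mount at /pfs/out.
    script: optional Python source to run via `python -c`; if omitted, the
            image's default command is used (assumption: a real version would
            take the command from the pipeline spec instead).
    """
    cmd = ["docker", "run", "--rm"]
    # Mount each downloaded input read-only under /pfs/<name>.
    for name, local_dir in sorted(inputs.items()):
        cmd += ["-v", f"{Path(local_dir).resolve()}:/pfs/{name}:ro"]
    # Bind-mount a writable directory so the output is visible on the host.
    cmd += ["-v", f"{Path(out_dir).resolve()}:/pfs/out"]
    cmd.append(image)
    if script is not None:
        cmd += ["python", "-c", script]
    return cmd
```

With a function like pl_body from the example, the resulting list could be handed to subprocess.run(cmd, check=True), e.g. build_docker_command({"big": "/tmp/big"}, tempfile.mkdtemp(), script=function_to_script(pl_body)). Passing the pipeline's own image via image= and leaving script unset would cover the "use the pipeline image" case.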
