prototype `python_pachyderm.run_like_a_pipeline` #320

albscui · 2021-08-09T22:34:48Z

An example usage:

--- cell ---
def pl_body():
  with open("/pfs/big", "r") as f:
    ...  # do stuff with f

run_like_a_pipeline(
  datums=["big:/", "config:/cfgfile1.txt"],
  code=pl_body)
--- output ---
output is in '/data/a1b2c3'
--- cell ---
matplotlib.plot("/data/a1b2c3")
--- output ---
<graph>
---
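
One way the `datums` strings in the example could be interpreted, as a rough sketch only: it assumes each entry has the form `<repo>:/<path>` and that inputs land under `/pfs/<repo>`, as they do inside a Pachyderm pipeline container. The names `DatumInput`, `parse_datum_spec`, and `local_mount_point` are hypothetical and not part of python_pachyderm.

```python
import posixpath
from typing import NamedTuple


class DatumInput(NamedTuple):
    """Hypothetical parsed form of one entry in `datums`."""
    repo: str
    path: str


def parse_datum_spec(spec: str) -> DatumInput:
    # Assumed format: "<repo>:/<path>", e.g. "config:/cfgfile1.txt".
    repo, sep, path = spec.partition(":")
    if not sep or not repo or not path.startswith("/"):
        raise ValueError(f"expected '<repo>:/<path>', got {spec!r}")
    return DatumInput(repo=repo, path=path)


def local_mount_point(inp: DatumInput, root: str = "/pfs") -> str:
    # Mirror the pipeline layout: files from <repo> appear under /pfs/<repo>.
    joined = posixpath.join(root, inp.repo, inp.path.lstrip("/"))
    return posixpath.normpath(joined)
```

Under these assumptions, `"big:/"` would map to `/pfs/big` and `"config:/cfgfile1.txt"` to `/pfs/config/cfgfile1.txt`.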


msteffen · 2021-08-25T15:37:50Z

Our next goal for this prototype is to get <User> to use it for debugging failed datums. They specifically mentioned this as a sticking point they're struggling with, and hopefully this will significantly reduce their iteration time when doing it.

msteffen · 2021-09-23T01:50:11Z

Following up on our conversation about this yesterday:

1. `run_like_a_pipeline` should, at minimum, let you specify a (pipeline, datum) pair, download the files in that datum, and mount them into a container running locally. In that local container, `/pfs/out` should be a bind-mounted tmp dir where you can see the output from processing that datum.
2. If users specify `code=pl_body`, then the container in (1) is some bog-standard Python container, and the command becomes, essentially, `python -c <function body>`, a la Kubeflow function-based components. Otherwise, we could use the pipeline image, or maybe allow users to specify their own Python image and run their code in that.
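
To make the two points above concrete, here is a rough sketch of how the local container invocation might be assembled. It is not an existing python_pachyderm API: `function_to_script`, `build_docker_argv`, and the default image `python:3.9-slim` are assumptions for illustration, and the script passed to `python -c` is the function definition followed by a call rather than the bare body. Downloading the datum's files is left out; the sketch assumes they already sit in local directories, one per input repo.

```python
import inspect
import textwrap
from typing import Callable, Dict, List


def function_to_script(func: Callable[[], None]) -> str:
    """Return source text that defines `func` and then calls it.

    Relies on inspect.getsource, so `func` must be defined in a file or
    notebook cell whose source is still available.
    """
    source = textwrap.dedent(inspect.getsource(func))
    return f"{source}\n{func.__name__}()\n"


def build_docker_argv(
    script: str,
    input_dirs: Dict[str, str],        # repo name -> local dir holding the datum's files
    out_dir: str,                      # local tmp dir that will receive /pfs/out
    image: str = "python:3.9-slim",    # assumption: any stock Python image
) -> List[str]:
    argv = ["docker", "run", "--rm"]
    # Mount each input repo read-only at /pfs/<repo>, as in a pipeline container.
    for repo, local_dir in sorted(input_dirs.items()):
        argv += ["-v", f"{local_dir}:/pfs/{repo}:ro"]
    # Bind-mount the output dir so results are visible on the host afterwards.
    argv += ["-v", f"{out_dir}:/pfs/out"]
    argv += [image, "python", "-c", script]
    return argv
```

A caller could create `out_dir` with `tempfile.mkdtemp()`, pass the resulting list to `subprocess.run(argv, check=True)`, and then report that directory back to the user, which matches the "output is in ..." message in the original example. Swapping `image` for the pipeline's own image would cover the "otherwise" case.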
