
Document expected schema for generic components #795

Open
picousse opened this issue Jan 18, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@picousse

Hi,
here are some minor issues I encountered while running locally.

Current code:

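import pyarrow as pa

from fondant.pipeline import Pipeline
from fondant.pipeline.runner import DockerRunner

# base_path: where the pipeline stores its data and artifacts
pipeline = Pipeline(
    name="protein_pipeline",
    base_path="./data",
)

# Load a Parquet file; note that no `produces` schema is passed here
dataset = pipeline.read(
    "load_from_parquet",
    arguments={
        "dataset_uri": "/data/proteins.parquet",
    },
)

# Run the pipeline locally with Docker
runner = DockerRunner()
runner.run(input=pipeline)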

What was unclear to me:

  • Data path: this is the path inside the Docker container (/data/...). This is not clear from the documentation (or I might have missed it).
  • For load_from_parquet, the produces values are crucial: there is no type inference.
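Because nothing is inferred, the declared schema has to line up with what is actually in the file. The stdlib-only sketch below illustrates that check with a hypothetical `diff_schema` helper; types are written as plain strings (such as "string" or "int64") purely for illustration, and this is not Fondant's own validation logic.

```python
# Hypothetical sketch: compare an explicitly declared schema (name -> type name)
# against the schema found in the file. Nothing is inferred, so every column
# the pipeline should see has to be declared with the right type.
def diff_schema(declared: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the schemas agree."""
    problems = []
    for name, dtype in declared.items():
        if name not in actual:
            problems.append(f"declared column {name!r} not found in file")
        elif actual[name] != dtype:
            problems.append(
                f"column {name!r}: declared {dtype}, file has {actual[name]}"
            )
    for name in actual:
        if name not in declared:
            problems.append(f"column {name!r} in file but not declared")
    return problems


if __name__ == "__main__":
    declared = {"sequence": "string", "length": "int64"}
    actual = {"sequence": "string", "length": "int32", "organism": "string"}
    for problem in diff_schema(declared, actual):
        print(problem)
```

With the sample values, the mismatched `length` type and the undeclared `organism` column are both reported.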

I read https://fondant.ai/en/latest/pipeline/ and neither point seemed clear to me.

@picousse
Author

Also, data types have to be PyArrow data types in consumes. This was not clear to me based on https://github.com/ml6team/fondant/tree/main/components/load_from_parquet
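A common slip is to write Python builtins (str, int) where a PyArrow DataType is expected. The stdlib-only sketch below shows a hypothetical lint, `suggest_pyarrow_types`, that flags such entries and names the PyArrow factory to use instead (`pa.string()`, `pa.int64()`, `pa.float64()`, `pa.bool_()`, `pa.binary()`); the builtin-to-PyArrow mapping is an assumption chosen for illustration, not a rule from Fondant.

```python
# Hypothetical lint: flag Python builtin types used where a PyArrow DataType
# is expected, and suggest a PyArrow factory function to use instead.
PYARROW_EQUIVALENTS = {
    str: "pa.string()",
    int: "pa.int64()",
    float: "pa.float64()",
    bool: "pa.bool_()",
    bytes: "pa.binary()",
}


def suggest_pyarrow_types(schema: dict) -> dict[str, str]:
    """Map each column whose type is a Python builtin to a PyArrow suggestion."""
    suggestions = {}
    for column, dtype in schema.items():
        # Only type objects are looked up; anything else is left untouched.
        if isinstance(dtype, type) and dtype in PYARROW_EQUIVALENTS:
            suggestions[column] = PYARROW_EQUIVALENTS[dtype]
    return suggestions


if __name__ == "__main__":
    # "score" stands in for a real pa.float32() (pyarrow is not imported here)
    produces = {"sequence": str, "length": int, "score": "pyarrow-type"}
    print(suggest_pyarrow_types(produces))
```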

@RobbeSneyders
Member

  • Data path: this is the path inside the Docker container (/data/...). This is not clear from the documentation (or I might have missed it).

This is the path on your local (or remote) file system, which will be mounted in docker. Is that how you understood it, or did you understand it differently?
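This is the usual Docker bind-mount model: a directory on the host appears inside the container under a mount point. The stdlib-only sketch below uses a hypothetical `to_container_path` helper and a made-up host directory mounted at `/data` to show how a host path translates into the path the container sees. Which directories Fondant's DockerRunner actually mounts, and where, is not stated in this thread.

```python
from pathlib import PurePosixPath


# Hypothetical illustration of a Docker bind mount: each host directory is
# exposed inside the container at a mount point. The mounts mapping is an
# assumption, not Fondant's actual mount layout.
def to_container_path(host_path: str, mounts: dict[str, str]) -> str:
    """Translate an absolute host path to the path the container sees."""
    path = PurePosixPath(host_path)
    for host_dir, container_dir in mounts.items():
        host_root = PurePosixPath(host_dir)
        if path == host_root or host_root in path.parents:
            # Keep the part of the path below the mounted directory
            return str(PurePosixPath(container_dir) / path.relative_to(host_root))
    raise ValueError(f"{host_path} is not under any mounted directory")
```

For instance, with `{"/home/me/project/data": "/data"}`, the host file `/home/me/project/data/proteins.parquet` would be read as `/data/proteins.parquet` inside the container.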

  • For load_from_parquet, the produces values are crucial: there is no type inference.

Indeed, I think this is documented both in our general documentation and the component documentation.

  • Also, data types have to be PyArrow data types in consumes. This was not clear to me based on

This is indeed not clearly documented in the component documentation. It would be good to add.

@RobbeSneyders RobbeSneyders added the documentation Improvements or additions to documentation label Jan 18, 2024
@RobbeSneyders RobbeSneyders changed the title from "Documentation local runs." to "Document expected schema for generic components" Jan 18, 2024
Projects
Status: Backlog
Development

No branches or pull requests

2 participants