[BUG/Feature Request]: Reusing a step overwrites artifact names #2685

Open
1 task done
jlopezpena opened this issue May 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@jlopezpena
Contributor

Contact Details [Optional]

No response

System Information

N/A

What happened?

(This issue was already discussed in this Slack thread; I am logging it here for easier tracking.)

Currently, the names used for saving artifacts in a step are determined by the type annotation in the function definition.
This works great when a step is only used once in a pipeline, but not so much when the same step needs to be called multiple times with different inputs, and the resulting artifacts need to later be used in a different pipeline. When that happens, the output artifacts get saved with the same name and a bumped version number, which makes it really hard to track the specific one needed later down the road.

A typical example: I train a preprocessor that is then used three times to transform the train, validation, and test data. I end up with three versions of an artifact called transformed_data and have to keep track of which is which.
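The collision described above can be illustrated with a toy model. This is only a sketch in plain Python: `ToyArtifactStore` and `run_transform_step` are hypothetical names made up for illustration, and the store is an in-memory stand-in, not ZenML's actual artifact store or API. It assumes only what the issue states: the saved name is fixed by the step definition, and saving under an existing name bumps the version.

```python
from collections import defaultdict


class ToyArtifactStore:
    """Hypothetical in-memory stand-in for an artifact store (not ZenML's implementation)."""

    def __init__(self):
        # Maps an artifact name to the list of values saved under it.
        self._versions = defaultdict(list)

    def save(self, name, value):
        """Save a value under a name and return its 1-based version number."""
        self._versions[name].append(value)
        return len(self._versions[name])

    def versions(self, name):
        """Return how many versions exist for a name."""
        return len(self._versions[name])


def run_transform_step(store, data, output_name="transformed_data"):
    """Hypothetical step: the saved name is fixed by the step definition."""
    transformed = [x * 2 for x in data]
    return store.save(output_name, transformed)


store = ToyArtifactStore()
splits = {"train": [1, 2], "validation": [3], "test": [4]}

# The same step is reused three times, so all outputs share one name.
saved = {split: run_transform_step(store, data) for split, data in splits.items()}
print(saved)  # {'train': 1, 'validation': 2, 'test': 3}
```

All three outputs land under `transformed_data`, and the only thing that distinguishes train from validation from test is the version number, which depends on call order.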

Returning outputs from pipelines quickly gets out of control and is very hard to maintain when there are lots of artifacts that might potentially be reused later.

Suggested solution: Similar to how a step name (usually the function name) can be overridden at run time by the id parameter when calling the step, introduce an optional parameter to step calls, something like output_names: Optional[Dict[str, str]], that overrides the saved names of the outputs. The dict keys must be the output names defined in the function's type annotations, and the values are the desired saved names. If the parameter is not passed, things should behave as they currently do; when it is passed, the produced artifacts should be saved under the given names.
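A minimal sketch of how such an override might be resolved is shown here. Nothing in it is an existing ZenML API: `resolve_output_names` is a hypothetical helper, and `output_names` is the parameter proposed in this issue. The sketch also assumes that outputs not listed in the dict keep their annotated names, which the proposal does not specify.

```python
from typing import Dict, List, Optional


def resolve_output_names(
    annotated_names: List[str],
    output_names: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map each annotated output name to the name it would be saved under.

    Hypothetical helper illustrating the proposal; not part of ZenML.
    """
    if output_names is None:
        # No override passed: keep the current behaviour.
        return {name: name for name in annotated_names}

    # Keys must be output names declared in the step's type annotations.
    unknown = set(output_names) - set(annotated_names)
    if unknown:
        raise KeyError(f"Unknown output names: {sorted(unknown)}")

    # Assumption: outputs without an override keep their annotated name.
    return {name: output_names.get(name, name) for name in annotated_names}


print(resolve_output_names(["transformed_data"]))
# {'transformed_data': 'transformed_data'}
print(resolve_output_names(["transformed_data"], {"transformed_data": "transformed_train_data"}))
# {'transformed_data': 'transformed_train_data'}
```

Rejecting unknown keys up front means a typo in an override fails loudly rather than silently falling back to the default name.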

Reproduction steps

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@jlopezpena added the bug label on May 9, 2024
@avishniakov
Contributor

Hey @jlopezpena, thanks for reporting, and sorry about the delay in responding to this!

This is indeed not possible in the current setting. To work around it in our LLM finetuning project, I used this trick: https://github.com/zenml-io/zenml-projects/blob/main/llm-lora-finetuning/steps/evaluate_model.py#L112

Aside from the workaround, I'll create a ticket to look into what we can do to improve this UX piece. Stay tuned!
