[BUG/Feature Request]: Reusing a step overwrites artifact names #2685

jlopezpena · 2024-05-09T09:01:17Z

Contact Details [Optional]

No response

System Information

N/A

What happened?

(Issue already discussed in this Slack thread, logging here for easier tracking)

Currently, the names used for saving artifacts in a step are determined by the type annotation in the function definition.
This works great when a step is only used once in a pipeline, but not so much when the same step needs to be called multiple times with different inputs, and the resulting artifacts need to later be used in a different pipeline. When that happens, the output artifacts get saved with the same name and a bumped version number, which makes it really hard to track the specific one needed later down the road.

Typical example of this: training some preprocessor that later needs to be used three different times for transforming train, validation, and test data, and I end up with three versions of an object called transformed_data and need to keep track of which is which.
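To make the collision concrete, here is a toy model in plain Python. It is not ZenML code: `ToyArtifactStore` and its methods are hypothetical stand-ins, assuming only that artifacts are versioned by output name, as described above.

```python
from collections import defaultdict


class ToyArtifactStore:
    """Hypothetical stand-in for an artifact store keyed by output name only."""

    def __init__(self):
        # Maps an artifact name to the list of values saved under it.
        self._versions = defaultdict(list)

    def save(self, name, value):
        """Save `value` under `name` and return the 1-based version assigned."""
        self._versions[name].append(value)
        return len(self._versions[name])

    def versions(self, name):
        """Return every value saved under `name`, oldest first."""
        return list(self._versions[name])


store = ToyArtifactStore()

# The same step is called three times; each call saves under the annotated name.
saved = {
    split: store.save("transformed_data", f"<{split} data>")
    for split in ("train", "validation", "test")
}
```

After this runs, `saved` maps each split to nothing more than a version number of `transformed_data`, so the caller has to remember which version belongs to which split.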

Returning outputs from pipelines quickly gets out of control and is very hard to maintain when there are lots of artifacts that might potentially be reused later.

Suggested solution: Similar to how a step name (usually the function name) can be overridden at run time by the id parameter when calling the step, introduce an optional parameter to step calls (something like output_names: Optional[Dict[str, str]] where the dict must contain the names defined in the function type annotations as keys and the desired saved names as values) overriding the saved names of the outputs. If that parameter is not passed, things should behave as they currently do, but when passed the produced artifacts should be saved with the passed names.
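A minimal sketch of how such a mapping could be resolved, in plain Python. `resolve_output_names` is a hypothetical helper, not an existing ZenML function, and it assumes that a partial mapping is allowed (outputs left out keep their annotated name), which the proposal above does not specify.

```python
from typing import Dict, Iterable, Optional


def resolve_output_names(
    annotated_names: Iterable[str],
    output_names: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map each annotated output name to the name it should be saved under.

    Hypothetical helper: with no mapping, every output keeps its annotated
    name, so behaviour is unchanged from today.
    """
    annotated = list(annotated_names)
    if output_names is None:
        return {name: name for name in annotated}

    # Keys must be names that exist in the step's type annotations.
    unknown = sorted(set(output_names) - set(annotated))
    if unknown:
        raise KeyError(f"Unknown output name(s): {unknown}")

    # Assumption: outputs missing from the mapping fall back to their default.
    return {name: output_names.get(name, name) for name in annotated}


# One resolution per call of the same step, each with its own saved name.
train_names = resolve_output_names(
    ["transformed_data"], {"transformed_data": "transformed_train_data"}
)
test_names = resolve_output_names(
    ["transformed_data"], {"transformed_data": "transformed_test_data"}
)
```

Rejecting unknown keys early would surface typos in the mapping at call time rather than silently saving under the default name.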

Reproduction steps

No response

Relevant log output

No response

Code of Conduct

I agree to follow this project's Code of Conduct


avishniakov · 2024-05-24T10:01:20Z

Hey @jlopezpena , thanks for reporting, and sorry about the delay in responding to this!

This is indeed not possible with the current setup. To work around it in our LLM finetuning project, I used this trick: https://github.com/zenml-io/zenml-projects/blob/main/llm-lora-finetuning/steps/evaluate_model.py#L112

Aside from the workaround, I'll create a ticket to look into what we can do to improve this UX piece. Stay tuned!

jlopezpena added the bug Something isn't working label May 9, 2024

strickvl assigned avishniakov May 24, 2024
