Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tag names limitation #3785

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

boringbyte
Copy link

@boringbyte boringbyte commented Apr 7, 2024

Description

Addresses #3727
This pull request was initiated to provide more information about valid node name and tag.

Development notes

make build-docs
make lint

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: boringbyte <kottapallisivaram@gmail.com>
Signed-off-by: boringbyte <kottapallisivaram@gmail.com>
@@ -163,6 +163,9 @@ kedro run --tags=pipeline_tag

This will run only the nodes found within the pipeline tagged with `pipeline_tag`.

```{note}
Valid node name and/or tag must contain only letters, digits, hyphens, underscores and/or fullstops.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing contribution @boringbyte , do you think the following is more useful?

Suggested change
Valid node name and/or tag must contain only letters, digits, hyphens, underscores and/or fullstops.
Node or tag names must ONLY contain letters, digits, hyphens, underscores and/or periods. Other symbols are not permitted.

Question for @astrojuanlu and @rashidakanchwala - do we want to advertise the full stop` point because that can conflict with namespaces?

Copy link
Author

@boringbyte boringbyte Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @datajoely. Your suggestion looks good. This is my first opensource contribution and going forward would love to contribute more. Another question: If periods conflict with namespaces, isn't it better to edit the error message in the code as well? https://github.com/kedro-org/kedro/blob/main/kedro/pipeline/node.py#L126-L139

Copy link
Member

@idanov idanov Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datajoely I think this PR is better than not mentioning the limitation altogether, so should we just merge it in as it is and then iterate on the messaging?

@astrojuanlu @merelcht @noklam wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting back.

If a node_name contains a '.', there's a high chance it might be treated as if within a 'namespace'. We are currently encountering numerous issues on Kedro-Viz targeting nested modular pipelines that involve multiple '.' notations. If a node_name without a namespace contains a '.', then I have a feeling this might cause problems on Kedro-Viz.

I am conducting some tests next week on modular pipelines and will revert with more informed opinion once I have investigated this matter more thoroughly.

(this isn't the case with tags but I will confirm that too)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view is we either need to reserve . exclusively for namespaces or document the functionality explicitly, I'd be very reticent to consciously avoid mentioning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi team,

With the recent refactor of Kedro-viz, the . notation for namespaces is a strong indicator that a node has a namespace especially for nested namespaces as this is the way Kedro defines it here. The modular pipeline visualization also takes this into account. Tags do not have namespaces, and with this update, namespaces associated with datasets are no longer considered. Only node namespaces matter.

I agree with @datajoely that we should exclusively reserve . for namespaces

UPDATED comment - below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also raises the question whether we should provide any validation on the front-end side because I know for a fact users use the dot syntax to simply organise hierarchies without actually using the namespace= argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datajoely - I thought the same, but now with the below test I realised '.' in node name might not matter. Kedro-viz basically gets node.namespace from Kedro node and then attains the level of nesting required based on the '.' notation in the namespace so node name can have '.' notation without interfering with namespaces on Kedro-viz

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        
            node(
                lambda x: x,
                inputs=["raw_data"],
                outputs="model_inputs",
                name="uk.data_processing.process_data",
                tags=["split"],
            ),
            node(
                lambda x: x,
                inputs=["model_inputs"],
                outputs="model",
                name="uk.data_science.train_model",
                tags=["train"],
            )
        ])    

and I got the visualisation as the below, so no modular pipeline functionality despite using '.' notation
Screenshot 2024-05-17 at 12 46 41

FYI - same result with new refactored code and current non-refactored code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all! @boringbyte is it more clear for you how to proceed with this PR? If you need more clarity please let us know

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astrojuanlu I think I understood. I'll post here if I have any questions. Thanks.

@merelcht merelcht linked an issue May 21, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add info about tag names convention/limitation
5 participants