New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tag names limitation #3785
base: main
Are you sure you want to change the base?
Tag names limitation #3785
Conversation
Signed-off-by: boringbyte <kottapallisivaram@gmail.com>
Signed-off-by: boringbyte <kottapallisivaram@gmail.com>
fd522dd
to
338d279
Compare
@@ -163,6 +163,9 @@ kedro run --tags=pipeline_tag | |||
|
|||
This will run only the nodes found within the pipeline tagged with `pipeline_tag`. | |||
|
|||
```{note} | |||
Valid node name and/or tag must contain only letters, digits, hyphens, underscores and/or fullstops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing contribution @boringbyte , do you think the following is more useful?
Valid node name and/or tag must contain only letters, digits, hyphens, underscores and/or fullstops. | |
Node or tag names must ONLY contain letters, digits, hyphens, underscores and/or periods. Other symbols are not permitted. |
Question for @astrojuanlu and @rashidakanchwala - do we want to advertise the full stop` point because that can conflict with namespaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @datajoely. Your suggestion looks good. This is my first opensource contribution and going forward would love to contribute more. Another question: If periods conflict with namespaces, isn't it better to edit the error message in the code as well? https://github.com/kedro-org/kedro/blob/main/kedro/pipeline/node.py#L126-L139
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datajoely I think this PR is better than not mentioning the limitation altogether, so should we just merge it in as it is and then iterate on the messaging?
@astrojuanlu @merelcht @noklam wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in getting back.
If a node_name contains a '.', there's a high chance it might be treated as if within a 'namespace'. We are currently encountering numerous issues on Kedro-Viz targeting nested modular pipelines that involve multiple '.' notations. If a node_name without a namespace contains a '.', then I have a feeling this might cause problems on Kedro-Viz.
I am conducting some tests next week on modular pipelines and will revert with more informed opinion once I have investigated this matter more thoroughly.
(this isn't the case with tags but I will confirm that too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My view is we either need to reserve .
exclusively for namespaces or document the functionality explicitly, I'd be very reticent to consciously avoid mentioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi team,
With the recent refactor of Kedro-viz, the . notation for namespaces is a strong indicator that a node has a namespace especially for nested namespaces as this is the way Kedro defines it here. The modular pipeline visualization also takes this into account. Tags do not have namespaces, and with this update, namespaces associated with datasets are no longer considered. Only node namespaces matter.
I agree with @datajoely that we should exclusively reserve . for namespaces
UPDATED comment - below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also raises the question whether we should provide any validation on the front-end side because I know for a fact users use the dot syntax to simply organise hierarchies without actually using the namespace=
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datajoely - I thought the same, but now with the below test I realised '.' in node name might not matter. Kedro-viz basically gets node.namespace from Kedro node and then attains the level of nesting required based on the '.' notation in the namespace so node name can have '.' notation without interfering with namespaces on Kedro-viz
def create_pipeline(**kwargs) -> Pipeline:
return pipeline([
node(
lambda x: x,
inputs=["raw_data"],
outputs="model_inputs",
name="uk.data_processing.process_data",
tags=["split"],
),
node(
lambda x: x,
inputs=["model_inputs"],
outputs="model",
name="uk.data_science.train_model",
tags=["train"],
)
])
and I got the visualisation as the below, so no modular pipeline functionality despite using '.' notation
FYI - same result with new refactored code and current non-refactored code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all! @boringbyte is it more clear for you how to proceed with this PR? If you need more clarity please let us know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@astrojuanlu I think I understood. I'll post here if I have any questions. Thanks.
Description
Addresses #3727
This pull request was initiated to provide more information about valid node name and tag.
Development notes
make build-docs
make lint
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file