New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a docs page about adding code beyond starter files #3852
base: main
Are you sure you want to change the base?
Create a docs page about adding code beyond starter files #3852
Conversation
Signed-off-by: Yury Fedotov <yury_fedotov@mckinsey.com>
Signed-off-by: Yury Fedotov <yury_fedotov@mckinsey.com>
Signed-off-by: Yury Fedotov <yury_fedotov@mckinsey.com>
@noklam, @astrojuanlu, could you please have a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this contribution @yury-fedotov! This is already looking great ⭐
I left some comments, mostly small nits.
`__main__.py` file of your project, but such advanced customizations are not in scope of this article. | ||
``` | ||
|
||
This being the only constraint means that you can, for example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if the points below are linked to the section that describe them in more detail. A bit like how we setup the configuration page with "how to.." https://kedro--3852.org.readthedocs.build/en/3852/configuration/configuration_basics.html#how-to-use-kedro-configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Merel, great suggestion, just implemented this. Rendered build here. Let me know if that's what you had in mind.
Comment on customizations Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Yury Fedotov <102987839+yury-fedotov@users.noreply.github.com>
For example in splitting Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Yury Fedotov <102987839+yury-fedotov@users.noreply.github.com>
Make provided examples title not bold Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Yury Fedotov <102987839+yury-fedotov@users.noreply.github.com>
And as where appropriate Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Yury Fedotov <102987839+yury-fedotov@users.noreply.github.com>
Organise spelling Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Yury Fedotov <102987839+yury-fedotov@users.noreply.github.com>
Proper wording for pip install note Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Yury Fedotov <102987839+yury-fedotov@users.noreply.github.com>
Implies to consists replacement Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Yury Fedotov <102987839+yury-fedotov@users.noreply.github.com>
Signed-off-by: Yury Fedotov <yury_fedotov@mckinsey.com> # Conflicts: # RELEASE.md
Signed-off-by: Yury Fedotov <yury_fedotov@mckinsey.com>
Hey @merelcht - thanks for comments again, I just finished implementing them. QQ also about the |
Yes, it's a soft check. I usually treat it as a guidance on grammar/word usage. It does also check for spelling mistakes, which is helpful. But you definitely don't need to address it all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thank you for the contribution!
It would be great if @stichbury can help review the style of the docs a little bit, otherwise have you seen https://github.com/kedro-org/kedro/wiki/Kedro-documentation-style-guide already?
I like a lot of the recommendation here, but I hesitate to make it an official recommendation without the team discussing a bit more. Should we extract the content and make it a guest blog post instead? @astrojuanlu
read and collaborate on as your codebase grows. | ||
Those files also sometimes make new users think that Kedro requires code | ||
to be located only in those starter files, which is not true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the clarification, but I found this sounds like coming from an user rather than the official docs.
#2512 (comment), we have an answer for this and the issue is still opened. Would it be better to actually write the documentation and link here instead?
```{note} | ||
You actually can make Kedro look for pipeline registry in a different place by modifying the | ||
`__main__.py` file of your project, but such advanced customisations are not in scope for this section. | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually do this? My impressions is no because
def configure_project(package_name: str) -> None:
"""Configure a Kedro project by populating its settings with values
defined in user's settings.py and pipeline_registry.py.
"""
settings_module = f"{package_name}.settings"
settings.configure(settings_module)
pipelines_module = f"{package_name}.pipeline_registry"
pipelines.configure(pipelines_module)
I think these are the only two files that you cannot change the location, the comment I link above has a demo project that show what a "minimal Kedro Project" looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed:
kedro/kedro/framework/project/__init__.py
Lines 279 to 280 in 7391f4f
pipelines_module = f"{package_name}.pipeline_registry" | |
pipelines.configure(pipelines_module) |
``` | ||
|
||
This being the only constraint means that you can, for example: | ||
* Add `utils.py` file to a pipeline folder and import utilities used by multiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually avoid utils.py
as much as possible as they are the bin for everything. It's ironic because kedro
do have utils
module that are left from years ago. It hasn't been growing though as we believe it's better to have explicit module.
This being the only constraint means that you can, for example: | ||
* Add `utils.py` file to a pipeline folder and import utilities used by multiple | ||
functions in `nodes.py` from there. | ||
* [Share modules between pipelines](#sharing-modules-between-pipelines). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be really beneficial if we have something to showcase, maybe there are some projects in awesome-kedro
that we can link to?
Even just the tree
structure of a project would be great.
* Add `utils.py` file to a pipeline folder and import utilities used by multiple | ||
functions in `nodes.py` from there. | ||
* [Share modules between pipelines](#sharing-modules-between-pipelines). | ||
Those could be utility functionalities, or your standalone module responsible for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those could be utility functionalities, or your standalone module responsible for | |
Those could be utility functionalities, or standalone module responsible for |
functions in `nodes.py` from there. | ||
* [Share modules between pipelines](#sharing-modules-between-pipelines). | ||
Those could be utility functionalities, or your standalone module responsible for | ||
the domain logic of the industry you work at. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the domain logic of the industry you work at. | |
the domain specific logic. |
or the domain logic of the specific industry
?
* Instead of having a single `pipeline.py` in your pipeline folder, split it, for example, | ||
into `historical_pipeline.py` and `inference_pipeline.py`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we usually go for the namespace/modular structure more:
- pipelines
- historical
- inference
- pipeline.py
@astrojuanlu thoughts?
* Instead of having a single `pipeline.py` in your pipeline folder, split it, for example, | ||
into `historical_pipeline.py` and `inference_pipeline.py`. | ||
* Instead of registering many pipelines in `register_pipelines()` function one by one, | ||
create a few `tp.Dict[str, Pipeline]` objects in different places of the project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a few `tp.Dict[str, Pipeline]` objects in different places of the project | |
create a few `dict[str, Pipeline]` objects in different places of the project |
into `historical_pipeline.py` and `inference_pipeline.py`. | ||
* Instead of registering many pipelines in `register_pipelines()` function one by one, | ||
create a few `tp.Dict[str, Pipeline]` objects in different places of the project | ||
and then make `register_pipelines()` return a union of those. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about find_pipelines()
?
I am happy to review/edit this if you decide to retain as docs, but please let me know what you decide about this:
LMK if you want to have a blog post instead, which sounds like a great idea, and then you could extract from the post if you wanted a longer-term piece of content that you maintain officially as docs. I wouldn't be able to craft the post without some notice and a ticket, but would definitely support the idea and help with later stages to edit and post, if you need it. Just ping me again when you decide how to go forward! |
Hi @stichbury, we've discussed this in backlog grooming and decided we'd like to get it in as docs. I really appreciate any help you can give getting this in shape 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this contribution @yury-fedotov ! I left a couple of comments of outstanding things I saw.
I agree it would be nice to have this as a "How-To" guide (pretty much aligned with the https://diataxis.fr/ taxonomy and other parts of our docs). I would have it maybe under "Advanced usage", somewhere, although just for continuity, not really a fan of the "Advanced" name there.
```{note} | ||
You actually can make Kedro look for pipeline registry in a different place by modifying the | ||
`__main__.py` file of your project, but such advanced customisations are not in scope for this section. | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed:
kedro/kedro/framework/project/__init__.py
Lines 279 to 280 in 7391f4f
pipelines_module = f"{package_name}.pipeline_registry" | |
pipelines.configure(pipelines_module) |
│ ├── optimizer <-- Standalone package. | ||
│ └── dashboard <-- Standalone package, may import `optimizer`, but should not know anything about model training pipeline. | ||
├── requirements.txt <-- Linters, code formatters... Not dependencies of packages. | ||
├── pyproject.toml <-- Settings for those, like `[tool.isort]`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that having this astray pyproject.toml
file here automatically communicates to pip
that this is potentially a Python package. This has been an ongoing source of pain in the ecosystem, see my summary here #2280 (comment)
Now what's the best practice to manage tool configs in monorepos for Python given this issue... I have no idea, to be honest
For sure! I'm not sure if I'll be able to get to this in this sprint, but I'll add to my list; otherwise it'll be next week. To save holding you up, probably best to get it into the state where you want me to do a final review/update and let me know it is ready. |
Thanks for the comments team! I'll get to them next week and come back. |
Description
Per this discussion in Slack with @noklam, I took a task of creating a guide on how to add code beyond starter files in Kedro.
P.S. It's my first contribution to the project, so I'd be happy to iterate as much as needed until it satisfies core team's needs. And appreciate any feedback.
Development notes
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file