Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docs databricks asset bundles #3744

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

erwinpaillacan
Copy link

@erwinpaillacan erwinpaillacan commented Mar 26, 2024

Description

This pull request was initiated to assist in establishing a project utilizing asset bundles on Databricks, as the use of DBX is deprecated and no longer recommended.

https://www.databricks.com/blog/announcing-general-availability-databricks-asset-bundles

Development notes

make build-docs

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
@erwinpaillacan
Copy link
Author

erwinpaillacan commented Mar 26, 2024

@cilopezs also is helping!

@erwinpaillacan erwinpaillacan changed the title Add docs asset bundles Add docs databricks asset bundles Mar 26, 2024
@astrojuanlu
Copy link
Member

Thank you @erwinpaillacan and @cilopezs for this PR!

This addresses #3360 in part. In your opinion, do you think it still makes sense to keep the DBX docs around? I was thinking that we should remove them, and replace them by what you did here.

@erwinpaillacan
Copy link
Author

Thank you @erwinpaillacan and @cilopezs for this PR!

This addresses #3360 in part. In your opinion, do you think it still makes sense to keep the DBX docs around? I was thinking that we should remove them, and replace them by what you did here.

Yes, in our opinion we could update the page https://docs.kedro.org/en/stable/deployment/databricks/databricks_ide_development_workflow.html with databricks connect, which for some time was deprecated but now it is live again and being recommended for development https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html

and for deployment we can rely on asset bundle which is the main purpose.

I think we need to update the decision plot, right?

Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
@erwinpaillacan
Copy link
Author

Heads up! @astrojuanlu @cilopezs

  1. Development workflow updated to Databricks Connect: https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html
  2. Deployment workflow asset bundles: https://docs.databricks.com/en/dev-tools/bundles/index.html

We are removing dbx

Copy link
Contributor

@datajoely datajoely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing @erwinpaillacan thank you for the hard work, only challenge is if we should make the hook part of the addon/starter and not require a manual change


```ipython
%reload_kedro /Workspace/Repos/<databricks_username>/iris-databricks
## Modify spark hook
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just update the starter/add on so this is default?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like the right approach. I'll test both guides using a new starter along with a fresh Kedro hook to ensure everything functions as expected

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datajoely , the integration described in hooks.py, following the guidance from the tutorial by @diegoliraQB on the Kedro blog (https://kedro.org/blog/how-to-integrate-kedro-and-databricks-connect), seems to be working smoothly using asset bundles and databricks-connect. So, it's ready to be used in the databricks-starter.

I'm considering whether it makes sense for me to open a pull request (PR). Since I didn't create this code snippet myself, I'm not entirely sure about it.

This documentation pull request (PR) will likely depend on the starter, I suppose.

@astrojuanlu
Copy link
Member

The docs errors are legitimate, please address them so RTD can render the new docs 👍🏽

Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
Signed-off-by: erwinpaillacan <erwin_paillacan@mckinsey.com>
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi folks! @erwinpaillacan I understand that there aren't really any major outstanding comments here, and that #3744 (comment) can be addressed as a separate PR. Is that right?

Could you have a look at the vale check, which flagged some spelling and style errors? And hopefully we can get this merged.

…ow.md

Co-authored-by: Juan Luis Cano Rodríguez <hello@juanlu.space>
Signed-off-by: erwinpaillacan <43705290+erwinpaillacan@users.noreply.github.com>
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a clear improvement over what we have now 💯 and just needs some style fixes reported by Vale for all the checks to pass. Thanks @erwinpaillacan for your patience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants