This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Correct Hamilton pattern when a data pipeline requires merging two DataFrames? #231

Answered by skrawcz
filpia asked this question in Q&A

@filpia, so sorry for the very slow response; I'll fix my notification settings so it doesn't happen again.

Good question. There are a few approaches here. At a high level, with Hamilton you could model this whole process through functions, so the different approaches really just come down to where you draw the boundaries of the DAG you're grouping/curating:

  • Your solution, as you mention, seems like a perfectly valid approach. If the "glue" code that does the merging is context-specific business logic, then running two drivers and merging outside of them is not a bad way to keep those parts separate.
    dr1 = driver.Driver({...}, feature_module1, ...)
    df1 = dr1.execute(...)
    dr2 = driver.Driver({...}, feature_module2, ...)
    df2 = dr2.e…
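
To make the "glue" step in the bullet above concrete, here is a minimal sketch of merging the two driver outputs with plain pandas. It does not call Hamilton at all: `df1` and `df2` are stand-ins for whatever the two `execute(...)` calls return, and the column names, the `id` join key, and the `merge_features` helper are all hypothetical, chosen only for illustration.

```python
import pandas as pd

# Stand-ins for the DataFrames returned by the two drivers.
# The columns and the "id" join key are assumptions for this sketch.
df1 = pd.DataFrame({"id": [1, 2, 3], "feature_a": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({"id": [2, 3, 4], "feature_b": [10, 20, 30]})


def merge_features(left: pd.DataFrame, right: pd.DataFrame, key: str = "id") -> pd.DataFrame:
    """Context-specific glue code: join two driver outputs on a shared key."""
    # An inner join keeps only the keys present in both frames.
    return left.merge(right, on=key, how="inner")


merged = merge_features(df1, df2)
print(merged)
```

Keeping the merge in its own small function means the join type and key live in one place, separate from the feature modules, which is the separation the bullet describes.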
