Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Append & Merge PdPipeline Stages for Joining Pandas DataFrames #47

Open
Asrst opened this issue Jun 5, 2021 · 6 comments
Open

Append & Merge PdPipeline Stages for Joining Pandas DataFrames #47

Asrst opened this issue Jun 5, 2021 · 6 comments

Comments

@Asrst
Copy link

Asrst commented Jun 5, 2021

Thanks for this library, it made my pandas workflow clean & easy to maintain.

Feature Request:
pd.merge is a common operation when dealing with 2 or more data frames & pipeline stage for this is missing.
Currently, I am implementing a custom pipeline stage pdp.Merge from PdPipelineStage & would like to contribute.

@Asrst Asrst changed the title Merge Pipeline Stage for Pandas DataFrame Merge PdPipelineStage for Pandas Joining Pandas DataFrames Jun 5, 2021
@shaypal5
Copy link
Collaborator

shaypal5 commented Jun 5, 2021

Sure, I'd appreciate any contribution! :)
Let me know if you need any help.

And thank you for the kind words. :)

@Asrst
Copy link
Author

Asrst commented Sep 6, 2021

Hi, can we close this issue.? I already raised a pull request with this features #48

Let me know, if you need any help in migrating to GitHub actions/releasing the changes.

@shaypal5
Copy link
Collaborator

shaypal5 commented Sep 13, 2021

Hey @Asrst ,

Sorry for the belated response! :)
I've migrated successfully to Github Actions.

I'd love it if you can rebase over the current head of the master branch, and then open a new pull request.
Tests should then work properly, including linting and coverage reports by codecov.

Cheers,
Shay

@shaypal5
Copy link
Collaborator

shaypal5 commented Sep 13, 2021

By the way, you can see here your code reduces test coverage below 100% (which we cannot have):
#53

Please enrich tests to cover all cases you accounted for in your code.

I don't want to use that PR (#53, mine), as it has two merge commits. I rather you rebase over the master and open a PR with a single code commit, which I can then fast-forward to, to avoid a merge commit completely. :)

@Asrst
Copy link
Author

Asrst commented Sep 26, 2021

hi @shaypal5,

Improved the code coverage, rebased over the master & raised a new pull request. #55

@Asrst Asrst changed the title Merge PdPipelineStage for Pandas Joining Pandas DataFrames Append & Merge PdPipeline Stages for Joining Pandas DataFrames Sep 26, 2021
@shaypal5
Copy link
Collaborator

shaypal5 commented Sep 29, 2021

Hey @Asrst ,

I'm so sorry. I just got to reading the PR, and unfortunately the current implementation does not make a lot of sense.
It seems like you wrote two stages that get a single dataframe when creating the pipeline, and then on application append or merge it to the input dataframe.

Can you explain the use case? Share your use of it?

I think stages using the pandas.DataFrame.append and pandas.DataFrame.merge operations must mainly cater to use cases where both dataframes are somehow inputted on application time, and not one of them being statically set when creating the pipeline.

If your use case is very specific, I would prefer you stick with a AdHocStage that implements the same functionality. And even if you show this to be a generalizable use case, stage names will have to be more specific, as this is a very specific use of the methods. Maybe AppendFixedFrame and MergedFixedFrame or something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants