Diff transformer #9

Open
solalatus opened this issue Oct 13, 2019 · 3 comments

Comments

@solalatus

Hi,

pdpipe is marvelous, a very nice tool!

One possible addition I was wondering about:
For time series, de-trending is a common operation, often done by taking the first difference of the data, e.g. with pd.DataFrame.diff(1). The problem is that the initial value is lost (the first row becomes NaN), so there is no easy way to "back-transform".
Here a "fittable", scikit-learn-like transformer could come in handy.
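
To make the idea concrete, here is a minimal sketch of what such a fittable differencing transformer might look like. It is not the content of the gist linked in this comment and not part of pdpipe; the class name `DiffTransformer` and its methods are hypothetical, and it only handles a first difference (`diff(1)`) on a numeric DataFrame without missing values.

```python
import pandas as pd


class DiffTransformer:
    """Hypothetical first-difference transformer with an inverse (a sketch only)."""

    def __init__(self):
        self.initial_ = None  # first row of the fitted frame, set by fit()

    def fit(self, df):
        # Remember the first row, which diff(1) would otherwise lose.
        self.initial_ = df.iloc[:1].copy()
        return self

    def transform(self, df):
        # Take the first difference and drop the leading all-NaN row.
        return df.diff(1).iloc[1:]

    def fit_transform(self, df):
        return self.fit(df).transform(df)

    def inverse_transform(self, diffed):
        # Put the stored first row back and accumulate the differences.
        return pd.concat([self.initial_, diffed]).cumsum()


series = pd.DataFrame({"y": [3, 5, 4, 8]})
stage = DiffTransformer()
diffed = stage.fit_transform(series)         # differences: 2, -1, 4
restored = stage.inverse_transform(diffed)   # values: 3, 5, 4, 8 (as floats)
```

Storing the first row at fit time is what makes the round trip possible; the inverse is only meaningful for the same series the transformer was fitted on.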

I have sketched such a thing for myself here: https://gist.github.com/solalatus/9a3fc5330e7c0cd83e61094db75d2dc3

Would this be interesting as an addition?
Many thanks!

@shaypal5
Collaborator

shaypal5 commented Dec 3, 2019

Yes, definitely! Though I have to say no pdpipe stage at the moment has an inverse_transform method, so you still won't have invertible pipelines... :|

@solalatus
Author

Well, the scikit-learn-dependent ones might. Or am I mistaken?

@shaypal5
Collaborator

shaypal5 commented Dec 21, 2019

Yep, the sklearn ones can definitely be made invertible.

The NLTK ones for sure can't. For example, if you drop rare tokens or stem words, you have no way to go back, as these are transformations that map many different inputs to the same output (e.g. "grabbing" and "grabbed" to "grab").
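
The many-to-one point can be illustrated with a small sketch. The `toy_stem` function below is a hypothetical stand-in for a real stemmer (it is not NLTK's algorithm and only strips "ing"/"ed"); it is enough to show that several inputs collapse onto one output, so no inverse function can exist.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical suffix stripper, standing in for a real stemmer."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "grabb" -> "grab".
            if len(word) >= 2 and word[-1] == word[-2]:
                word = word[:-1]
            break
    return word


# Group the inputs by their stem to see which ones collide.
preimages = defaultdict(list)
for token in ["grabbing", "grabbed", "grab"]:
    preimages[toy_stem(token)].append(token)

# A stem with more than one preimage cannot be mapped back uniquely.
ambiguous = {stem: words for stem, words in preimages.items() if len(words) > 1}
```

Here all three tokens end up under the single stem "grab", so given only "grab" there is no way to tell which word was the original.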
