sklearn Pipeline support and examples #231

Open
mjbommar opened this issue Jan 28, 2022 · 3 comments
Labels
Type: Epic 🤙 Describes a large amount of functionality that will likely be broken down into smaller issues

Comments

@mjbommar

Feature Description

Differential privacy (DP) is valuable in many "traditional" machine learning pipelines, and sklearn is the largest "traditional" ML ecosystem in Python. Would examples or first-class support for scikit-learn Pipeline workflows be worth contributing? We (@licensio) would be happy to contribute this via PR.

Is your feature request related to a problem?

The "framework-free" examples could easily be adapted to sklearn workflows, but substantially more concise usage would be possible with proper sklearn.Pipeline support.
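As a rough illustration of the idea, the sketch below shows a DP aggregation step written against scikit-learn's `fit`/`transform` convention. Everything in it is an assumption made for illustration: `DPCountTransformer`, its parameters, and the row format `(user_id, partition)` are hypothetical and are not part of PipelineDP or scikit-learn. It uses only the standard library so it runs standalone, it assumes the partition keys are public, and its Laplace sampler is a plain floating-point one that is not suitable for production privacy guarantees.

```python
import random


class DPCountTransformer:
    """Hypothetical step: per-partition user counts with Laplace noise."""

    def __init__(self, partitions, epsilon=1.0, max_partitions_per_user=1, seed=None):
        self.partitions = partitions  # assumed public partition keys
        self.epsilon = epsilon
        self.max_partitions_per_user = max_partitions_per_user
        self.seed = seed

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        rng = random.Random(self.seed)
        allowed = set(self.partitions)

        # Contribution bounding: each user counts at most once per partition,
        # in at most max_partitions_per_user partitions (first seen wins).
        kept = {}
        for user_id, partition in X:
            if partition not in allowed:
                continue
            parts = kept.setdefault(user_id, set())
            if len(parts) < self.max_partitions_per_user:
                parts.add(partition)

        counts = {p: 0 for p in self.partitions}
        for parts in kept.values():
            for p in parts:
                counts[p] += 1

        # One user changes at most max_partitions_per_user counts by 1 each,
        # so the L1 sensitivity is max_partitions_per_user.
        scale = self.max_partitions_per_user / self.epsilon

        def laplace():
            # The difference of two i.i.d. exponentials is Laplace(0, scale).
            return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

        return {p: c + laplace() for p, c in counts.items()}


rows = [("u1", "a"), ("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "b"), ("u3", "c")]
step = DPCountTransformer(partitions=["a", "b"], epsilon=1.0, seed=0)
noisy_counts = step.fit(rows).transform(rows)
print(noisy_counts)
```

A real integration would presumably subclass `sklearn.base.BaseEstimator` and `TransformerMixin` and delegate the aggregation to PipelineDP rather than reimplementing it; how that should look is the open question of this issue.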

What alternatives have you considered?

As discussed above, sklearn users could adapt the framework-agnostic examples.

Additional Context

N/A

@mjbommar added the "Type: New Feature ➕" label (Introduction of a completely new addition to the codebase) on Jan 28, 2022
@dvadym
Collaborator

dvadym commented Jan 29, 2022

Thanks, Michael, for the suggestion! It sounds interesting. We're open to adding native support for different APIs (though an example is a good start). Better integration with the Python ecosystem is on our roadmap.

Let's first understand what it might look like. I'm not familiar with scikit-learn Pipeline workflows (I've only quickly checked the documentation). Could you please explain your ideas for an example that uses PipelineDP with a scikit-learn Pipeline?

@mjbommar
Author

Let me work up a few options. I think there might be two distinct use cases - one for unsupervised workflows (e.g., clustering) and one for supervised workflows (e.g., regression).

In the meantime, here are a few more references that might be helpful if you are curious:

@miguelagt

Hey Michael:

We've had a couple of internal teams think about this. Would you be open to a 30 min chat on this topic?

@chinmayshah99 changed the title from "FEAT: sklearn Pipeline support and examples" to "sklearn Pipeline support and examples" on Feb 15, 2022
@chinmayshah99 added the "Type: Epic 🤙" label (Describes a large amount of functionality that will likely be broken down into smaller issues) and removed the "Type: New Feature ➕" label (Introduction of a completely new addition to the codebase) on Feb 15, 2022