AutoMLSearch get_pipeline always returns pipelines with the same name #1912

Closed
freddyaboulton opened this issue Mar 2, 2021 · 3 comments · Fixed by #1958
Labels: bug, needs design

Comments

@freddyaboulton
Contributor

Repro:

from evalml.automl import AutoMLSearch
from evalml.demos import load_breast_cancer

X, y = load_breast_cancer()
automl = AutoMLSearch(X, y, problem_type="binary", max_batches=1)
automl.search()

pipelines = [automl.get_pipeline(i) for i in range(3)]
assert [p.name for p in pipelines] == ['LightGBM Classifier w/ Imputer', 'LightGBM Classifier w/ Imputer', 'LightGBM Classifier w/ Imputer']

The pipelines should not all have the same name, because their estimators are different:

[p.estimator.name for p in pipelines]
['Baseline Classifier', 'Decision Tree Classifier', 'LightGBM Classifier']
@freddyaboulton added the bug and good first issue labels on Mar 2, 2021
@dsherry added this to the Sprint 2021 Mar A milestone on Mar 4, 2021
@angela97lin
Contributor

It's not just the names; the hyperparameters (and, I suspect, other values stored on the class) are overwritten too. This happens because every pipeline is an instance of the same GeneratedPipelineBinary class, and we are updating that class's variables. Each update therefore applies to the entire class and affects all existing instances.
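To illustrate the mechanism, here is a minimal sketch in plain Python. It does not use evalml; `SharedPipeline` and `configure_and_build` are hypothetical names standing in for a single generated pipeline class and whatever code reconfigures it, and the parameter values are made up.

```python
class SharedPipeline:
    """Hypothetical stand-in for a single generated pipeline class."""
    name = None
    parameters = None


def configure_and_build(name, parameters):
    # Assumption: the class itself is reconfigured before each instance is built.
    SharedPipeline.name = name
    SharedPipeline.parameters = parameters
    return SharedPipeline()


baseline = configure_and_build("Baseline Classifier", {"strategy": "mode"})
lightgbm = configure_and_build("LightGBM Classifier w/ Imputer", {"n_estimators": 100})

# Neither instance has its own `name`, so both read the class attribute,
# which now holds whatever was written last.
names = [baseline.name, lightgbm.name]
print(names)
```

Both entries come out as the last name written, which matches the symptom in the repro above.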

@freddyaboulton
Contributor Author

freddyaboulton commented Mar 8, 2021

I believe this stems from #1400. I think we're in a pickle (pun intended): our pipeline design defines pipelines by setting class attributes, which lends itself to dynamically generated pipeline classes like the ones we create in search. The problem is that those pipelines then can't be "exported" out of AutoMLSearch.

There may be an easy solution I'm overlooking, but I think fixing this while keeping our automl pipelines pickle-able will require a deep design discussion.
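For anyone following along, the trade-off can be sketched in plain Python. This is not evalml's code; `make_pipeline_class` is a hypothetical factory, and the sketch only assumes that one class is generated per pipeline. Each generated class keeps its own attributes, but the standard `pickle` module stores classes by reference (module plus name) and cannot find a class that only exists at runtime.

```python
import pickle


def make_pipeline_class(name, component_graph):
    # Hypothetical factory: each call builds a brand-new class whose
    # configuration lives in its own class attributes.
    return type(
        "GeneratedPipeline",
        (object,),
        {"name": name, "component_graph": component_graph},
    )


cls_a = make_pipeline_class("Decision Tree Classifier w/ Imputer",
                            ["Imputer", "Decision Tree Classifier"])
cls_b = make_pipeline_class("LightGBM Classifier w/ Imputer",
                            ["Imputer", "LightGBM Classifier"])
a, b = cls_a(), cls_b()

# Separate classes, so the names no longer clobber each other.
names = [a.name, b.name]

# pickle looks the class up by module and name when dumping an instance;
# no module-level attribute refers to this generated class, so it fails.
try:
    pickle.dumps(a)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False

print(names, picklable)
```

Pointing every generated pipeline at one importable class makes the lookup succeed, but then all instances share that class's attributes, which is the bug reported here.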

@freddyaboulton added the needs design label and removed the good first issue label on Mar 8, 2021
@dsherry removed this from the Sprint 2021 Mar A milestone on Mar 9, 2021
@dsherry
Contributor

dsherry commented Mar 10, 2021

Plan
Short-term: this issue tracks reverting the change from #1400 to resolve the buggy behavior. Our pipelines won't support Python's pickle module, but they will still be serializable through the existing save/load functionality, which uses cloudpickle.

Long-term: after the revert, #1956 tracks figuring out how we should support saving evalml pipelines with Python's pickle module.
