Question: can we save all the evaluated pipelines as fitted model? #1318

Open
kirane61 opened this issue Aug 25, 2023 · 1 comment

kirane61 commented Aug 25, 2023

I was looking for a way to extract the fitted pipelines for all the pipelines (individuals) evaluated by TPOT. Is there any way to save all the evaluated pipelines as fitted models?

For example, if I set generations to 2 and population size to 2, then I want to save all six fitted pipelines evaluated by TPOT for further use. Is there any way to get the fitted pipelines so that I can use them directly without training them again?

perib (Contributor) commented Aug 31, 2023

The short answer is no. TPOT only fits the Pareto front models (including the best model) to the full training set. The models fitted on each CV fold during evaluation are not saved.

Here are the models that you are able to access.

  1. The model with the best CV score, fitted to the full training data.
  2. The list of Pareto front models, fitted to the full training data.
  3. With some work, you can extract all evaluated pipelines, but they will be unfitted. More information is in issue #516, "tpot.evaluated_individuals_ to pipeline".

import dill as pickle
import sklearn.datasets
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = sklearn.datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, test_size=0.20, random_state=42)

est = TPOTClassifier(generations=2, population_size=2, verbosity=2, random_state=42, n_jobs=-2, cv=10)

est.fit(X_train, y_train)


# 1. Save the model with the best CV score, fitted to the full training data.
with open('tpot_iris_pipeline.pkl', 'wb') as f:
    pickle.dump(est.fitted_pipeline_, f)

# 2. Save the list of fitted Pareto front models.
with open('tpot_iris_pareto_front_models.pkl', 'wb') as f:
    pickle.dump(list(est.pareto_front_fitted_pipelines_.values()), f)
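
For the third case (unfitted pipelines), one possible way to avoid retraining later is to fit each pipeline once on the full training data and save the results. The following is only a rough sketch of that idea, not something TPOT does for you. The three pipelines in `unfitted_pipelines` are hand-written stand-ins (an assumption for illustration), not pipelines actually recovered from TPOT, and the `fit_all` helper and the file name are hypothetical. It uses the standard-library `pickle`; `dill` could be swapped in as in the snippet above.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, test_size=0.20, random_state=42
)

# Hypothetical stand-ins for unfitted pipelines extracted from a search.
# In practice these would be the unfitted scikit-learn pipelines you recovered.
unfitted_pipelines = {
    "scaled_logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "scaled_knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "tree": make_pipeline(DecisionTreeClassifier(random_state=42)),
}


def fit_all(pipelines, X, y):
    """Fit a fresh clone of every pipeline on the given training data."""
    # clone() leaves the original unfitted objects untouched.
    return {name: clone(pipe).fit(X, y) for name, pipe in pipelines.items()}


fitted_pipelines = fit_all(unfitted_pipelines, X_train, y_train)

# Save all fitted pipelines in one file (a temporary directory is used here).
out_path = Path(tempfile.mkdtemp()) / "all_fitted_pipelines.pkl"
with open(out_path, "wb") as f:
    pickle.dump(fitted_pipelines, f)

# Later: reload and use the pipelines directly, with no retraining.
with open(out_path, "rb") as f:
    reloaded = pickle.load(f)

for name, pipe in reloaded.items():
    print(name, round(pipe.score(X_test, y_test), 3))
```

Note that this fits every pipeline one extra time on the full training set, which can be expensive if the search evaluated many pipelines.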

We are currently working on TPOT2, where you can more easily access all evaluated pipelines without workarounds. However, as in TPOT1, we do not train all pipelines on the full dataset, so these pipelines are unfitted. Example here:
