How to use tpot with MLFlow #1347

Open
Chakripotta opened this issue Mar 27, 2024 · 5 comments

@Chakripotta

Hi,
This is Chakri. I want to use TPOT with MLflow to track models and log their parameters and dependencies. I haven't been able to get this working, so any assistance would be much appreciated.

Thanks in advance.

@perib
Contributor

perib commented Apr 22, 2024

I haven't used MLflow. Is there an error you are getting that prevents TPOT from working?

@Chakripotta
Author

> I haven't used MLflow. Is there an error you are getting that prevents TPOT from working?

Hi Perib, thanks for the reply. In TPOT, I can track experiments through the checkpoint folder. However, while a TPOT object is being fitted, I don't know how to access the models from each generation so that I can log them as .pkl files and compute metrics from them. Any help with this would be greatly appreciated.
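One possible starting point for per-generation metrics: after fitting, TPOT1 keeps a record of every evaluated pipeline in `est.evaluated_individuals_`. The sketch below assumes that each value in that dict carries `'generation'` and `'internal_cv_score'` keys (as in recent TPOT 0.x releases; check your version). It finds the best internal CV score per generation and logs it to MLflow with `mlflow.log_metric`, using the generation number as the step. The helper names `best_score_per_generation` and `log_generation_scores` and the metric name are hypothetical, chosen for this example.

```python
import math


def best_score_per_generation(evaluated_individuals):
    """Return {generation: best internal CV score} from a TPOT1-style
    `evaluated_individuals_` dict.

    Assumption: each value is a dict with 'generation' and
    'internal_cv_score' keys. Entries with a missing or non-integer
    generation, or a non-finite score (failed pipelines may be scored
    as -inf), are skipped.
    """
    best = {}
    for info in evaluated_individuals.values():
        try:
            gen = int(info["generation"])
            score = float(info["internal_cv_score"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed or non-numeric entry
        if not math.isfinite(score):
            continue
        if gen not in best or score > best[gen]:
            best[gen] = score
    # Sort by generation so metrics are logged in order.
    return dict(sorted(best.items()))


def log_generation_scores(est, metric_name="best_internal_cv_score"):
    """Log each generation's best internal CV score as an MLflow metric.

    `est` is a fitted TPOT1 estimator. Call this inside an active
    `mlflow.start_run()` block; the generation number is the metric step.
    """
    import mlflow  # imported lazily so the helper above works without MLflow

    for gen, score in best_score_per_generation(est.evaluated_individuals_).items():
        mlflow.log_metric(metric_name, score, step=gen)
```

These are cross-validation scores from the optimization itself, and the pipelines behind them are unfitted, so test-set metrics for a given generation would require rebuilding and refitting that pipeline first.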

@dominik-pichler

@perib I can take a look at this if you want. I haven't contributed to this project before, but it seems like a good issue to start with.

@perib
Contributor

perib commented May 10, 2024

Sorry for the delay in getting back to you.

I answered a similar question in a related issue, so I'll copy my answer here.

Here are the models you can access:

  1. The model with the best CV score, fitted to the full training data. This is the model stored in est.fitted_pipeline_.
  2. The list of Pareto front models, each fitted to the full training data (est.pareto_front_fitted_pipelines_).
  3. With some work, you can extract all evaluated pipelines, but they will be unfitted. See #516 (tpot.evaluated_individuals_ to pipeline) for more information.

from sklearn.model_selection import train_test_split
import sklearn.datasets
import dill as pickle
from tpot import TPOTClassifier

X, y = sklearn.datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, test_size=0.20, random_state=42)

est = TPOTClassifier(generations=2, population_size=2, verbosity=2, random_state=42, n_jobs=-2, cv=10)

est.fit(X_train, y_train)

# 1. Save the model with the best CV score, fitted to the full training data.
with open('tpot_iris_pipeline.pkl', 'wb') as f:
    pickle.dump(est.fitted_pipeline_, f)

# 2. Save the list of Pareto front models, each fitted to the full training data.
with open('tpot_iris_pareto_front_models.pkl', 'wb') as f:
    pickle.dump(list(est.pareto_front_fitted_pipelines_.values()), f)
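
To get these saved models into MLflow, one approach is to log the estimator's settings as params and the pickle files as artifacts. The sketch below is not part of TPOT; it assumes a fitted TPOT1 estimator and an active MLflow run. It relies on TPOT estimators following the scikit-learn API (`get_params()`), on TPOT1's `export()` method, and on MLflow's `log_params` and `log_artifact`. The helper names, the `tpot_models` artifact folder, and the 500-character truncation are illustrative choices.

```python
import os


def stringify_params(params, max_len=500):
    """Convert estimator params into strings MLflow can store as params.

    MLflow caps the length of param values, so long values (e.g. a custom
    config_dict) are truncated to `max_len` characters, a conservative
    choice for this sketch.
    """
    return {name: str(value)[:max_len] for name, value in params.items()}


def log_tpot_run(est, pickle_paths, export_path="tpot_exported_pipeline.py"):
    """Log a fitted TPOT1 estimator's settings and saved models to MLflow.

    Must be called inside an active `mlflow.start_run()` block.
    `pickle_paths` are files written with dill after fitting.
    """
    import mlflow  # imported lazily; requires `pip install mlflow`

    # get_params() returns the constructor settings
    # (generations, population_size, cv, ...).
    mlflow.log_params(stringify_params(est.get_params()))

    # Attach the pickled models, skipping any file that does not exist.
    for path in pickle_paths:
        if os.path.exists(path):
            mlflow.log_artifact(path, artifact_path="tpot_models")

    # Also store the best pipeline as standalone Python code via export().
    est.export(export_path)
    mlflow.log_artifact(export_path, artifact_path="tpot_models")
```

In use, you would wrap the `fit` and `pickle.dump` calls from the example in `with mlflow.start_run():`, optionally log `mlflow.log_metric("test_score", est.score(X_test, y_test))`, and finish with `log_tpot_run(est, ["tpot_iris_pipeline.pkl", "tpot_iris_pareto_front_models.pkl"])`. If you also want MLflow to record package dependencies, `mlflow.sklearn.log_model` can log `est.fitted_pipeline_` together with its pip requirements. TPOT pipelines may contain TPOT's own built-in operators, so `tpot` may need to be installed wherever the model is loaded.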

We are currently working on TPOT2, where you can access all evaluated pipelines more easily, without workarounds. However, as in TPOT1, we do not train every pipeline on the full dataset, so these pipelines are unfitted. Example here:

In TPOT2, we save every evaluated individual, along with its scores, to a DataFrame that can be accessed with est.evaluated_individuals.

TPOT2 also allows for (multiple) custom objective functions if you want to use custom metrics.

@perib
Contributor

perib commented May 10, 2024

You can find the latest stable version of TPOT2 here: https://github.com/EpistasisLab/tpot2/tree/main

This branch is the most up to date and has a better API for defining search spaces, but it still needs to be reviewed: https://github.com/EpistasisLab/tpot2/tree/search_space_api
