How to use tpot with MLFlow #1347

Chakripotta · 2024-03-27T09:58:25Z

Hi,
This is Chakri. I want to use TPOT with MLflow to track the model and log its parameters and dependencies. I haven't been able to get this working, so any help you could provide would be much appreciated.

Thanks in advance.

perib · 2024-04-22T19:39:22Z

I haven't used MLflow. Is there an error you are getting that prevents TPOT from working?

Chakripotta · 2024-04-28T06:41:19Z

> I haven't used MLFlow, is there an error you are getting that prevents TPOT from working?

Hi Perib, thanks for the reply. In TPOT, I can track experiments from the checkpoint folder. However, while fitting a TPOT object, I don't know how to access the models from each generation so that I can log them as .pkl files and calculate metrics from them. Any help with this would be greatly appreciated.

dominik-pichler · 2024-05-06T06:45:48Z

@perib I can take a look at this if you want. I haven't contributed to this project before, but this seems like a good issue to start with.

perib · 2024-05-10T21:20:32Z

Sorry for the delay in getting back to you.

I answered a similar question in another issue, so I'll copy my answer here.

Here are the models you can access:

1. The model with the best CV score, fitted to the full training data. This is the model stored in `est.fitted_pipeline_`.
2. The list of Pareto front models, each fitted to the full training data.
3. With some work, you can extract all evaluated pipelines, but they will be unfitted. You can find more information in "tpot.evaluated_individuals_ to pipeline" #516.
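For the per-generation models asked about earlier in the thread, one option is to group the entries of `est.evaluated_individuals_` by generation and keep the best one from each. This is a minimal sketch that assumes each value is a dict with `'generation'` and `'internal_cv_score'` keys (as in recent TPOT 0.x releases; check your version) and that pipelines which failed to evaluate are scored as `-inf`. The helper name `best_per_generation` and the example entries are hypothetical. It only returns pipeline strings; turning those into (unfitted) scikit-learn pipelines is what #516 covers.

```python
import math


def best_per_generation(evaluated_individuals):
    """Return {generation: (pipeline_string, internal_cv_score)} holding the
    best successfully evaluated pipeline of each generation.

    Assumes each value of `evaluated_individuals` is a dict with
    'generation' and 'internal_cv_score' keys, as in est.evaluated_individuals_
    from recent TPOT 0.x releases (assumption: check your version).
    """
    best = {}
    for pipeline_str, stats in evaluated_individuals.items():
        gen = stats.get("generation")
        score = stats.get("internal_cv_score")
        # Skip entries without a usable score (failed pipelines assumed to be -inf).
        if gen is None or score is None or not math.isfinite(score):
            continue
        if gen not in best or score > best[gen][1]:
            best[gen] = (pipeline_str, score)
    return best


# Made-up entries in the assumed format (real keys also carry hyperparameters).
example = {
    "GaussianNB(input_matrix)": {"generation": 0, "internal_cv_score": 0.90},
    "DecisionTreeClassifier(input_matrix)": {"generation": 0, "internal_cv_score": 0.94},
    "BernoulliNB(input_matrix)": {"generation": 1, "internal_cv_score": float("-inf")},
    "LogisticRegression(input_matrix)": {"generation": 1, "internal_cv_score": 0.96},
}
print(best_per_generation(example))
# {0: ('DecisionTreeClassifier(input_matrix)', 0.94), 1: ('LogisticRegression(input_matrix)', 0.96)}
```

If MLflow is installed and the generation values are integers, the result could be logged as a single metric series, e.g. calling `mlflow.log_metric("best_internal_cv_score", score, step=gen)` for each item of `sorted(best_per_generation(est.evaluated_individuals_).items())` inside a `with mlflow.start_run():` block.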

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier
import dill as pickle

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, test_size=0.20, random_state=42)

# verbosity=3 is required: TPOT only populates pareto_front_fitted_pipelines_ at verbosity 3
est = TPOTClassifier(generations=2, population_size=2, verbosity=3, random_state=42, n_jobs=-2, cv=10)

est.fit(X_train, y_train)

# 1. Save the model with the best CV score, fitted to the full training data.
with open('tpot_iris_pipeline.pkl', 'wb') as f:
    pickle.dump(est.fitted_pipeline_, f)

# 2. Save the list of Pareto front models, each fitted to the full training data.
with open('tpot_iris_pareto_front_models.pkl', 'wb') as f:
    pickle.dump(list(est.pareto_front_fitted_pipelines_.values()), f)
```
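Building on the snippet above, here is a minimal sketch of saving each fitted Pareto front pipeline as its own .pkl file, computing a held-out metric for it, and optionally logging both to MLflow. Assumptions: the helpers `save_and_score`, `accuracy` and the stand-in `ConstantModel` are hypothetical, stdlib `pickle` is used instead of `dill`, accuracy is the metric, and the MLflow branch relies only on `mlflow.start_run`, `mlflow.log_text`, `mlflow.log_metric` and `mlflow.log_artifact`. Note that `est.pareto_front_fitted_pipelines_` is only populated when TPOT runs with `verbosity=3`.

```python
import os
import pickle  # the original snippet uses dill; stdlib pickle is assumed here


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return float(sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true))


def save_and_score(pipelines, X_test, y_test, out_dir="tpot_models", log_to_mlflow=False):
    """Pickle every fitted pipeline and return {name: held-out accuracy}.

    `pipelines` maps a name to any fitted object with .predict(), for example
    est.pareto_front_fitted_pipelines_ (keys are TPOT pipeline strings).
    """
    os.makedirs(out_dir, exist_ok=True)
    scores = {}
    for i, (name, model) in enumerate(pipelines.items()):
        path = os.path.join(out_dir, f"pipeline_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(model, f)
        scores[name] = accuracy(y_test, model.predict(X_test))
        if log_to_mlflow:
            import mlflow  # optional dependency, imported only when logging is requested

            # One MLflow run per pipeline; call this outside any active run.
            with mlflow.start_run(run_name=f"pipeline_{i}"):
                mlflow.log_text(name, "pipeline.txt")  # pipeline strings can be long
                mlflow.log_metric("test_accuracy", scores[name])
                mlflow.log_artifact(path)
    return scores


class ConstantModel:
    """Stand-in for a fitted pipeline that always predicts one label (demo only)."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return [self.label for _ in X]
```

With the estimator from the snippet above, a call might look like `save_and_score(est.pareto_front_fitted_pipelines_, X_test, y_test, log_to_mlflow=True)`. The pipeline string is logged as a text artifact rather than a parameter because TPOT pipeline strings can exceed MLflow's parameter length limits.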

We are currently working on TPOT2, where you can access all evaluated pipelines more easily, without workarounds. However, as in TPOT1, we do not train every pipeline on the full dataset, so these pipelines are unfitted.

In TPOT2, all evaluated individuals are saved along with their scores to a dataframe, which can be accessed with est.evaluated_individuals.

TPOT2 also supports one or more custom objective functions if you want to optimize for custom metrics.
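As an illustration of a custom metric, here is a hypothetical `worst_class_recall` (the recall of the weakest class) written with the `(y_true, y_pred)` signature that `sklearn.metrics.make_scorer` expects. Wrapping it with `make_scorer` gives a scorer that TPOT1 accepts through its `scoring` parameter. For TPOT2, the scorer would presumably go in the list of objectives (e.g. a `scorers` argument), but that parameter name is an assumption here; check the TPOT2 docs.

```python
from collections import Counter


def worst_class_recall(y_true, y_pred):
    """Hypothetical custom metric: recall of the worst-performing class.

    Higher is better, so it can be maximized like accuracy.
    """
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    support = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    return min(hits[c] / n for c, n in support.items())


# Wiring it into TPOT (not executed here; requires scikit-learn and TPOT):
# from sklearn.metrics import make_scorer
# scorer = make_scorer(worst_class_recall, greater_is_better=True)
# est = TPOTClassifier(scoring=scorer, generations=2, population_size=2)  # TPOT1
```

Optimizing the weakest class rather than overall accuracy is one reason to prefer a custom objective on imbalanced data; the same function could also be logged per pipeline alongside accuracy.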

perib · 2024-05-10T21:22:36Z

You can find the latest stable version of TPOT2 here: https://github.com/EpistasisLab/tpot2/tree/main

This branch is the most up to date and has a better API for defining search spaces, but it still needs to be reviewed: https://github.com/EpistasisLab/tpot2/tree/search_space_api
