TPOT2 and the future of TPOT development -- From the Devs #1322

perib · 2023-09-21T00:40:40Z

Since the release of TPOT in 2016, we and others have experimented with several ideas and improvements to the algorithm. However, due to the structure of TPOT's codebase, it has been difficult to merge all these features under one package. TPOT's code can be challenging to parse and modify. The result is a fragmented development space with different features and ideas existing in isolation on different forks. Due to this, it is hard to conduct research with TPOT.

We have decided to refactor the code base to make future research and development easier. Our main goals of the refactor are specifically to improve modularity, extendability, and maintainability. We want it to be easier for users to experiment with the algorithm and to contribute to the project.

Current Status of TPOT2

TPOT2 is in alpha and has mostly reached feature parity with the original TPOT1.

Currently, the user-facing TPOTClassifier and TPOTRegressor classes are reasonably stable and unlikely to see many changes. One benefit of the simplified API is that we can update the algorithm under the hood without drastically changing the user experience.

We are still working on ensuring the backend meets our modularity, flexibility, maintainability, and extendability goals. There may be changes to the underlying code as we improve the software engineering (feedback is welcome!).

Differences between TPOT1 and TPOT2 - Porting your code

From the user's perspective, using TPOT1 and TPOT2 is very similar. We recommend you take a look at the TPOT2 Tutorials folder for Jupyter notebooks with examples.

Estimators
Both versions wrap the AutoML algorithm in a scikit-learn estimator, though the parameters differ slightly, so we encourage users to read the documentation. TPOT1 provides the TPOTClassifier and TPOTRegressor classes. These are also present in TPOT2, though they have fewer parameters (for example, in TPOT2 the user does not need to provide the number of generations or population size). The goal for these classes in TPOT2 is to reduce the number of decisions and parameters, abstracting away the evolutionary algorithm and simplifying the experience for users. Currently, TPOT2 just uses default values for the removed parameters, but in the future we will look into implementing a meta-learner similar to Auto-Sklearn's. (If the user wants to manually tune all of the parameters, they are currently available in the TPOTEstimator class.) The configuration dictionaries also have a different structure to make them compatible with Optuna.

Results
The outputs in TPOT2 have been simplified to be more user-friendly. The fitted_pipeline_ attribute still points to the fitted pipeline chosen by the algorithm. The evaluated_individuals and pareto_front attributes now return an organized Pandas dataframe containing all of the evaluated pipelines, their scores, and some other metadata.
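To make the idea of a Pareto front over evaluated pipelines concrete, here is a minimal standard-library sketch. It is not TPOT2's implementation: the record fields (`pipeline`, `score`, `complexity`) and the helper `pareto_front` are hypothetical names chosen for illustration, and it assumes that score is maximized while complexity is minimized.

```python
from typing import List


def pareto_front(rows: List[dict]) -> List[dict]:
    """Return the rows that no other row dominates.

    Assumption for this sketch: 'score' is maximized, 'complexity' is minimized.
    """

    def dominates(a: dict, b: dict) -> bool:
        # a dominates b if it is no worse on both objectives
        # and strictly better on at least one of them.
        no_worse = a["score"] >= b["score"] and a["complexity"] <= b["complexity"]
        strictly_better = a["score"] > b["score"] or a["complexity"] < b["complexity"]
        return no_worse and strictly_better

    return [r for r in rows if not any(dominates(other, r) for other in rows)]


# Hypothetical evaluation records, standing in for rows of a results table.
evaluated = [
    {"pipeline": "A", "score": 0.90, "complexity": 12},
    {"pipeline": "B", "score": 0.88, "complexity": 5},
    {"pipeline": "C", "score": 0.85, "complexity": 9},
    {"pipeline": "D", "score": 0.90, "complexity": 20},
]

front = pareto_front(evaluated)
print([r["pipeline"] for r in front])  # ['A', 'B']
```

Here C is dominated by B (lower score, higher complexity) and D is dominated by A (same score, higher complexity), so only A and B remain.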

Graphpipeline
The last major difference is that TPOT2 now supports graph-based pipelines. To do this, we implemented our own graph-estimator class that mirrors the scikit-learn Pipeline class.
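As a rough illustration of how a graph-based pipeline differs from a linear one, the toy below runs steps arranged as a directed acyclic graph, where one step can consume the outputs of several others. This is only a conceptual sketch using Python's standard `graphlib` module (Python 3.9+); it is not TPOT2's graph-estimator class, and the step functions and the `run_graph` helper are hypothetical.

```python
from graphlib import TopologicalSorter


# Toy "steps": each takes one or more lists and returns a list.
def double(x):
    return [v * 2 for v in x]


def square(x):
    return [v * v for v in x]


def combine(a, b):
    return [p + q for p, q in zip(a, b)]


# Each node maps to (function, names of its input nodes).
# "X" is the raw input; "combine" merges two branches, which a
# strictly linear pipeline cannot express.
nodes = {
    "double": (double, ["X"]),
    "square": (square, ["X"]),
    "combine": (combine, ["double", "square"]),
}


def run_graph(nodes, x):
    # TopologicalSorter expects {node: predecessors} and yields an order
    # in which every node appears after all of its inputs.
    graph = {name: deps for name, (_, deps) in nodes.items()}
    results = {"X": x}
    for name in TopologicalSorter(graph).static_order():
        if name == "X":
            continue  # the raw input is already available
        func, deps = nodes[name]
        results[name] = func(*(results[d] for d in deps))
    return results


out = run_graph(nodes, [1, 2, 3])
print(out["combine"])  # [3, 8, 15]
```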

Bug fixes
TPOT1 has a bug in which it cannot terminate some pipelines after the time-out, causing it to run endlessly. #876 #645 #905 #508 #1214 #1200 #1107 #875 #797 #780
More flexible pipeline definitions allow FSS to be included only in leaf nodes, which prevents undefined behavior when it is set in inner nodes, without restricting pipelines to a linear shape. #1250
No more duplication as a result of stacking estimators #1242
Better Dask handling #779 #304
Other issues resolved
Better logs and attributes for accessing all evaluated individuals #1318 #1229 #982 #800 #780 #337
Parameter for encoding of categorical/ordinal columns #1237
Support for the memory parameter in Dask #1228 #961
TPOT2 can account for cases where a class has fewer samples than the number of CV folds #1220
More flexible pipeline search space definitions, preprocessing step #1190 #1182 #479
Support for custom, user-defined multi-objective functions, including a complexity function that tries to estimate the number of learned parameters #1045 #783
Resume a TPOT2 run from a checkpoint #977
Stop at the first condition met #504
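
One item in the list above concerns datasets where a class has fewer samples than the number of cross-validation folds. The post does not describe how TPOT2 handles this internally; the sketch below shows one common approach, capping the fold count at the size of the smallest class. The helper name `cap_n_splits` is hypothetical.

```python
from collections import Counter


def cap_n_splits(labels, requested_splits):
    """Return a fold count that every class can support.

    Stratified CV needs at least one sample of each class per fold,
    so the fold count is capped at the smallest class size.
    """
    smallest_class = min(Counter(labels).values())
    n_splits = min(requested_splits, smallest_class)
    if n_splits < 2:
        raise ValueError("Cross-validation needs at least 2 samples in every class.")
    return n_splits


# Class "b" has only 3 samples, so 5 requested folds are reduced to 3.
labels = ["a"] * 10 + ["b"] * 3
n_splits = cap_n_splits(labels, 5)
print(n_splits)  # 3
```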

Planned Features
Meta-learning #1254
Covariate adjustment #1311 #1209
Callbacks #678
Better ensembling support #479 #105
Better visualizations #337
Better/custom initializations #59

What does this mean for TPOT1?

We will not be continuing to develop new features for TPOT1. We may fix minor bugs and dependency issues as they arise to maintain compatibility for continuing users. However, going forward, our primary focus will be on developing TPOT2.

You can find the TPOT2 repository here: https://github.com/EpistasisLab/tpot2/tree/main

Thank you for your interest in TPOT

We would love any feedback from the community! Let us know what you would like to see. Feel free to open an issue on the new page if you have any questions, suggestions, contributions, or bugs to report.

perib pinned this issue Sep 21, 2023

perib mentioned this issue Sep 22, 2023

How can I be part of the project to develop new modules? #1323

Closed

charlesbluca mentioned this issue Jan 22, 2024

Unblock CI failures from scikit-learn 1.4.0, pandas 2.2.0 dask-contrib/dask-sql#1295

Merged
