Version 0.8.0

We're happy to announce the AutoGluon 0.8 release.

NEW: Join our official community discord server to ask questions and get involved!

Note: Loading models trained in different versions of AutoGluon is not supported.

This release contains 196 commits from 20 contributors!

See the full commit change-log here: 0.7.0...0.8.0

Special thanks to @geoalgo for the joint work in generating the experimental tabular Zeroshot-HPO portfolio this release!

Full Contributor List (ordered by # of commits):

@shchur, @Innixma, @yinweisu, @gradientsky, @FANGAreNotGnu, @zhiqiangdon, @gidler, @liangfu, @tonyhoo, @cheungdaven, @cnpgs, @giswqs, @suzhoum, @yongxinw, @isunli, @jjaeyeon, @xiaochenbin9527, @yzhliu, @jsharpna, @sxjscience

AutoGluon 0.8 supports Python versions 3.8, 3.9, and 3.10.

Changes

Highlights

AutoGluon TimeSeries introduced several major improvements, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.
AutoGluon Tabular now supports calibrating the decision threshold in binary classification (API), leading to massive improvements in metrics such as f1 and balanced_accuracy. It is not uncommon to see f1 scores improve from 0.70 to 0.73 as an example. We strongly encourage all users who are using these metrics to try out the new decision threshold calibration logic.
AutoGluon MultiModal introduces two new features: 1) PDF document classification, and 2) Open Vocabulary Object Detection.
AutoGluon MultiModal upgraded the presets for object detection, now offering medium_quality, high_quality, and best_quality options. The empirical results demonstrate significant ~20% relative improvements in the mAP (mean Average Precision) metric, using the same preset.
AutoGluon Tabular has added an experimental Zeroshot HPO config which performs well on small datasets <10000 rows when at least an hour of training time is provided (~60% win-rate vs best_quality). To try it out, specify presets="experimental_zeroshot_hpo_hybrid" when calling fit().
AutoGluon EDA added support for Anomaly Detection and Partial Dependence Plots.
AutoGluon Tabular has added experimental support for TabPFN, a pre-trained tabular transformer model. Try it out via pip install autogluon.tabular[all,tabpfn] (hyperparameter key is "TABPFN")! You can also try it out via specifying presets="experimental_extreme_quality".

General

General doc improvements @tonyhoo @Innixma @yinweisu @gidler @cnpgs @isunli @giswqs (#2940, #2953, #2963, #3007, #3027, #3059, #3068, #3083, #3128, #3129, #3130, #3147, #3174, #3187, #3256, #3258, #3280, #3306, #3307, #3311, #3313)
General code fixes and improvements @yinweisu @Innixma (#2921, #3078, #3113, #3140, #3206)
CI improvements @yinweisu @gidler @yzhliu @liangfu @gradientsky (#2965, #3008, #3013, #3020, #3046, #3053, #3108, #3135, #3159, #3283, #3185)
New AutoGluon Webpage @gidler @shchur (#2924)
Support sample_weight in RMSE @jjaeyeon (#3052)
Move AG search space to common @yinweisu (#3192)
Deprecation utils @yinweisu (#3206, #3209)
Update namespace packages for PEP420 compatibility @gradientsky (#3228)

Multimodal

AutoGluon MultiModal (also known as AutoMM) introduces two new features: 1) PDF document classification, and 2) Open Vocabulary Object Detection. Additionally, we have upgraded the presets for object detection, now offering medium_quality, high_quality, and best_quality options. The empirical results demonstrate significant ~20% relative improvements in the mAP (mean Average Precision) metric, using the same preset.

New Features

PDF Document Classification. See tutorial @cheungdaven (#2864, #3043)
Open Vocabulary Object Detection. See tutorial @FANGAreNotGnu (#3164)

Performance Improvements

Upgrade the detection engine from mmdet 2.x to mmdet 3.x, and upgrade our presets @FANGAreNotGnu (#3262)
- medium_quality: yolo-s -> yolox-l
- high_quality: yolox-l -> DINO-Res50
- best_quality: yolox-x -> DINO-Swin_l
Speedup fusion model training with deepspeed strategy. @liangfu (#2932)
Enable detection backbone freezing to boost finetuning speed and save GPU usage @FANGAreNotGnu (#3220)

Other Enhancements

Support passing data path to the fit() API @zhiqiangdon (#3006)
Upgrade TIMM to the latest v0.9.* @zhiqiangdon (#3282)
Support xywh output for object detection @FANGAreNotGnu (#2948)
Fusion model inference acceleration with TensorRT @liangfu (#2836, #2987)
Support customizing advanced image data augmentation. Users can pass a list of torchvision transform objects as image augmentation. @zhiqiangdon (#3022)
Add yoloxm and yoloxtiny @FANGAreNotGnu (#3038)
Add MultiImageMix Dataset for Object Detection @FANGAreNotGnu (#3094)
Support loading specific checkpoints. Users can load the intermediate checkpoints other than model.ckpt and last.ckpt. @zhiqiangdon (#3244)
Add some predictor properties for model statistics @zhiqiangdon (#3289)
- trainable_parameters returns the number of trainable parameters.
- total_parameters returns the number of total parameters.
- model_size returns the model size measured by megabytes.

Bug Fixes / Code and Doc Improvements

General bug fixes and improvements @zhiqiangdon @liangfu @cheungdaven @xiaochenbin9527 @Innixma @FANGAreNotGnu @gradientsky @yinweisu @yongxinw (#2939, #2989, #2983, #2998, #3001, #3004, #3006, #3025, #3026, #3048, #3055, #3064, #3070, #3081, #3090, #3103, #3106, #3119, #3155, #3158, #3167, #3180, #3188, #3222, #3261, #3266, #3277, #3279, #3261, #3267)
General doc improvements @suzhoum (#3295, #3300)
Remove clip from fusion models @liangfu (#2946)
Refactor inferring problem type and output shape @zhiqiangdon (#3227)
Log GPU info including GPU total memory, free memory, GPU card name, and CUDA version during training @zhiqaingdon (#3291)

Tabular

New Features

Added calibrate_decision_threshold (tutorial), which allows to optimize a given metric's decision threshold for predictions to strongly enhance the metric score. @Innixma (#3298)
We've added an experimental Zeroshot HPO config, which performs well on small datasets <10000 rows when at least an hour of training time is provided. To try it out, specify presets="experimental_zeroshot_hpo_hybrid" when calling fit() @Innixma @geoalgo (#3312)
The TabPFN model is now supported as an experimental model. TabPFN is a viable model option when inference speed is not a concern, and the number of rows of training data is less than 10,000. Try it out via pip install autogluon.tabular[all,tabpfn]! @Innixma (#3270)
Backend support for distributed training, which will be available with the next Cloud module release. @yinweisu (#3054, #3110, #3115, #3131, #3142, #3179, #3216)

Performance Improvements

Accelerate boolean preprocessing @Innixma (#2944)

Other Enhancements

Add quantile regression support for CatBoost @shchur (#3165)
Implement quantile regression for LGBModel @shchur (#3168)
Log to file support @yinweisu (#3232)
Add support for included_model_types @yinweisu (#3239)
Add enable_categorical=True support to XGBoost @Innixma (#3286)

Bug Fixes / Code and Doc Improvements

Cross-OS loading of a fit TabularPredictor should now work properly @yinweisu @Innixma
General bug fixes and improvements @Innixma @cnpgs @shchur @yinweisu @gradientsky (#2865, #2936, #2990, #3045, #3060, #3069, #3148, #3182, #3199, #3226, #3257, #3259, #3268, #3269, #3287, #3288, #3285, #3293, #3294, #3302)
Move interpretable logic to InterpretableTabularPredictor @Innixma (#2981)
Enhance drop_duplicates, enable by default @Innixma (#3010)
Refactor params_aux & memory checks @Innixma (#3033)
Raise regression pred_proba @Innixma (#3240)

TimeSeries

In v0.8 we introduce several major improvements to the Time Series module, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.

Highlights

New models: PatchTST and DLinear from GluonTS, and RecursiveTabular based on integration with the mlforecast library @shchur (#3177, #3184, #3230)
Improved accuracy and reduced overall training time thanks to updated presets @shchur (#3281, #3120)
3-6x faster training and inference for AutoARIMA, AutoETS, Theta, DirectTabular, WeightedEnsemble models @shchur (#3062, #3214, #3252)

New Features

Dramatically faster repeated calls to predict(), leaderboard() and evaluate() thanks to prediction caching @shchur (#3237)
Reduce overfitting by using multiple validation windows with the num_val_windows argument to fit() @shchur (#3080)
Exclude certain models from presets with the excluded_model_types argument to fit() @shchur (#3231)
New method refit_full() that refits models on combined train and validation data @shchur (#3157)
Train multiple configurations of the same model by providing lists in the hyperparameters argument @shchur (#3183)
Time limit set by time_limit is now respected by all models @shchur (#3214)

Enhancements

Improvements to the DirectTabular model (previously called AutoGluonTabular): faster featurization, trained as a quantile regression model if eval_metric is set to "mean_wQuantileLoss" @shchur (#2973, #3211)
Use correct seasonal period when computing the MASE metric @shchur (#2970)
Check the AutoGluon version when loading TimeSeriesPredictor from disk @shchur (#3233)

Minor Improvements / Documentation / Bug Fixes

Update documentation and tutorials @shchur (#2960, #2964, #3296, #3297)
General bug fixes and improvements @shchur (#2977, #3058, #3066, #3160, #3193, #3202, #3236, #3255, #3275, #3290)

Exploratory Data Analysis (EDA) tools

In 0.8 we introduce a few new tools to help with data exploration and feature engineering:

Anomaly Detection @gradientsky (#3124, #3137) - helps to identify unusual patterns or behaviors in data that deviate significantly from the norm. It's best used when finding outliers, rare events, or suspicious activities that could indicate fraud, defects, or system failures. Check the Anomaly Detection Tutorial to explore the functionality.
Partial Dependence Plots @gradientsky (#3071, #3079) - visualize the relationship between a feature and the model's output for each individual instance in the dataset. Two-way variant can visualize potential interactions between any two features. Please see this tutorial for more detail: Using Interaction Charts To Learn Information About the Data

Bug Fixes / Code and Doc Improvements

Switch regression analysis in quick_fit to use residuals plot @gradientsky (#3039)
Added explain_rows method to autogluon.eda.auto - Kernel SHAP visualization @gradientsky (#3014)
General improvements and fixes @gradientsky (#2991, #3056, #3102, #3107, #3138)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.8.0

Version 0.8.0

Changes

Highlights

General

Multimodal

New Features

Performance Improvements

Other Enhancements

Bug Fixes / Code and Doc Improvements

Tabular

New Features

Performance Improvements

Other Enhancements

Bug Fixes / Code and Doc Improvements

TimeSeries

Highlights

New Features

Enhancements

Minor Improvements / Documentation / Bug Fixes

Exploratory Data Analysis (EDA) tools

Bug Fixes / Code and Doc Improvements

Contributors