v0.8.0

@yinweisu yinweisu released this 16 Jun 02:00
b3af6a2

Version 0.8.0

We're happy to announce the AutoGluon 0.8 release.

NEW: Join our official community Discord server to ask questions and get involved!

Note: Loading models trained in different versions of AutoGluon is not supported.

This release contains 196 commits from 20 contributors!

See the full commit change-log here: 0.7.0...0.8.0

Special thanks to @geoalgo for the joint work in generating the experimental tabular Zeroshot-HPO portfolio this release!

Full Contributor List (ordered by # of commits):

@shchur, @Innixma, @yinweisu, @gradientsky, @FANGAreNotGnu, @zhiqiangdon, @gidler, @liangfu, @tonyhoo, @cheungdaven, @cnpgs, @giswqs, @suzhoum, @yongxinw, @isunli, @jjaeyeon, @xiaochenbin9527, @yzhliu, @jsharpna, @sxjscience

AutoGluon 0.8 supports Python versions 3.8, 3.9, and 3.10.

Changes

Highlights

  • AutoGluon TimeSeries introduced several major improvements, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.
  • AutoGluon Tabular now supports calibrating the decision threshold in binary classification (API), which can substantially improve metrics such as f1 and balanced_accuracy. For example, it is not uncommon to see f1 scores improve from 0.70 to 0.73. We strongly encourage all users of these metrics to try the new decision threshold calibration logic.
  • AutoGluon MultiModal introduces two new features: 1) PDF document classification, and 2) Open Vocabulary Object Detection.
  • AutoGluon MultiModal upgraded the presets for object detection, now offering medium_quality, high_quality, and best_quality options. Empirical results show a ~20% relative improvement in the mAP (mean Average Precision) metric when comparing the same preset before and after the upgrade.
  • AutoGluon Tabular has added an experimental Zeroshot HPO config that performs well on small datasets (<10,000 rows) when at least an hour of training time is provided (~60% win-rate vs best_quality). To try it out, specify presets="experimental_zeroshot_hpo_hybrid" when calling fit().
  • AutoGluon EDA added support for Anomaly Detection and Partial Dependence Plots.
  • AutoGluon Tabular has added experimental support for TabPFN, a pre-trained tabular transformer model. Try it out via pip install autogluon.tabular[all,tabpfn] (hyperparameter key is "TABPFN")! You can also try it out by specifying presets="experimental_extreme_quality".
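
As a rough sketch of how the experimental Zeroshot HPO preset from the highlights might be tried, the helper below wraps the fit() call. The helper name, its arguments, and the one-hour default are illustrative assumptions; only the presets string is taken from these notes.

```python
def fit_zeroshot_hpo(train_data, label, time_limit=3600):
    """Fit a TabularPredictor with the experimental Zeroshot HPO preset.

    Hypothetical helper for illustration. `train_data` is assumed to be a
    pandas DataFrame that contains the `label` column.
    """
    # Imported lazily so that defining the helper does not require AutoGluon.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label)
    # The notes recommend at least an hour (3600 seconds) of training time
    # for this preset on small datasets.
    predictor.fit(
        train_data,
        presets="experimental_zeroshot_hpo_hybrid",
        time_limit=time_limit,
    )
    return predictor
```

Swapping the presets string for "experimental_extreme_quality" would exercise the other experimental preset mentioned above.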

General

Multimodal

AutoGluon MultiModal (also known as AutoMM) introduces two new features: 1) PDF document classification, and 2) Open Vocabulary Object Detection. Additionally, we have upgraded the presets for object detection, now offering medium_quality, high_quality, and best_quality options. Empirical results show a ~20% relative improvement in the mAP (mean Average Precision) metric when comparing the same preset before and after the upgrade.

New Features

Performance Improvements

  • Upgrade the detection engine from mmdet 2.x to mmdet 3.x, and upgrade our presets @FANGAreNotGnu (#3262)
    • medium_quality: yolo-s -> yolox-l
    • high_quality: yolox-l -> DINO-Res50
    • best_quality: yolox-x -> DINO-Swin_l
  • Speed up fusion model training with the deepspeed strategy. @liangfu (#2932)
  • Enable detection backbone freezing to boost finetuning speed and reduce GPU usage @FANGAreNotGnu (#3220)

Other Enhancements

  • Support passing data path to the fit() API @zhiqiangdon (#3006)
  • Upgrade TIMM to the latest v0.9.* @zhiqiangdon (#3282)
  • Support xywh output for object detection @FANGAreNotGnu (#2948)
  • Fusion model inference acceleration with TensorRT @liangfu (#2836, #2987)
  • Support customizing advanced image data augmentation. Users can pass a list of torchvision transform objects as image augmentation. @zhiqiangdon (#3022)
  • Add yoloxm and yoloxtiny @FANGAreNotGnu (#3038)
  • Add MultiImageMix Dataset for Object Detection @FANGAreNotGnu (#3094)
  • Support loading specific checkpoints. Users can load intermediate checkpoints in addition to model.ckpt and last.ckpt. @zhiqiangdon (#3244)
  • Add some predictor properties for model statistics @zhiqiangdon (#3289)
    • trainable_parameters returns the number of trainable parameters.
    • total_parameters returns the total number of parameters.
    • model_size returns the model size in megabytes.
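
To illustrate the xywh output format mentioned in the list above, here is a minimal standalone sketch of the conversion between corner-style and width/height-style boxes. It assumes the common COCO-style convention in which xywh means top-left x, top-left y, width, height; the function names are hypothetical and are not AutoGluon's API.

```python
from typing import List, Sequence


def xyxy_to_xywh(box: Sequence[float]) -> List[float]:
    """Convert [x_min, y_min, x_max, y_max] to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert [x_min, y_min, width, height] back to [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


corner_box = [10, 20, 50, 80]
print(xyxy_to_xywh(corner_box))
```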

Bug Fixes / Code and Doc Improvements

Tabular

New Features

  • Added calibrate_decision_threshold (tutorial), which optimizes the decision threshold for a given metric and can substantially improve that metric's score. @Innixma (#3298)
  • We've added an experimental Zeroshot HPO config, which performs well on small datasets (<10,000 rows) when at least an hour of training time is provided. To try it out, specify presets="experimental_zeroshot_hpo_hybrid" when calling fit() @Innixma @geoalgo (#3312)
  • The TabPFN model is now supported as an experimental model. TabPFN is a viable model option when inference speed is not a concern, and the number of rows of training data is less than 10,000. Try it out via pip install autogluon.tabular[all,tabpfn]! @Innixma (#3270)
  • Backend support for distributed training, which will be available with the next Cloud module release. @yinweisu (#3054, #3110, #3115, #3131, #3142, #3179, #3216)
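
The general idea behind decision threshold calibration can be sketched without AutoGluon: sweep candidate thresholds over predicted probabilities and keep the one that maximizes the metric. The code below is a standalone illustration using f1. The function names, the evenly spaced grid, and the toy data are assumptions, and AutoGluon's own search may differ.

```python
from typing import Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def calibrate_threshold(
    y_true: Sequence[int], proba: Sequence[float], num_thresholds: int = 50
) -> Tuple[float, float]:
    """Return (threshold, f1) for the best threshold on an evenly spaced grid."""
    # Start from the default threshold of 0.5 so calibration never does worse.
    best_threshold = 0.5
    best_score = f1_score(y_true, [int(p >= 0.5) for p in proba])
    for i in range(num_thresholds + 1):
        threshold = i / num_thresholds
        score = f1_score(y_true, [int(p >= threshold) for p in proba])
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy validation labels and positive-class probabilities (made up for illustration).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
proba = [0.05, 0.12, 0.21, 0.47, 0.33, 0.41, 0.63, 0.91]

baseline = f1_score(y_true, [int(p >= 0.5) for p in proba])
threshold, calibrated = calibrate_threshold(y_true, proba)
print(f"f1 at 0.5: {baseline:.3f}, f1 at {threshold:.2f}: {calibrated:.3f}")
```

In practice the threshold should be chosen on held-out validation data rather than the test set, so that the reported metric is not inflated by the search.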

Performance Improvements

Other Enhancements

Bug Fixes / Code and Doc Improvements

TimeSeries

In v0.8 we introduce several major improvements to the Time Series module, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.

Highlights

  • New models: PatchTST and DLinear from GluonTS, and RecursiveTabular based on integration with the mlforecast library @shchur (#3177, #3184, #3230)
  • Improved accuracy and reduced overall training time thanks to updated presets @shchur (#3281, #3120)
  • 3-6x faster training and inference for AutoARIMA, AutoETS, Theta, DirectTabular, WeightedEnsemble models @shchur (#3062, #3214, #3252)

New Features

  • Dramatically faster repeated calls to predict(), leaderboard() and evaluate() thanks to prediction caching @shchur (#3237)
  • Reduce overfitting by using multiple validation windows with the num_val_windows argument to fit() @shchur (#3080)
  • Exclude certain models from presets with the excluded_model_types argument to fit() @shchur (#3231)
  • New method refit_full() that refits models on combined train and validation data @shchur (#3157)
  • Train multiple configurations of the same model by providing lists in the hyperparameters argument @shchur (#3183)
  • Time limit set by time_limit is now respected by all models @shchur (#3214)
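
To make the multiple-validation-window idea concrete, the sketch below slices several train/validation splits from the end of a single series. It assumes each window holds prediction_length steps and that consecutive windows are shifted by prediction_length; the function name is hypothetical and the library's exact windowing may differ.

```python
from typing import List, Sequence, Tuple


def validation_windows(
    series: Sequence[float], prediction_length: int, num_val_windows: int
) -> List[Tuple[List[float], List[float]]]:
    """Return (train, validation) splits taken from the end of a series.

    Splits are ordered from the most recent window to the oldest one.
    """
    needed = prediction_length * num_val_windows
    if len(series) <= needed:
        raise ValueError("series is too short for the requested windows")
    splits = []
    for i in range(num_val_windows):
        val_end = len(series) - i * prediction_length
        val_start = val_end - prediction_length
        # Everything before the window is available for training.
        splits.append((list(series[:val_start]), list(series[val_start:val_end])))
    return splits


for train, val in validation_windows(list(range(10)), prediction_length=2, num_val_windows=3):
    print(len(train), val)
```

Averaging the validation score over several windows makes model selection less sensitive to one unusual stretch at the end of the series, which is how multiple windows help reduce overfitting.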

Enhancements

  • Improvements to the DirectTabular model (previously called AutoGluonTabular): faster featurization, and training as a quantile regression model if eval_metric is set to "mean_wQuantileLoss" @shchur (#2973, #3211)
  • Use correct seasonal period when computing the MASE metric @shchur (#2970)
  • Check the AutoGluon version when loading TimeSeriesPredictor from disk @shchur (#3233)
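
The seasonal period matters for MASE because the metric divides the forecast error by the in-sample error of a seasonal naive forecast, which predicts each value with the one observed seasonal_period steps earlier. The following standalone sketch uses the standard MASE definition with made-up numbers; it is not AutoGluon's implementation.

```python
from typing import Sequence


def mase(
    y_past: Sequence[float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
    seasonal_period: int = 1,
) -> float:
    """Mean absolute scaled error with a seasonal naive scaling term."""
    if len(y_past) <= seasonal_period:
        raise ValueError("history must be longer than the seasonal period")
    # In-sample error of the seasonal naive forecast y[t] ~ y[t - seasonal_period].
    naive_errors = [
        abs(y_past[t] - y_past[t - seasonal_period])
        for t in range(seasonal_period, len(y_past))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal naive error is zero; MASE is undefined")
    forecast_error = sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)
    return forecast_error / scale


# A series with period 2: alternating low and high values with a slow upward trend.
history = [10, 20, 12, 22, 14, 24, 16, 26]
actual = [18, 28]
forecast = [17, 27]

print(mase(history, actual, forecast, seasonal_period=2))
print(mase(history, actual, forecast, seasonal_period=1))
```

With the matching period of 2 the scaling term is small and the score is 0.5, whereas a period of 1 inflates the scaling term and makes the same forecast look several times better.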

Minor Improvements / Documentation / Bug Fixes

Exploratory Data Analysis (EDA) tools

In 0.8 we introduce a few new tools to help with data exploration and feature engineering:

  • Anomaly Detection @gradientsky (#3124, #3137) - helps to identify unusual patterns or behaviors in data that deviate significantly from the norm. It is best suited to finding outliers, rare events, or suspicious activities that could indicate fraud, defects, or system failures. Check the Anomaly Detection Tutorial to explore the functionality.
  • Partial Dependence Plots @gradientsky (#3071, #3079) - visualize the relationship between a feature and the model's output for each individual instance in the dataset. The two-way variant can visualize potential interactions between any two features. Please see this tutorial for more detail: Using Interaction Charts To Learn Information About the Data
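
The computation behind such plots can be sketched in a few lines: for each instance, vary one feature over a grid while holding the others fixed and record the model's output (one curve per instance), then average the curves to get the partial dependence. The toy model, the function names, and the data below are illustrative assumptions and are unrelated to the EDA module's API.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def instance_curves(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[List[float]]:
    """One curve per instance: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)  # copy so the original row is left untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-instance curves at each grid point."""
    return [sum(column) / len(curves) for column in zip(*curves)]


def toy_model(row: Row) -> float:
    # x1 has a direct effect plus an interaction with x2.
    return 2.0 * row["x1"] + row["x1"] * row["x2"]


rows = [{"x1": 1.0, "x2": 0.0}, {"x1": 1.0, "x2": 2.0}]
grid = [0.0, 1.0, 2.0]
curves = instance_curves(toy_model, rows, "x1", grid)
average_curve = partial_dependence(curves)
print(curves, average_curve)
```

The two per-instance curves have different slopes because of the interaction with x2, which is the kind of pattern an averaged curve alone would hide.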

Bug Fixes / Code and Doc Improvements