# Version 0.8.0
We're happy to announce the AutoGluon 0.8 release.

Note: Loading models trained in different versions of AutoGluon is not supported.

This release contains 196 commits from 20 contributors!

See the full commit change-log here: https://github.com/autogluon/autogluon/compare/0.7.0...0.8.0

Special thanks to @geoalgo for the joint work in generating the experimental tabular Zeroshot-HPO portfolio this release!

Full Contributor List (ordered by # of commits):

@shchur, @Innixma, @yinweisu, @gradientsky, @FANGAreNotGnu, @zhiqiangdon, @gidler, @liangfu, @tonyhoo, @cheungdaven, @cnpgs, @giswqs, @suzhoum, @yongxinw, @isunli, @jjaeyeon, @xiaochenbin9527, @yzhliu, @jsharpna, @sxjscience

AutoGluon 0.8 supports Python versions 3.8, 3.9, and 3.10.

# Changes

## Highlights
* AutoGluon TimeSeries introduced several major improvements, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.
* AutoGluon Tabular now supports **[calibrating the decision threshold in binary classification](https://auto.gluon.ai/stable/tutorials/tabular/tabular-indepth.html#decision-threshold-calibration)** ([API](https://auto.gluon.ai/stable/api/autogluon.tabular.TabularPredictor.calibrate_decision_threshold.html)), leading to massive improvements in metrics such as `f1` and `balanced_accuracy`. For example, it is not uncommon to see `f1` scores improve from `0.70` to `0.73`. We **strongly** encourage all users who optimize these metrics to try the new decision threshold calibration logic (a minimal usage sketch follows this list).
* AutoGluon MultiModal introduces two new features: 1) **PDF document classification**, and 2) **Open Vocabulary Object Detection**.
* AutoGluon MultiModal upgraded the presets for object detection, now offering `medium_quality`, `high_quality`, and `best_quality` options. Empirical results show a significant ~20% relative improvement in mAP (mean Average Precision) for the same preset name.
* AutoGluon Tabular has added an experimental **Zeroshot HPO config** which performs well on small datasets (<10,000 rows) when at least an hour of training time is provided (~60% win-rate vs `best_quality`). To try it out, specify `presets="experimental_zeroshot_hpo_hybrid"` when calling `fit()`.
* AutoGluon EDA added support for **Anomaly Detection** and **Partial Dependence Plots**.
* AutoGluon Tabular has added experimental support for **[TabPFN](https://github.com/automl/TabPFN)**, a pre-trained tabular transformer model. Try it out via `pip install autogluon.tabular[all,tabpfn]` (hyperparameter key is "TABPFN")!
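
For those who want to try the decision threshold calibration highlighted above, here is a minimal sketch. The dataset and column names are placeholders, and `set_decision_threshold` should be treated as an assumption on our part; `calibrate_decision_threshold` itself is documented at the API link above.

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder file with a binary "class" column
test_data = TabularDataset("test.csv")

predictor = TabularPredictor(label="class", eval_metric="f1").fit(train_data)

# Calibrate the decision threshold on held-out validation data to maximize eval_metric (f1 here)
threshold = predictor.calibrate_decision_threshold()
predictor.set_decision_threshold(threshold)  # assumption: later predict()/evaluate() calls use this threshold

print(predictor.evaluate(test_data))
```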

## General
* General doc improvements @tonyhoo @Innixma @yinweisu @gidler @cnpgs @isunli @giswqs (#2940, #2953, #2963, #3007, #3027, #3059, #3068, #3083, #3128, #3129, #3130, #3147, #3174, #3187, #3256, #3258, #3280, #3306, #3307, #3311, #3313)
* General code fixes and improvements @yinweisu @Innixma (#2921, #3078, #3113, #3140, #3206)
* CI improvements @yinweisu @gidler @yzhliu @liangfu @gradientsky (#2965, #3008, #3013, #3020, #3046, #3053, #3108, #3135, #3159, #3283, #3185)
* New AutoGluon Webpage @gidler @shchur (#2924)
* Support sample_weight in RMSE @jjaeyeon (#3052)
* Move AG search space to common @yinweisu (#3192)
* Deprecation utils @yinweisu (#3206, #3209)
* Update namespace packages for PEP420 compatibility @gradientsky (#3228)

## Multimodal

AutoGluon MultiModal (also known as AutoMM) introduces two new features: 1) PDF document classification, and 2) Open Vocabulary Object Detection. Additionally, we have upgraded the presets for object detection, now offering `medium_quality`, `high_quality`, and `best_quality` options. Empirical results show a significant ~20% relative improvement in mAP (mean Average Precision) for the same preset name.

### New Features
* PDF Document Classification (a minimal sketch follows this list). See [tutorial](https://auto.gluon.ai/stable/tutorials/multimodal/document/pdf_classification.html) @cheungdaven (#2864, #3043)
* Open Vocabulary Object Detection. See [tutorial](https://auto.gluon.ai/stable/tutorials/multimodal/object_detection/quick_start/quick_start_ovd.html) @FANGAreNotGnu (#3164)
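
Below is a minimal, hypothetical sketch of PDF document classification with AutoMM. It assumes a DataFrame with a column of PDF file paths and a label column; see the linked tutorial for the exact data format and options.

```python
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

# Placeholder data: one column of PDF paths, one label column
train_df = pd.DataFrame({
    "doc_path": ["docs/invoice_01.pdf", "docs/memo_07.pdf"],
    "label": ["invoice", "memo"],
})

predictor = MultiModalPredictor(label="label").fit(train_df)
predictions = predictor.predict(train_df.drop(columns="label"))
```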

### Performance Improvements
* Upgrade the detection engine from mmdet 2.x to mmdet 3.x, and upgrade our presets (usage sketched after this list) @FANGAreNotGnu (#3262)
* `medium_quality`: yolo-s -> yolox-l
* `high_quality`: yolox-l -> DINO-Res50
* `best_quality`: yolox-x -> DINO-Swin_l
* Speed up fusion model training with the DeepSpeed strategy. @liangfu (#2932)
* Enable detection backbone freezing to boost finetuning speed and save GPU usage @FANGAreNotGnu (#3220)
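
Here is a sketch of selecting one of the upgraded detection presets. The paths are placeholders and the data is assumed to be in COCO format; `sample_data_path` is used under the assumption that it infers the object classes, as in the detection tutorials.

```python
from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(
    problem_type="object_detection",
    presets="high_quality",                     # or "medium_quality" / "best_quality"
    sample_data_path="annotations/train.json",  # placeholder COCO-format annotations
)
predictor.fit("annotations/train.json")
results = predictor.evaluate("annotations/test.json")
```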

### Other Enhancements
* Support passing a data path to the `fit()` API @zhiqiangdon (#3006)
* Upgrade TIMM to the latest v0.9.* @zhiqiangdon (#3282)
* Support xywh output for object detection @FANGAreNotGnu (#2948)
* Fusion model inference acceleration with TensorRT @liangfu (#2836, #2987)
* Support customizing advanced image data augmentation. Users can pass a list of [torchvision transform](https://pytorch.org/vision/stable/transforms.html#geometry) objects as image augmentation. @zhiqiangdon (#3022)
* Add yoloxm and yoloxtiny @FANGAreNotGnu (#3038)
* Add MultiImageMix Dataset for Object Detection @FANGAreNotGnu (#3094)
* Support loading specific checkpoints. Users can load intermediate checkpoints other than `model.ckpt` and `last.ckpt`. @zhiqiangdon (#3244)
* Add some predictor properties for model statistics (see the sketch after this list) @zhiqiangdon (#3289)
* `trainable_parameters` returns the number of trainable parameters.
* `total_parameters` returns the number of total parameters.
* `model_size` returns the model size in megabytes.
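
A quick sketch of inspecting these properties on a fitted predictor (the save path is a placeholder):

```python
from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor.load("path/to/saved_predictor")  # placeholder path

print(predictor.trainable_parameters)  # number of trainable parameters
print(predictor.total_parameters)      # total number of parameters
print(predictor.model_size)            # model size in megabytes
```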

### Bug Fixes / Code and Doc Improvements
* General bug fixes and improvements @zhiqiangdon @liangfu @cheungdaven @xiaochenbin9527 @Innixma @FANGAreNotGnu @gradientsky @yinweisu @yongxinw (#2939, #2989, #2983, #2998, #3001, #3004, #3006, #3025, #3026, #3048, #3055, #3064, #3070, #3081, #3090, #3103, #3106, #3119, #3155, #3158, #3167, #3180, #3188, #3222, #3261, #3266, #3277, #3279, #3261, #3267)
* General doc improvements @suzhoum (#3295, #3300)
* Remove clip from fusion models @liangfu (#2946)
* Refactor inferring problem type and output shape @zhiqiangdon (#3227)
* Log GPU info including GPU total memory, free memory, GPU card name, and CUDA version during training @zhiqiangdon (#3291)


## Tabular

### New Features
* Added `calibrate_decision_threshold` ([tutorial](https://auto.gluon.ai/stable/tutorials/tabular/tabular-indepth.html#decision-threshold-calibration)), which optimizes a given metric's decision threshold for predictions to strongly enhance the metric score. @Innixma (#3298)
* We've added an experimental Zeroshot HPO config, which performs well on small datasets (<10,000 rows) when at least an hour of training time is provided. To try it out, specify `presets="experimental_zeroshot_hpo_hybrid"` when calling `fit()` (see the sketch after this list) @Innixma @geoalgo (#3312)
* The [TabPFN model](https://auto.gluon.ai/stable/api/autogluon.tabular.models.html#tabpfnmodel) is now supported as an experimental model. TabPFN is a viable model option when inference speed is not a concern, and the number of rows of training data is less than 10,000. Try it out via `pip install autogluon.tabular[all,tabpfn]`! @Innixma (#3270)
* Backend support for distributed training, which will be available with the next Cloud module release. @yinweisu (#3054, #3110, #3115, #3131, #3142, #3179, #3216)
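
A minimal sketch of trying the two experimental additions above. The file and label column are placeholders, while the preset string and the `"TABPFN"` hyperparameter key come from the notes above.

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder dataset with a "class" label column

# Experimental Zeroshot-HPO portfolio (best suited to <10,000-row datasets with >=1 hour budgets)
predictor = TabularPredictor(label="class").fit(
    train_data,
    presets="experimental_zeroshot_hpo_hybrid",
    time_limit=3600,
)

# Experimental TabPFN model (requires `pip install autogluon.tabular[all,tabpfn]`)
tabpfn_predictor = TabularPredictor(label="class").fit(
    train_data,
    hyperparameters={"TABPFN": {}},
)
```
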
### Performance Improvements
* Accelerate boolean preprocessing @Innixma (#2944)
### Other Enhancements
* Add quantile regression support for CatBoost (see the sketch after this list) @shchur (#3165)
* Implement quantile regression for LGBModel @shchur (#3168)
* Log to file support @yinweisu (#3232)
* Add support for `included_model_types` @yinweisu (#3239)
* Add enable_categorical=True support to XGBoost @Innixma (#3286)
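
A sketch of the new quantile regression support in the CatBoost and LightGBM models. The file, column name, and quantile levels are placeholders; `problem_type="quantile"` and `quantile_levels` are the existing TabularPredictor quantile-regression options.

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder dataset with a numeric "target" column

predictor = TabularPredictor(
    label="target",
    problem_type="quantile",
    quantile_levels=[0.1, 0.5, 0.9],
).fit(train_data, hyperparameters={"CAT": {}, "GBM": {}})  # CatBoost and LightGBM

quantile_predictions = predictor.predict(train_data)  # one column per quantile level
```
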
### Bug Fixes / Code and Doc Improvements
* Cross-OS loading of a fitted TabularPredictor should now work properly @yinweisu @Innixma
* General bug fixes and improvements @Innixma @cnpgs @shchur @yinweisu @gradientsky (#2865, #2936, #2990, #3045, #3060, #3069, #3148, #3182, #3199, #3226, #3257, #3259, #3268, #3269, #3287, #3288, #3285, #3293, #3294, #3302)
* Move interpretable logic to InterpretableTabularPredictor @Innixma (#2981)
* Enhance drop_duplicates, enable by default @Innixma (#3010)
* Refactor params_aux & memory checks @Innixma (#3033)
* Raise an error when `pred_proba` is called for regression problems @Innixma (#3240)


## TimeSeries
In v0.8 we introduce several major improvements to the Time Series module, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.

### Highlights
- New models: `PatchTST` and `DLinear` from GluonTS, and `RecursiveTabular` based on integration with the [`mlforecast`](https://github.com/Nixtla/mlforecast) library @shchur (#3177, #3184, #3230)
- Improved accuracy and reduced overall training time thanks to updated presets @shchur (#3281, #3120)
- 3-6x faster training and inference for `AutoARIMA`, `AutoETS`, `Theta`, `DirectTabular`, `WeightedEnsemble` models @shchur (#3062, #3214, #3252)

### New Features
- Dramatically faster repeated calls to `predict()`, `leaderboard()` and `evaluate()` thanks to prediction caching @shchur (#3237)
- Reduce overfitting by using multiple validation windows with the `num_val_windows` argument to `fit()` (combined with other new options in the sketch after this list) @shchur (#3080)
- Exclude certain models from presets with the `excluded_model_types` argument to `fit()` @shchur (#3231)
- New method `refit_full()` that refits models on combined train and validation data @shchur (#3157)
- Train multiple configurations of the same model by providing lists in the `hyperparameters` argument @shchur (#3183)
- Time limit set by `time_limit` is now respected by all models @shchur (#3214)
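
A sketch combining several of the new `fit()` options above. The data path, column names, and model choices are placeholders; the long-format data layout follows the TimeSeriesPredictor tutorials.

```python
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Placeholder long-format data with item_id, timestamp, and target columns
df = pd.read_csv("train.csv")
train_data = TimeSeriesDataFrame.from_data_frame(df, id_column="item_id", timestamp_column="timestamp")

predictor = TimeSeriesPredictor(prediction_length=48, target="target").fit(
    train_data,
    num_val_windows=3,                 # multiple validation windows to reduce overfitting
    excluded_model_types=["DeepAR"],   # exclude selected models from the preset
    time_limit=600,                    # now respected by all models
)
# To train multiple configurations of one model instead, pass lists in `hyperparameters`,
# e.g. hyperparameters={"DLinear": [{"context_length": 96}, {"context_length": 192}]}

predictor.refit_full()                       # refit models on combined train + validation data
predictions = predictor.predict(train_data)  # repeated predict() calls benefit from caching
```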

### Enhancements
- Improvements to the `DirectTabular` model (previously called `AutoGluonTabular`): faster featurization, trained as a quantile regression model if `eval_metric` is set to `"mean_wQuantileLoss"` @shchur (#2973, #3211)
- Use correct seasonal period when computing the MASE metric @shchur (#2970)
- Check the AutoGluon version when loading `TimeSeriesPredictor` from disk @shchur (#3233)

### Minor Improvements / Documentation / Bug Fixes
* Update documentation and tutorials @shchur (#2960, #2964, #3296, #3297)
* General bug fixes and improvements @shchur (#2977, #3058, #3066, #3160, #3193, #3202, #3236, #3255, #3275, #3290)

## Exploratory Data Analysis (EDA) tools
In 0.8 we introduce a few new tools to help with data exploration and feature engineering:
* **Anomaly Detection** @gradientsky (#3124, #3137) - helps identify unusual patterns or behaviors in data that deviate significantly from the norm. It is best used for finding outliers, rare events, or suspicious activities that could indicate fraud, defects, or system failures. Check the [Anomaly Detection Tutorial](https://auto.gluon.ai/stable/tutorials/eda/eda-auto-anomaly-detection.html) to explore the functionality (a brief sketch follows this list).
* **Partial Dependence Plots** @gradientsky (#3071, #3079) - visualize the relationship between a feature and the model's output for each individual instance in the dataset. The two-way variant can visualize potential interactions between any two features. Please see this tutorial for more detail: [Using Interaction Charts To Learn Information About the Data](https://auto.gluon.ai/stable/tutorials/eda/eda-auto-analyze-interaction.html#using-interaction-charts-to-learn-information-about-the-data)
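
A brief, hedged sketch of how these tools might be invoked. The datasets, column names, and the helper names (`detect_anomalies`, `partial_dependence_plots`) are assumptions here, so please defer to the linked tutorials for the authoritative API.

```python
import pandas as pd
import autogluon.eda.auto as auto

train_data = pd.read_csv("train.csv")  # placeholder datasets with a "class" label column
test_data = pd.read_csv("test.csv")

# Anomaly detection: flag rows in test_data that deviate strongly from train_data
auto.detect_anomalies(train_data=train_data, test_data=test_data, label="class")

# Partial dependence plots for exploring feature effects and two-way interactions
auto.partial_dependence_plots(train_data, label="class")
```
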
### Bug Fixes / Code and Doc Improvements
* Switch regression analysis in `quick_fit` to use residuals plot @gradientsky (#3039)
* Added `explain_rows` method to `autogluon.eda.auto` - Kernel SHAP visualization @gradientsky (#3014)
* General improvements and fixes @gradientsky (#2991, #3056, #3102, #3107, #3138)
