SDSJ AutoML is an AutoML (automatic machine learning) competition aimed at developing machine learning systems for processing banking datasets: transactions, time series, and classic tabular data from real banking operations. The system handles processing automatically, including model selection, architecture, hyperparameters, etc.
Based on @tyz910's public kernel
All preprocessing procedures are parallelized.
- If the dataset is larger than 2 GB, filter columns with Boruta
- Drop constant columns
- Add is_na columns
- Fill NA values and downcast
- Extract features from datetime columns
- Target encoding for `string` columns
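The steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the repository's code: the function name `preprocess`, the `is_na_` and date-part column names, and the `-1` fill value are all assumptions. Boruta filtering and parallelization are left out, and the target encoding here uses plain per-category means over the whole training set, whereas a production version would typically compute them out of fold to limit leakage.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Illustrative single-process version of the preprocessing steps."""
    df = df.copy()
    features = [c for c in df.columns if c != target]

    # Drop constant columns (one distinct value, counting NaN as a value).
    constant = [c for c in features if df[c].nunique(dropna=False) <= 1]
    df = df.drop(columns=constant)
    features = [c for c in features if c not in constant]

    global_mean = df[target].mean()
    for col in features:
        # Add an is_na indicator for columns that contain missing values.
        if df[col].isna().any():
            df[f"is_na_{col}"] = df[col].isna().astype(np.int8)

        if pd.api.types.is_datetime64_any_dtype(df[col]):
            # Expand datetime columns into numeric parts, then drop the original.
            df[f"{col}_year"] = df[col].dt.year
            df[f"{col}_month"] = df[col].dt.month
            df[f"{col}_weekday"] = df[col].dt.weekday
            df[f"{col}_hour"] = df[col].dt.hour
            df = df.drop(columns=col)
        elif pd.api.types.is_numeric_dtype(df[col]):
            if pd.api.types.is_float_dtype(df[col]):
                # Fill NaN with a sentinel (assumed -1) and downcast to float32.
                df[col] = df[col].fillna(-1).astype(np.float32)
            elif pd.api.types.is_integer_dtype(df[col]):
                # Downcast integers to the smallest dtype that fits.
                df[col] = pd.to_numeric(df[col], downcast="integer")
        else:
            # Target encoding: replace each category with its mean target value;
            # missing categories fall back to the global mean.
            means = df.groupby(col)[target].mean()
            df[col] = df[col].map(means).fillna(global_mean).astype(np.float32)
    return df
```

Calling `preprocess(train_df, "target")` returns a fully numeric frame that a gradient-boosting model can consume directly. At prediction time the category means would have to be stored and reapplied rather than recomputed.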
While time remains:
- Sample LightGBM hyperparameters and K-fold parameters (`n_splits`, `shuffle`)
- Construct folds as subsets of the main LightGBM training dataset
- Train all folds with LightGBM
- Minimize the out-of-fold (OOF) score with HyperOpt
- Save all models
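The control flow of this loop can be sketched in plain Python. To keep the example dependency-free, random sampling stands in for HyperOpt and a placeholder objective stands in for LightGBM K-fold training, so this shows only the time-budget logic, not the actual optimizer. The names `sample_params`, `search`, and `fake_train_and_score`, as well as the parameter ranges, are assumptions made for illustration.

```python
import random
import time


def sample_params(rng: random.Random) -> dict:
    """Draw one candidate configuration (ranges are illustrative assumptions)."""
    return {
        "learning_rate": 10 ** rng.uniform(-2.5, -0.5),
        "num_leaves": rng.randint(8, 256),
        "feature_fraction": rng.uniform(0.5, 1.0),
        # K-fold parameters are sampled together with the model parameters.
        "n_splits": rng.randint(3, 10),
        "shuffle": rng.choice([True, False]),
    }


def search(train_and_score, time_budget_s: float, seed: int = 0, max_trials=None):
    """Run trials until the time budget (or max_trials) is exhausted.

    train_and_score(params) must return (oof_score, models); lower is better.
    Returns every trial as (score, params, models), best first, so that all
    trained models are kept.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    trials = []
    # The deadline is only checked between trials, so one trial may overrun it.
    while time.monotonic() < deadline:
        if max_trials is not None and len(trials) >= max_trials:
            break
        params = sample_params(rng)
        score, models = train_and_score(params)
        trials.append((score, params, models))
    return sorted(trials, key=lambda trial: trial[0])


def fake_train_and_score(params: dict):
    """Hypothetical placeholder for training one LightGBM model per fold."""
    score = abs(params["learning_rate"] - 0.05)
    models = [f"model_fold_{i}" for i in range(params["n_splits"])]
    return score, models


results = search(fake_train_and_score, time_budget_s=1.0, max_trials=20)
best_score, best_params, best_models = results[0]
```

A real run would replace `fake_train_and_score` with a function that builds the folds, trains LightGBM on each, and returns the OOF score. It would also need to reserve enough of the budget for the final prediction step.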
Public datasets for local validation: sdsj2018_automl_check_datasets.zip
docker pull ungvert/sdsj2018