ungvert/sdsj2018-automl

10th place solution to Sberbank Data Science Journey 2018: AutoML

SDSJ AutoML is an AutoML (automatic machine learning) competition aimed at developing machine learning systems for processing banking datasets: transactions, time series, and classic tabular data from real banking operations. The system handles processing automatically, including model selection, architecture, hyperparameters, etc.

Solution description

Based on @tyz910's public kernel.

Preprocessing:

All preprocessing procedures are parallelized.

  • If the dataset is larger than 2 GB, filter columns with Boruta
  • Drop constant columns
  • Add is_na indicator columns
  • Fill NA values and downcast dtypes
  • Extract features from datetime columns
  • Apply target encoding to string columns
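A minimal pandas sketch of the preprocessing steps above, assuming the features arrive as a DataFrame with datetime columns already parsed and the target as a separate Series. All function names, the fill value `-1`, the extracted datetime parts, and the smoothing constant are illustrative assumptions, not the solution's code. The sketch runs sequentially and skips the Boruta filter; datetime extraction runs before filling so that the derived columns get filled too.

```python
import numpy as np
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    # A column with at most one distinct value (NaN counted as a value) carries no signal.
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return df.drop(columns=constant)


def add_is_na_columns(df: pd.DataFrame) -> pd.DataFrame:
    # One int8 indicator per column that actually contains missing values.
    out = df.copy()
    for c in df.columns:
        if df[c].isna().any():
            out[f"is_na_{c}"] = df[c].isna().astype(np.int8)
    return out


def add_datetime_features(df: pd.DataFrame) -> pd.DataFrame:
    # Replace each datetime column with a few calendar parts (assumed choice of parts).
    out = df.copy()
    dt_cols = [c for c in df.columns if pd.api.types.is_datetime64_any_dtype(df[c])]
    for c in dt_cols:
        dt = df[c].dt
        out[f"year_{c}"] = dt.year
        out[f"month_{c}"] = dt.month
        out[f"day_{c}"] = dt.day
        out[f"weekday_{c}"] = dt.weekday
    return out.drop(columns=dt_cols)


def fill_and_downcast(df: pd.DataFrame, fill_value: int = -1) -> pd.DataFrame:
    # Fill NaNs in numeric columns, then shrink floats to float32 and ints to the smallest int type.
    out = df.copy()
    for c in out.select_dtypes(include="number").columns:
        col = out[c].fillna(fill_value)
        kind = "float" if col.dtype.kind == "f" else "integer"
        out[c] = pd.to_numeric(col, downcast=kind)
    return out


def target_encode(df: pd.DataFrame, y: pd.Series, smoothing: float = 10.0) -> pd.DataFrame:
    # Smoothed mean-target encoding; unseen or missing categories fall back to the global mean.
    # In-sample encoding like this leaks the target; an out-of-fold variant is safer.
    out = df.copy()
    prior = y.mean()
    for c in [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]:
        stats = y.groupby(df[c]).agg(["mean", "count"])
        enc = (stats["mean"] * stats["count"] + prior * smoothing) / (stats["count"] + smoothing)
        out[c] = df[c].map(enc).fillna(prior).astype(np.float32)
    return out


def preprocess(df: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    df = drop_constant_columns(df)
    df = add_is_na_columns(df)
    df = add_datetime_features(df)
    df = fill_and_downcast(df)
    return target_encode(df, y)
```

Each step returns a new DataFrame, so the steps can also be applied column-group by column-group in parallel workers and the results concatenated, which is one way to approach the parallelization mentioned above.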

Training:

While the time budget allows:

  1. Sample LightGBM hyperparameters and K-fold parameters (n_splits, shuffle)
  2. Construct folds as subsets of the main LightGBM training dataset
  3. Train a LightGBM model on each fold
  4. Minimize the out-of-fold (OOF) score with HyperOpt
  5. Save all models
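The loop above can be sketched as a time-budgeted search. To stay dependency-free, this sketch swaps HyperOpt's guided search for plain random sampling and hides LightGBM behind a hypothetical `train_fold(params, train_idx, valid_idx)` callback that returns `(model, valid_predictions)`. The search space and the names `kfold_indices`, `sample_params`, and `search` are illustrative assumptions; lower OOF scores are treated as better.

```python
import random
import time
from typing import List, Tuple

Fold = Tuple[List[int], List[int]]


def kfold_indices(n_rows: int, n_splits: int, shuffle: bool, seed: int) -> List[Fold]:
    """Split row indices into n_splits disjoint validation folds (strided assignment)."""
    idx = list(range(n_rows))
    if shuffle:
        random.Random(seed).shuffle(idx)
    folds = []
    for k in range(n_splits):
        valid = idx[k::n_splits]
        valid_set = set(valid)
        train = [i for i in idx if i not in valid_set]
        folds.append((train, valid))
    return folds


def sample_params(rng: random.Random) -> dict:
    """Hypothetical search space: LightGBM parameters plus the K-fold parameters."""
    return {
        "learning_rate": rng.choice([0.01, 0.03, 0.05, 0.1]),
        "num_leaves": rng.randint(16, 255),
        "feature_fraction": rng.uniform(0.5, 1.0),
        "n_splits": rng.choice([3, 4, 5]),
        "shuffle": rng.random() < 0.5,
    }


def search(n_rows, y, train_fold, score_fn, time_budget_s, seed=0):
    """Random search under a time budget; returns all trials sorted by OOF score."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    trials = []
    # Run at least one trial, then keep going while the budget allows.
    while not trials or time.monotonic() < deadline:
        params = sample_params(rng)
        n_splits = params.pop("n_splits")
        shuffle = params.pop("shuffle")
        oof = [0.0] * n_rows
        models = []
        for train_idx, valid_idx in kfold_indices(n_rows, n_splits, shuffle, rng.randrange(2**31)):
            model, preds = train_fold(params, train_idx, valid_idx)  # hypothetical callback
            models.append(model)
            # Each row is predicted exactly once, by the model that did not train on it.
            for i, p in zip(valid_idx, preds):
                oof[i] = p
        score = score_fn(y, oof)
        # Keep every trial's fold models ("save all models").
        trials.append((score, {**params, "n_splits": n_splits, "shuffle": shuffle}, models))
    return sorted(trials, key=lambda t: t[0])
```

With LightGBM, `train_fold` could build each fold from one prebuilt training set via `lightgbm.Dataset.subset(used_indices)` and fit it with `lightgbm.train`, and HyperOpt's `fmin(..., algo=tpe.suggest)` could replace the random sampling. A trial that starts just before the deadline can still overrun it, so a hard time limit needs some safety margin.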

Local Validation

Public datasets for local validation: sdsj2018_automl_check_datasets.zip

Docker 🐳

docker pull ungvert/sdsj2018
