MLBenchmarks.jl

This repo provides Julia-based benchmarks of ML algorithms on tabular data. It was developed to support both the NeuroTreeModels.jl and EvoTrees.jl projects.

Methodology

For each dataset and algorithm, the following methodology is applied (a minimal sketch of the loop is shown after the list):

  • Data is split into three parts: train, eval, and test.
  • A random grid of 16 hyper-parameter configurations is generated.
  • For each configuration, a model is trained on the train data until the evaluation metric, tracked on the eval data, stops improving (early stopping).
  • The trained model is then evaluated on the test data.
  • The metrics reported below are those obtained on the test data by the model that achieved the best eval metric.
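
The procedure above amounts to a simple random-search loop with early stopping. The sketch below illustrates it; `sample_config`, `fit_with_early_stopping`, and `test_score` are hypothetical placeholders for the model-specific training and evaluation code, and a lower-is-better eval metric is assumed.

```julia
# Sketch of the benchmark loop for one dataset/algorithm pair.
# `sample_config`, `fit_with_early_stopping` and `test_score` are hypothetical
# placeholders; they are not functions exported by this repo.
function run_benchmark(dtrain, deval, dtest; ngrid = 16)
    best_eval, best_model = Inf, nothing
    for _ in 1:ngrid
        config = sample_config()                         # one random hyper-parameter configuration
        model, eval_metric = fit_with_early_stopping(config, dtrain, deval)
        if eval_metric < best_eval                       # keep the model with the best eval metric
            best_eval, best_model = eval_metric, model
        end
    end
    return test_score(best_model, dtest)                 # metric reported in the tables below
end
```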

Datasets

The following selection of common tabular datasets is covered (the loss functions are sketched after the list):

  • Year: regression with mean squared error (MSE)
  • MSRank: ranking problem, trained as regression with mean squared error
  • YahooRank: ranking problem, trained as regression with mean squared error
  • Higgs: binary classification with logistic loss
  • Boston Housing: regression with mean squared error (MSE)
  • Titanic: binary classification with logistic loss
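
The losses named above follow their standard definitions; the sketch below assumes plain, unweighted forms (the repo's actual metric implementations may differ, e.g. in sample weighting).

```julia
using Statistics

# Mean squared error between targets y and predictions p.
mse(y, p) = mean((y .- p) .^ 2)

# Binary logistic loss (logloss); p is the predicted probability of the positive class.
logloss(y, p) = mean(@. -y * log(p) - (1 - y) * log(1 - p))
```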

Algorithms

The comparison covers the following algorithms, considered state of the art on tabular data tasks: NeuroTreeModels, EvoTrees, XGBoost, LightGBM, and CatBoost. Results for each dataset are reported below; training times are in seconds.

Boston

| model_type | train_time (s) | mse  | gini  |
|------------|----------------|------|-------|
| neurotrees | 12.8           | 18.9 | 0.947 |
| evotrees   | 0.206          | 19.7 | 0.927 |
| xgboost    | 0.0648         | 19.4 | 0.935 |
| lightgbm   | 0.865          | 25.4 | 0.926 |
| catboost   | 0.0511         | 13.9 | 0.946 |

Titanic

| model_type | train_time (s) | logloss | accuracy |
|------------|----------------|---------|----------|
| neurotrees | 7.58           | 0.407   | 0.828    |
| evotrees   | 0.673          | 0.382   | 0.828    |
| xgboost    | 0.0379         | 0.375   | 0.821    |
| lightgbm   | 0.615          | 0.390   | 0.836    |
| catboost   | 0.0326         | 0.388   | 0.836    |

Year

| model_type | train_time (s) | mse  | gini  |
|------------|----------------|------|-------|
| neurotrees | 280.0          | 76.4 | 0.652 |
| evotrees   | 18.6           | 80.1 | 0.627 |
| xgboost    | 17.2           | 80.2 | 0.626 |
| lightgbm   | 8.11           | 80.3 | 0.624 |
| catboost   | 80.0           | 79.2 | 0.635 |

MSRank

| model_type | train_time (s) | mse   | ndcg  |
|------------|----------------|-------|-------|
| neurotrees | 39.1           | 0.578 | 0.462 |
| evotrees   | 37.0           | 0.554 | 0.504 |
| xgboost    | 12.5           | 0.554 | 0.503 |
| lightgbm   | 37.5           | 0.553 | 0.503 |
| catboost   | 15.1           | 0.558 | 0.497 |

Yahoo

| model_type | train_time (s) | mse   | ndcg  |
|------------|----------------|-------|-------|
| neurotrees | 417.0          | 0.584 | 0.781 |
| evotrees   | 687.0          | 0.545 | 0.797 |
| xgboost    | 120.0          | 0.547 | 0.798 |
| lightgbm   | 244.0          | 0.540 | 0.796 |
| catboost   | 161.0          | 0.561 | 0.794 |

Higgs

| model_type | train_time (s) | logloss | accuracy |
|------------|----------------|---------|----------|
| neurotrees | 12300.0        | 0.452   | 0.781    |
| evotrees   | 2620.0         | 0.464   | 0.776    |
| xgboost    | 1390.0         | 0.462   | 0.776    |
| lightgbm   | 1330.0         | 0.461   | 0.779    |
| catboost   | 7180.0         | 0.464   | 0.775    |
