
DaveDe/Auto_Machine_Learning


This is an automated machine learning application that takes a dataset as input, preprocesses it, and runs a selection of algorithms
with parameter tuning. While building it I learned that better AutoML libraries exist, so I suggest looking into TPOT,
Auto-sklearn, and MLBox.

How to run:

python3 ML_library.py

Input File Format:

1st line is the training file name
2nd line is the test file name (it should have the same format as the training file, though Y isn't required. Leave blank if you don't have a test file)
3rd line asks whether you want to output the predictions of each algorithm
4th line is the indices of categorical features, separated by commas (leave blank if there are no categorical features)
5th line is a list of indices of columns you want removed (leave blank if you don't want any removed)
6th line is a character or string that stands in for missing values (each one will be replaced by its column's median.
    Leave blank if all values are filled, "" if the values are blank)
7th line asks whether you want to drop features that have high covariate shift
8th line takes the number of dimensions the data should be reduced to (via PCA). Enter 0 if you don't want PCA to run.
Lines 9 and after are for algorithm selection
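
The median replacement described for the 6th line can be sketched roughly as follows. This is a standard-library illustration, not the code in ML_library.py: the function name fill_missing_with_median and the choice to skip non-numeric columns are assumptions made for the example.

```python
from statistics import StatisticsError, median


def fill_missing_with_median(rows, missing_marker):
    """Return a copy of rows with missing_marker replaced by the column median.

    rows is a list of data rows (lists of strings, header excluded).
    Hypothetical helper: columns that are non-numeric or entirely
    missing are left untouched.
    """
    filled = [list(row) for row in rows]
    for col in range(len(rows[0])):
        observed = [row[col] for row in rows if row[col] != missing_marker]
        try:
            col_median = median(float(value) for value in observed)
        except (ValueError, StatisticsError):
            continue  # non-numeric column, or no observed values at all
        for row in filled:
            if row[col] == missing_marker:
                row[col] = str(col_median)
    return filled


# Usage: "?" marks the missing values in this toy dataset.
data = [["1", "?", "a"], ["3", "4", "b"], ["?", "8", "?"]]
print(fill_missing_with_median(data, "?"))
```

Categorical columns are skipped here only to keep the sketch short. How the application itself treats missing categorical values is not stated in this README.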

Example input file:

Training filename: train.csv
Test filename:
Output predictions (Y/N): N
nominal features: 2,3,4,5,6,7,8,9
remove column indicies (starting at 0): 0
missing character string:
drop columns with high covariate shift (Y/N): N
PCA (enter number of dimensions): 100
linear regression (Y/N): Y
polynomial regression (Y/N): N
ANN (Y/N): Y
Elastic Net (Y/N): Y
Random Forest (Y/N): Y
Gradient Boosting Machine (Y/N): Y
XGBoost (Y/N): N
LightGBM (Y/N): Y
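
To make the line-by-line layout concrete, here is a minimal sketch of how an input file like the one above could be read. It assumes each value follows the first colon on its line and that the lines appear in the documented order. The function parse_input_file and the dictionary keys are hypothetical names chosen for this example, not the parser used by ML_library.py.

```python
EXAMPLE = """\
Training filename: train.csv
Test filename:
Output predictions (Y/N): N
nominal features: 2,3,4,5,6,7,8,9
remove column indicies (starting at 0): 0
missing character string:
drop columns with high covariate shift (Y/N): N
PCA (enter number of dimensions): 100
linear regression (Y/N): Y
polynomial regression (Y/N): N
ANN (Y/N): Y
Elastic Net (Y/N): Y
Random Forest (Y/N): Y
Gradient Boosting Machine (Y/N): Y
XGBoost (Y/N): N
LightGBM (Y/N): Y
"""


def parse_input_file(text):
    """Parse the positional 'label: value' lines into a dictionary."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            label, _, value = line.partition(":")  # split on the first colon only
            pairs.append((label.strip(), value.strip()))

    def indices(value):
        return [int(part) for part in value.split(",") if part.strip()]

    def yes(value):
        return value.upper() == "Y"

    marker = pairs[5][1]
    return {
        "train_file": pairs[0][1],
        "test_file": pairs[1][1] or None,
        "output_predictions": yes(pairs[2][1]),
        "categorical_indices": indices(pairs[3][1]),
        "remove_indices": indices(pairs[4][1]),
        # A literal "" means missing values are blank; an empty line means none are missing.
        "missing_marker": "" if marker == '""' else (marker or None),
        "drop_covariate_shift": yes(pairs[6][1]),
        "pca_dimensions": int(pairs[7][1]),
        # Lines 9 and after: one Y/N switch per algorithm.
        "algorithms": {
            label.replace("(Y/N)", "").strip(): yes(value)
            for label, value in pairs[8:]
        },
    }


config = parse_input_file(EXAMPLE)
print(config["categorical_indices"], config["algorithms"])
```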

Dataset Format:

- The first row must label all the columns, and Y must be one of the labels
- The test file should have the same format as the training file, except that Y isn't needed

Examples of proper dataset format:

X0,X1,X2,Y
1,"a",4,0
7,"q",8,3
...

Volume,Y,Temperature
12,143,1
3,133,5
...
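
A quick way to check a file against these rules before running the tool is sketched below. It uses only the standard csv module. The helper name find_target_column and its require_y parameter are assumptions made for this example and are not part of ML_library.py.

```python
import csv
import io


def find_target_column(csv_text, require_y=True):
    """Return the index of the Y column in the header row, or None if absent.

    Raises ValueError when require_y is True (training file) and the
    header has no Y label. Test files can pass require_y=False.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    labels = [label.strip() for label in header]
    if "Y" in labels:
        return labels.index("Y")
    if require_y:
        raise ValueError("header row must contain a column labelled Y")
    return None


# Usage with the two layouts shown above: Y may sit in any column.
print(find_target_column('X0,X1,X2,Y\n1,"a",4,0\n7,"q",8,3\n'))
print(find_target_column("Volume,Y,Temperature\n12,143,1\n3,133,5\n"))
```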
