-
Notifications
You must be signed in to change notification settings - Fork 1
DaveDe/Auto_Machine_Learning
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This is an automated machine learning application that inputs a dataset, performs preprocessing, and performs various algorithms, with parameter tuning. While building this I learned that better Auto ML librarys exist, so I suggest looking into TPOT, Auto-sklearn, and ML BOX. How to run: python3 ML_library.py Input File Format: 1st line is training file name 2nd line is test file name (should be same format as training file (Y isnt required). Leave blank if you dont have a test file) 3rd line asks whether you want to output the predictions of each algorithm 4th line is indicies of categorical features, seperated by a comma (leave blank if there are no categorical features) 5th line is a list of indicies of columns you want removed (leave blank if you don't want any removed) 6th line is a character or String that is in place of missing values (this will be replaced by the columns median. Leave blank if all values are filled, "" if the values are blank) 7th line asks whether you want to drop features that have high covariate drift 8th line takes the number of dimensions the data should be reduced to (via pca). Say 0 if you dont want PCA being run. lines 9 and after are for algorithm selection Example input file: Training filename: train.csv Test filename: Output predictions (Y/N): N nominal features: 2,3,4,5,6,7,8,9 remove column indicies (starting at 0): 0 missing character string: drop columns with high covariate shift (Y/N): N PCA (enter number of dimensions): 100 linear regression (Y/N): Y polynomial regression (Y/N): N ANN (Y/N): Y Elastic Net (Y/N): Y Random Forest (Y/N): Y Gradient Boosting Machine (Y/N): Y XGBoost (Y/N): N LightGBM (Y/N): Y Dataset Format: - The first row must label all the columns, and Y must be one of the labels - Test file should be same format as training file, except Y isnt needed Examples of proper dataset format: X0,X1,X2,Y 1,"a",4,0 7,"q",8,3 ... Volume,Y,Temperature 12,143,1 3,133,5 ...
About
Automated Machine Learning tool
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published