The pipeline takes data from the database (SQLite), prepares the data, makes predictions using the trained random forest model, and saves the result of the predictions to a new table.
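The read, prepare, predict, and write flow described above could look roughly like the sketch below. It assumes pandas on top of the standard `sqlite3` module. The table names `test_data` and `predictions`, the `dropna` preparation step, and the stand-in predictor are all hypothetical, since the README does not specify them.

```python
import sqlite3

import pandas as pd


def predict_to_table(conn, predict, target_names,
                     source="test_data", result="predictions"):
    """Read `source`, drop incomplete rows, predict, and write `result`.

    `source` and `result` are hypothetical table names and must be trusted
    constants, because they are interpolated into the SQL text.
    """
    df = pd.read_sql_query(f"SELECT * FROM {source}", conn)
    df = df.dropna()  # placeholder for the real preparation step
    preds = pd.DataFrame(predict(df), columns=target_names)
    preds.to_sql(result, conn, if_exists="replace", index=False)
    return preds


# In-memory demo; the stand-in predictor sums two columns instead of
# calling the real random forest model.
conn = sqlite3.connect(":memory:")
demo = pd.DataFrame({"Tag_1": [1.0, 2.0, None], "Tag_2": [3.0, 4.0, 5.0]})
demo.to_sql("test_data", conn, index=False)


def stand_in_predict(frame):
    return frame[["Tag_1", "Tag_2"]].sum(axis=1).to_numpy().reshape(-1, 1)


out = predict_to_table(conn, stand_in_predict, ["Target_1"])
n_rows = conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
conn.close()
```

In the real pipeline, `predict` would be the `predict` method of the trained model and `target_names` would list all four targets.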
This package requires:
- modeling.py works with the initial data in train_data_200k.csv. The script trains and saves a Random Forest regression model that predicts the values of Target_1...4 from the values of Tag_1...79.
- pipeline.py takes data from the SQLite database test_data_100k, prepares the data, makes predictions using the model trained by modeling.py, and saves the predictions and the top 10 features by importance to a new table file.
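As a minimal sketch of the modeling step and the top 10 feature ranking, assuming scikit-learn's `RandomForestRegressor` (the README does not name the library): synthetic data stands in for train_data_200k.csv, the helper names `train_model` and `top_features` are hypothetical, and saving the model to disk is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Column names follow the README: Tag_1...79 are features, Target_1...4 are targets.
FEATURES = [f"Tag_{i}" for i in range(1, 80)]
TARGETS = [f"Target_{i}" for i in range(1, 5)]


def train_model(df, n_estimators=20):
    """Fit a single multi-output random forest on the Tag_* columns."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
    model.fit(df[FEATURES], df[TARGETS])
    return model


def top_features(model, k=10):
    """Return the k most important features, highest importance first."""
    importances = pd.Series(model.feature_importances_, index=FEATURES)
    return importances.sort_values(ascending=False).head(k)


# Synthetic stand-in for pd.read_csv("train_data_200k.csv")
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(200, 83)), columns=FEATURES + TARGETS)

model = train_model(train)
predictions = model.predict(train[FEATURES])  # one column per target
top10 = top_features(model)
```

A single forest fitted on all four targets shares one set of feature importances. Fitting one model per target would instead give a separate ranking for each target.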
The files train_data_200k.csv and test_data_100k should be in the working directory. To start the program, run pipeline.py. It imports modeling.py, which trains and stores the model for later prediction. If the model file has already been generated, pipeline.py loads it directly and makes predictions.
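The "reuse the saved model if it exists, otherwise train it" behaviour can be sketched with the standard library alone. The function name `load_or_train`, the file name `model.pkl`, and the use of `pickle` are assumptions for illustration; the actual scripts may store the model differently.

```python
import os
import pickle
import tempfile


def load_or_train(model_path, train_fn):
    """Return the cached model if model_path exists; otherwise train and cache it."""
    if os.path.exists(model_path):
        with open(model_path, "rb") as fh:
            return pickle.load(fh)  # only unpickle files you created yourself
    model = train_fn()
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    return model


calls = []  # records how many times training actually ran


def fake_train():
    """Stand-in for the slow random forest training step."""
    calls.append(1)
    return {"kind": "stand-in model"}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    first = load_or_train(path, fake_train)   # trains and writes the file
    second = load_or_train(path, fake_train)  # loads from disk, no retraining
```

With this pattern, only the first run pays the training cost; later runs go straight to prediction.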
The program might take several minutes to perform modeling.
👤 Aleksei Krasnov
- Website: Ph.D. Aleksei Krasnov
- Twitter: @AlekseiKrasnov4
- Github: alexey-krasnov
- LinkedIn: Aleksei Krasnov
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.