
Pipeline that takes data from the database (SQLite), prepares the data, makes predictions using the trained model, and saves the result of the predictions to a new table.


alexey-krasnov/pipeline_with_random_forest


pipeline_with_random_forest

The pipeline takes data from the database (SQLite), prepares the data, makes predictions using the trained random forest model, and saves the result of the predictions to a new table.

Prerequisites

This package requires:

Description

  1. modeling.py works with the initial training data in train_data_200k.csv. The script trains and saves a random forest regression model that predicts the Target_1 to Target_4 parameters from the values of Tag_1 to Tag_79.

  2. pipeline.py takes data from the SQLite database test_data_100k, prepares it, makes predictions with the model trained by modeling.py, and saves the predictions together with the top 10 features by importance to a new table.
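The two scripts above can be sketched roughly as follows. This is a minimal illustration assuming pandas and scikit-learn, not the repository's actual code: the helper names (split_columns, train_model, run_pipeline), the table names test_data, predictions and top_features, the dropna "preparation" step and the hyperparameters are all hypothetical. A single RandomForestRegressor is fit on all four targets at once, because scikit-learn random forests support multi-output regression natively; saving the top 10 features to a separate table is one possible reading of the description above.

```python
import sqlite3

import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def split_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Select feature (Tag_*) and target (Target_*) columns by name prefix."""
    tags = [c for c in df.columns if c.startswith("Tag_")]
    targets = [c for c in df.columns if c.startswith("Target_")]
    return tags, targets


def train_model(train_df: pd.DataFrame):
    """Fit one multi-output random forest mapping all Tag_* columns to all Target_* columns."""
    tags, targets = split_columns(train_df)
    # Hypothetical preparation step: drop rows with missing values.
    data = train_df.dropna(subset=tags + targets)
    model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
    model.fit(data[tags], data[targets])  # 2-D target -> multi-output regression
    return model, tags, targets


def run_pipeline(model, tags, targets, conn: sqlite3.Connection,
                 source_table="test_data", out_table="predictions",
                 features_table="top_features"):
    """Read rows from SQLite, predict the targets, and write two new tables."""
    # Table names are hypothetical and interpolated into SQL, so they must be trusted.
    df = pd.read_sql_query(f'SELECT * FROM "{source_table}"', conn)
    df = df.dropna(subset=tags)  # same hypothetical preparation as in training
    preds = pd.DataFrame(model.predict(df[tags]), columns=targets, index=df.index)
    preds.to_sql(out_table, conn, if_exists="replace", index_label="row_id")

    # Top 10 features by the forest's impurity-based importances.
    top10 = (
        pd.Series(model.feature_importances_, index=tags)
        .nlargest(10)
        .rename_axis("feature")
        .reset_index(name="importance")
    )
    top10.to_sql(features_table, conn, if_exists="replace", index=False)
    conn.commit()
```

With the README's files, this would be used roughly as train_model(pd.read_csv("train_data_200k.csv")) followed by run_pipeline(model, tags, targets, sqlite3.connect("test_data_100k"), source_table=...), where source_table must name whatever table the database actually contains.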

Usage

The files train_data_200k.csv and test_data_100k must be in the working directory. To start the program, run:

python pipeline.py

pipeline.py imports modeling.py, which trains the model and saves it to a file for later predictions. If the model file has already been generated, pipeline.py loads it directly and makes predictions.
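The "train once, then reuse the saved model" behaviour described above can be sketched with a small load-or-train helper. This is an assumption-laden illustration rather than the repository's code: the file name model.pkl, the use of pickle (joblib.dump and joblib.load are a common alternative for scikit-learn models) and the train_fn callable standing in for modeling.py's training step are all hypothetical.

```python
import os
import pickle

MODEL_PATH = "model.pkl"  # hypothetical file name for the cached model


def load_or_train(train_fn, path=MODEL_PATH):
    """Return the cached model if the file exists; otherwise train it, save it, and return it."""
    if os.path.exists(path):
        # Only unpickle files you created yourself: pickle can execute arbitrary code.
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train_fn()  # slow path: full training (stands in for modeling.py)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model
```

On the first run this pays the training cost and writes the file; later runs skip training entirely, which is why only the first run of pipeline.py is slow. Deleting the model file forces retraining.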

Note

Training the model (modeling.py) may take several minutes.

Author

👤 Aleksei Krasnov

🤝 Contributing

Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.
