pdawczak/how_much

How Much

./docs/assets/ruby_less_3_python.png

A Machine Learning supported project for predicting an income class. It distinguishes between the classes “<= 50K” and “> 50K” in US dollars. The model is trained on the Adult income dataset and is integrated with a very simple Rails application using the sklearn-porter project, which generates native Ruby code.

The demo application is hosted on Heroku and is available here (the first page may take a while to load, because it is hosted on a free service).

Please note that the model’s accuracy is around 83%, but it is based on data gathered in 1994, so it will not be very accurate for today’s circumstances. Nevertheless, it was a great experience and a fun project to build!

Train the Model

The full process (researching and engineering features, then searching for the best model and parameters) is described here; the findings are summarized below:

  1. The features most correlated with the target variable were:
    df.corr()["income_cat"].sort_values(ascending=False)
    # education            0.324409
    # hours-per-week       0.226346
    # capital-gain         0.219655
    # male                 0.205186
        
  2. These features were selected for training the models, whose accuracy was:

    Model          Best accuracy
    Decision Tree  82.0%
    Random Forest  82.5%
    KNN            82.1%

  3. The best-performing model was based on the Random Forest algorithm, and this is the one that will be deployed.
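To make the ranking in step 1 more concrete: with default settings, pandas' `df.corr()` reports the Pearson correlation coefficient for each pair of columns. The following is a minimal sketch of that coefficient, written in Ruby to match the application code. The `pearson` helper and the toy arrays are assumptions made for illustration; they are not part of the project or its dataset.

```ruby
# Pearson correlation coefficient between two equally sized numeric arrays.
# Hypothetical helper, written only to illustrate the measure.
def pearson(xs, ys)
  if xs.empty? || xs.size != ys.size
    raise ArgumentError, "arrays must have the same non-zero length"
  end

  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  # Sum of the products of the deviations from each mean.
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }

  # Square roots of the summed squared deviations.
  spread_x = Math.sqrt(xs.sum { |x| (x - mean_x)**2 })
  spread_y = Math.sqrt(ys.sum { |y| (y - mean_y)**2 })

  covariance / (spread_x * spread_y)
end

# Made-up toy rows (NOT taken from the dataset).
education  = [9, 10, 13, 14, 16, 9]
income_cat = [0, 0, 1, 1, 1, 0]

puts pearson(education, income_cat).round(3)
```

A value close to 1 means the feature tends to rise together with the target, a value close to 0 means there is little linear relationship, and a negative value means the two move in opposite directions.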

Deploy the Model

Sklearn-Porter is able to generate native Ruby code, which will be used to deploy the trained model:

from sklearn_porter import Porter

porter = Porter(grid_for_forest.best_estimator_, language='ruby')
output = porter.export(embed_data=True, class_name='Ml::IncomeClassifierModel')

with open('../app/lib/ml/income_classifier_model.rb', 'w') as f:
    f.write(output)

This would generate a class with the following interface:

class Ml::IncomeClassifierModel
  # ...

  def self.predict(features)
    # ...
  end

  # ...
end

That could be used as follows:

Ml::IncomeClassifierModel.predict([
  10, # value associated with education
  40, # value associated with hours_per_week
   0, # value associated with capital_gain
   1  # 1 for male or 0 for female
])
# => 0 or 1

Integrate with the rest of the application

The features required for the prediction must be passed in a specific order, and the predicted value will be either 0 or 1 (for “<= 50K” and “> 50K” respectively), so for convenience these can be wrapped in another method.

Given the prediction will be performed based on values obtained from the form submitted by the user, the helper method can look like the following:

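class Ml
  # Builds the feature vector in the order the model expects
  # and stores the human-readable class on the submission.
  def self.classify(submission)
    submission.classified_as = predict([
      submission.education,
      submission.hours_per_week,
      submission.capital_gain,
      submission.male ? 1 : 0
    ])
  end

  # Maps the model's numeric output (0 or 1) to its label.
  def self.predict(features)
    classes = ["<= 50K", "> 50K"]
    predicted = Ml::IncomeClassifierModel.predict(features)
    classes[predicted]
  end
end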
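As a rough sketch, the wrapper can be exercised without the generated model by substituting a stand-in. Everything new below is an assumption made for illustration: the stub `IncomeClassifierModel` uses an arbitrary fixed rule (not the trained random forest), and `Submission` is a hypothetical `Struct` standing in for whatever object the Rails form actually produces. The wrapper is repeated so that the snippet runs on its own.

```ruby
class Ml
  # Hypothetical stand-in for the generated model. This is NOT the real
  # random forest: it applies an arbitrary rule on capital_gain so that
  # the wrapper can be tried in isolation.
  class IncomeClassifierModel
    def self.predict(features)
      _education, _hours_per_week, capital_gain, _male = features
      capital_gain > 5_000 ? 1 : 0
    end
  end

  # Same wrapper as in the README, repeated to keep the snippet runnable.
  def self.classify(submission)
    submission.classified_as = predict([
      submission.education,
      submission.hours_per_week,
      submission.capital_gain,
      submission.male ? 1 : 0
    ])
  end

  def self.predict(features)
    classes = ["<= 50K", "> 50K"]
    classes[IncomeClassifierModel.predict(features)]
  end
end

# Hypothetical form object with the attributes the wrapper reads and writes.
Submission = Struct.new(
  :education, :hours_per_week, :capital_gain, :male, :classified_as,
  keyword_init: true
)

low  = Submission.new(education: 10, hours_per_week: 40, capital_gain: 0, male: true)
high = Submission.new(education: 13, hours_per_week: 45, capital_gain: 7_000, male: false)

Ml.classify(low)
Ml.classify(high)

puts low.classified_as   # label chosen by the stub rule
puts high.classified_as
```

Keeping the feature ordering and the label mapping inside `Ml` means the rest of the application only deals with submission objects and readable labels, so the generated model file can be regenerated without touching the callers.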

Summary

Even though the Machine Learning model was trained on really old data (1994 was 25 years ago at the time of writing, in 2019) and will most likely not be accurate for submissions reflecting today’s circumstances, this was still a great exercise and an amazing experience!

Looking back at where I spent most of my time while developing this simple app, I had to put more effort into building the frontend of the application than into coming up with the Machine Learning engine powering it…

./docs/assets/summary.jpeg
