PalmSpringsBnB

(Image: ./docs/assets/ruby_less_3_python.png)

A Machine Learning powered project that suggests a list of prices (nightly, weeknight, weekend night, weekly and monthly) based on information about the property:

Demo video: https://www.youtube.com/watch?v=ggG5sFgQwDM

The Machine Learning model has been trained on the Palm Springs Vacation Rentals dataset available on Kaggle.

This application uses the PyCall Ruby gem, which allows calling Python code from Ruby.

Unfortunately, PyCall has specific requirements for how Python must be installed so that it can be accessed from Ruby, and the way Python is installed on Heroku doesn't meet them, so it wasn't possible to deploy the app with default settings.

Because of this technical restriction, I recorded a short demo of how it works; it is available on YouTube here.

The full source code of the application is available on GitHub, and the whole research is available in this Jupyter Notebook.

Training the Model

  1. For training the model, the following features were selected (shown here with their correlation to the target "price" we want to predict):
    df.corr()["price"].sort_values(ascending=False)
    price          1.000000
    numBed         0.447529
    numBathroom    0.301408
    numBedroom     0.284166
    numPeople      0.241956
        
  2. The accuracy each model achieved was as follows (a sketch of how such scores are typically computed follows this list):
    Model             TRAIN Accuracy  TEST Accuracy
    Decision Tree     80.04%          78.39%
    Random Forest     80.04%          78.44%
    LinearRegression  35.23%          -
    Ridge             35.23%          -
  3. Choosing the model.

    The table shows the accuracy achieved on both the Train and Test sets (there are no Test-set results for the LinearRegression and Ridge models because they were already performing poorly on the Train set).

    As expected, the accuracy is a bit lower on the Test set (data the model hadn't seen before), but the difference from the training accuracy isn't big, so the models don't look prone to overfitting.

    We will productionise the Random Forest, as it has a slightly better score.
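
The notebook itself isn't reproduced here, but here is a minimal sketch of how such TRAIN/TEST scores are typically produced with scikit-learn (the data is a synthetic stand-in; for regressors, .score() returns the R² value, which is likely what the table reports as "accuracy"):

# A minimal sketch, not the project's actual notebook code
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features and target
X, y = make_regression(n_samples=1000, n_features=5, noise=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

print("TRAIN accuracy:", model.score(X_train, y_train))
print("TEST accuracy: ", model.score(X_test, y_test))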

Deploy the Model

Because PyCall allows calling Python code from Ruby as if it were native, we can export the required pre-trained pipeline and estimator from Jupyter.

What are the pipeline and the estimator?

For a Machine Learning model to perform predictions, the data has to be passed in a specific format. It's not uncommon that the shape of the data provided by the user has to change before it can be passed to the model.

Assuming the user created a new property with the following data:

numBed  numBathroom  numBedroom  numPeople  city
1       1            1           2          Palm Springs

It will have to be transformed to something like:

1  1  1  2  0  0  1  0  0  0  0  0  0  1  0

The exact encoding rules are not important at the moment - it's the pipeline's job to know how to turn the first table into the second one.

It's this last table that will be passed to the estimator (our Model), which will perform the predictions.
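
The project's actual pipeline lives in the notebook, but a minimal sketch of this kind of transformation with scikit-learn could look as follows (the choice of ColumnTransformer and OneHotEncoder is an assumption for illustration - the real index_based_pipeline may be built differently):

# A sketch of a preprocessing pipeline; the project's real
# index_based_pipeline may be built differently
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

pipeline = ColumnTransformer(
    transformers=[
        # One-hot encode the categorical columns into 0/1 flags
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["city", "period"]),
    ],
    remainder="passthrough",  # numeric columns pass through unchanged
    sparse_threshold=0,       # force a dense array for easy printing
)

df = pd.DataFrame(
    [[1, 1, 1, 2, "Palm Springs", "Nightly"]],
    columns=["numBed", "numBathroom", "numBedroom", "numPeople", "city", "period"],
)

# In practice the pipeline is fitted once on the training data, then reused
print(pipeline.fit_transform(df))  # a purely numeric row: one-hot flags, then the numbers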

Exporting pipeline and estimator

# in Jupyter
import pickle

pickle.dump(index_based_pipeline, open('idx_pipeline.pickle', 'wb'))

pickle.dump(final_estimator, open('model.pickle', 'wb'))
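
Before wiring these files into Ruby, it can be worth a quick sanity check that they load back correctly in Python:

# Quick round-trip check of the exported files
import pickle

with open('idx_pipeline.pickle', 'rb') as f:
    pipeline = pickle.load(f)

with open('model.pickle', 'rb') as f:
    estimator = pickle.load(f)

print(type(pipeline), type(estimator))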

Ruby class that will use the exported Python code

# app/lib/ml/estimator.rb
require "pycall/import"
include PyCall::Import

class Ml::Estimator
  def initialize
    @pipeline = initialize_pipeline   # 1.
    @estimator = initialize_estimator # 2.

    @all_periods = ["Monthly", "Nightly", "Weekend night", "Weekly", "Weeknight"]
  end

  def predict(num_bed, num_bathroom, num_bedroom, num_people, city) # 3.
    all_variants = @all_periods.map { |period|   # 4.
      [num_bed, num_bathroom, num_bedroom, num_people, city, period] }

    prepared = @pipeline.transform(all_variants) # 5.
    predicted = @estimator.predict(prepared)     # 6.
    @all_periods.each_with_index                 # 7.
      .each_with_object({}) { |(period, idx), result| 
        result[period] = predicted[idx]
      }
  end

  def initialize_pipeline                                 # 1.
    pyimport :pickle                                      #
    pipeline_pkl = open("./ML/idx_pipeline.pickle", "rb") #
    pickle.load(pipeline_pkl)                             #
  end                                                     #

  def initialize_estimator                                # 2.
    pyimport :pickle                                      #
    estimator_pkl = open("./ML/model.pickle", "rb")       #
    pickle.load(estimator_pkl)                            #
  end                                                     #
end
  1. Initialises the pipeline - it loads and instantiates the exported Python object
  2. Initialises the estimator - a similar initialisation, for the estimator
  3. Defines a method that accepts the Property parameters specified by the user
  4. Prepares 5 variants - enriches the data by appending each of the five possible time periods
  5. Transforms the data into the format the estimator expects
  6. Performs the predictions
  7. Turns the predictions into a format that is easier for the caller to handle. The result is a hash that looks like:
{ "Monthly" => 123.45, 
  "Nightly" => 123.45,
  "Weekend night" => 123.45,
  "Weekly" => 123.45,
  "Weeknight" => 123.45 }

Ruby code that will allow cleaner integration with the rest of the application

The earlier code loads Python code and knows a lot of low-level details we don't want to expose to the rest of the application, so let's introduce another layer to facilitate that - it will accept a Property, extract its data, pass it on for predictions, and assign the results back to the Property:

# app/lib/ml.rb
class Ml
  def self.estimate_prices(property)
    predictions = 
      estimator.predict(
        property.number_of_beds,
        property.number_of_bathrooms,
        property.number_of_bedrooms,
        property.number_of_people,
        property.city
      )

    property.nightly_price = predictions["Nightly"]
    property.weeknight_price = predictions["Weeknight"]
    property.weekend_night_price = predictions["Weekend night"]
    property.weekly_price = predictions["Weekly"]
    property.monthly_price = predictions["Monthly"]

    property
  end

  def self.estimator
    @@estimator
  end

  def self.estimator=(estimator)
    @@estimator = estimator
  end
end

Initialising Ruby’s Ml::Estimator

Loading the Python code from the files turns out to be quite slow, and it wouldn't be a good idea to add this cost to every prediction we perform.

To address this issue, we can take advantage of Rails' initialisation phase.

Let’s add this code:

# config/initializers/ml_initializer.rb
Ml.estimator = Ml::Estimator.new

This ensures we load the Python code only once, when the Rails server (or any other command, e.g. a Rake task) is started. Without this overhead, all further predictions will be rapid.

Summary

Learnings

PyCall opens up great opportunities and much more flexibility for making Python code available to Ruby applications. Contrary to Sklearn-porter, it doesn't limit which kinds of Machine Learning algorithms it can support.

From my brief experience with PyCall, the rule of thumb "if you can export the code from Jupyter, you will be able to use it in a Ruby app" seems to hold. It is still important to understand how a particular model works internally to decide whether PyCall will be a good choice.

In this example application we exported a RandomForest - it is quite accurate and performant, and the generated pickle file is not very big. But consider KNN, which internally uses the whole dataset to make predictions: if that dataset is gigabytes (or even petabytes) in size, could you afford such a server to host your Ruby/Rails application? Would you accept the time it would take to start the application while it loads the whole dataset into memory?

Further considerations

PyCall was easy to use, but the initial setup might be a bit tricky. I have two distributions of Python installed on my machine - one managed by the asdf version manager, and Anaconda. I had to set environment variables (like PYTHON) to point at the proper executables in order for it to work.

For this very reason, it wasn't possible to deploy this simple app to Heroku with its default configuration. Maybe Docker would be the way to go forward?

Secondly, in this exercise I disregarded a lot of data from the original set in order to build a model quickly and try porting the pipeline and estimator to Ruby. As a result, the model is not super accurate - around 78.44% accuracy on the test set. I considered it good enough to productionise and release, in order to find out what the technical problems with this approach are and, most importantly, whether it would even be of interest to future users - but it is not a model that could be considered final in the longer term.

Now that it is (hypothetically) deployed and available for public use, we can get back to the research and continue improving the model while monitoring whether it is actually used. How much does 78.44% accuracy matter if no one really wants to use it?

Ideas for improving the Model

The dataset has many more features that we didn't consider during this exercise. We could spend more effort extracting data from the JSON columns (like property features or additional fees), or deriving a sentiment score from the property description.
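
For instance, here is a sketch of pulling simple flags out of a JSON column with pandas (the column name "amenities" and its keys are made-up assumptions - the real dataset's columns may differ):

# A sketch of extracting features from a JSON column;
# the column name and keys are illustrative assumptions
import json
import pandas as pd

df = pd.DataFrame({"amenities": ['{"pool": true, "wifi": false}',
                                 '{"pool": false, "wifi": true}']})

parsed = df["amenities"].apply(json.loads)
df["has_pool"] = parsed.apply(lambda a: int(a.get("pool", False)))
df["has_wifi"] = parsed.apply(lambda a: int(a.get("wifi", False)))
print(df[["has_pool", "has_wifi"]])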

At the moment we have a single model that makes the predictions for all the available periods - maybe it would be a good idea to train separate models, each focusing on the features that matter for a particular time period?
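
A sketch of what that could look like (with a tiny synthetic stand-in for the prepared data - the real features would come from the notebook):

# One model per time period instead of a single model that takes
# "period" as a feature; the DataFrame is a synthetic stand-in
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

periods = ["Monthly", "Nightly", "Weekend night", "Weekly", "Weeknight"]

df = pd.DataFrame({
    "numBed":    [1, 2, 3, 4] * len(periods),
    "numPeople": [2, 4, 6, 8] * len(periods),
    "period":    [p for p in periods for _ in range(4)],
    "price":     [100, 180, 260, 340] * len(periods),
})

models = {}
for period in periods:
    subset = df[df["period"] == period]
    X, y = subset[["numBed", "numPeople"]], subset["price"]
    models[period] = RandomForestRegressor(random_state=42).fit(X, y)

print(models["Nightly"].predict(pd.DataFrame({"numBed": [2], "numPeople": [4]})))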

It was fun

Analysing the dataset was great fun, and this little exercise has proven that it's the data cleaning and preparation that is the most time-consuming part of building Machine Learning models (it might not sound exciting, but I actually enjoyed this part - I got to understand the market a little better!).

Integrating the ready model and seeing it used from the application in a way that is completely transparent to the end user gives great satisfaction!

Learning and achieving all of this in my spare time was a little tiring, though. I guess I deserve some holidays in a nice place.

I’ve heard Palm Springs is quite nice :).

(Image: ./docs/assets/palm-springs.jpg)
