Booking Demand Prediction

A solution to Grab's AI for SEA competition

Requirements

Environment

  • Linux (the prediction step in 10 - Final Solution - Booking Demand Prediction.ipynb uses parallelism that doesn't work on Windows)

Language

  • Python 3.6+

Libraries

  • jupyter
  • pygeohash
  • simdkalman
  • sklearn
  • scipy==1.2
  • statsmodels
  • pystan
  • fbprophet
  • tensorflow==2.0.0-alpha0

Instructions

NOTE: You only need to run 10 - Final Solution - Booking Demand Prediction.ipynb. The other notebooks just explain the methodology behind the solution.

Install the necessary libraries

pip3 install -r requirements.txt

or:

pip3 install pygeohash simdkalman sklearn scipy==1.2 statsmodels pystan fbprophet tensorflow==2.0.0-alpha0 --upgrade

Note that you can also do this inside the Jupyter notebooks by adding ! right before the command:

!pip3 install pygeohash simdkalman sklearn scipy==1.2 statsmodels pystan fbprophet tensorflow==2.0.0-alpha0 --upgrade

Open the notebooks using jupyter

First, open jupyter using:

jupyter notebook

This will open a window in your browser. From there, open 10 - Final Solution - Booking Demand Prediction.ipynb.

If you haven't installed jupyter yet, just enter the following into your command line:

pip3 install jupyter

Running the cells

Select the cell you want to run, then press Shift+Enter. It is recommended to run the cells in order, as presented.

And voilà, done!

Methodology

On the use of hyperlocal forecasting

In hyperlocal forecasting, we fit a separate model for each timeseries (one per location). With a spatiotemporal approach, on the other hand, a single model covers all of them at once. So why use hyperlocal models? For better accuracy.
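As a rough sketch of the difference (synthetic data and stand-in models, not the repo's actual pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(0)
# Toy stand-in: one demand series per geohash location, 15-minute buckets.
series_by_location = {
    "qp03wc": rng.random(96 * 7),
    "qp03wf": rng.random(96 * 7),
}

# Hyperlocal: fit one model per location's timeseries.
hyperlocal_forecasts = {
    loc: SimpleExpSmoothing(y).fit().forecast(96)  # next day
    for loc, y in series_by_location.items()
}

# Spatiotemporal: fit a single model on pooled (location, time) features.
X, y_all = [], []
for loc_id, y in enumerate(series_by_location.values()):
    for t, demand in enumerate(y):
        X.append([loc_id, t % 96, t // 96])  # location, time of day, day
        y_all.append(demand)
global_model = RandomForestRegressor(n_estimators=50).fit(X, y_all)
```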

The following are the errors for the spatiotemporal models:

Models                          | RMSE
------------------------------- | --------------------
1st degree polynomial regressor | 0.048594363494371254
2nd degree polynomial regressor | 0.04912314854368942
3rd degree polynomial regressor | 0.0495815584173599
Random Forest                   | 0.07614693023190335
XGBoost                         | 0.04855022108270493
Neural Network                  | 0.049747

And the following are the errors for the hyperlocal models:

Models        | RMSE
------------- | --------------------
FBProphet     | 0.03973981575859182
Theta Method  | 0.035792962736291206
Kalman Filter | 0.03900183400744959

For more information, please check out notebooks 4 to 8.

On the models used

FBProphet

https://facebook.github.io/prophet/docs/quick_start.html
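For reference, a minimal fbprophet fit on one location's series looks like this (synthetic data; the notebook's actual configuration may differ):

```python
import numpy as np
import pandas as pd
from fbprophet import Prophet

# Synthetic stand-in for one location's demand at 15-minute intervals.
ts = pd.date_range("2019-01-01", periods=96 * 14, freq="15min")
y = 0.5 + 0.1 * np.sin(np.arange(len(ts)) * 2 * np.pi / 96)

m = Prophet(daily_seasonality=True)
m.fit(pd.DataFrame({"ds": ts, "y": y}))

future = m.make_future_dataframe(periods=5, freq="15min")
forecast = m.predict(future)  # forecast[["ds", "yhat"]] holds the predictions
```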

Kalman Filter

https://simdkalman.readthedocs.io/en/latest/
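A minimal simdkalman sketch with a local linear trend model; note that simdkalman vectorizes over rows, so every location's series can be smoothed and forecast in one call (the noise parameters here are illustrative):

```python
import numpy as np
import simdkalman

# Local linear trend model: hidden state = (level, trend).
kf = simdkalman.KalmanFilter(
    state_transition=np.array([[1, 1], [0, 1]]),
    process_noise=np.diag([0.1, 0.01]),
    observation_model=np.array([[1, 0]]),
    observation_noise=1.0,
)

data = np.random.rand(100, 96 * 7)  # synthetic: 100 locations, a week each
pred = kf.predict(data, 96)         # forecast the next day for all locations
means = pred.observations.mean      # shape (100, 96)
```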

Theta Method

Please check out the journal articles in the journal folder. The researcher found the paper The Optimized Theta Model very enlightening.
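For intuition, here is a minimal sketch of the classical theta method (theta = 2, equal-weight combination), following Assimakopoulos & Nikolopoulos; the notebook's variant may differ:

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

def theta_forecast(y, horizon):
    """Classical theta method with theta = 2."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))

    # Theta = 0 line: linear trend fitted by least squares, then extrapolated.
    slope, intercept = np.polyfit(t, y, 1)
    trend_forecast = intercept + slope * np.arange(len(y), len(y) + horizon)

    # Theta = 2 line: doubles the local curvature; forecast it with SES.
    theta2_line = 2 * y - (intercept + slope * t)
    ses_forecast = SimpleExpSmoothing(theta2_line).fit().forecast(horizon)

    # Equal-weight combination of the two theta lines.
    return 0.5 * (trend_forecast + ses_forecast)

# Example on synthetic data (not the repo's series):
y = 0.5 + 0.1 * np.sin(np.arange(96 * 7) * 2 * np.pi / 96)
print(theta_forecast(y, 96)[:5])
```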

On why SARIMA wasn't used

The cycles have a period of 96 time steps. This is far too large for SARIMA to handle efficiently, making it prohibitively slow to fit.
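For concreteness, fitting a SARIMA with seasonal period m = 96 would look like the sketch below (the orders are hypothetical); the state-space representation's dimension grows with m, which is what makes the fit slow:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

y = 0.5 + 0.1 * np.sin(np.arange(96 * 7) * 2 * np.pi / 96)  # synthetic

model = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 1, 1, 96))
# results = model.fit()  # left commented out: with m = 96 this is very slow
```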

On temporal anomaly detection

The researcher first suspected global temporal anomalies after observing the mean demand over time.

An autoencoder was then trained on the density maps for each timestep. Sure enough, anomalies were found:

These anomalies happened on day 17 from 9:30am to 12:45pm. Perhaps there was an outage? A server malfunction? The researcher doesn't know.
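A minimal sketch of the idea, assuming flattened 32x32 density maps (the actual map resolution, architecture, and threshold live in the notebooks):

```python
import numpy as np
import tensorflow as tf

# Assumed shape: one flattened demand density map per 15-minute timestep.
maps = np.random.rand(96 * 14, 32 * 32).astype("float32")  # synthetic

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32 * 32,)),
    tf.keras.layers.Dense(16, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32 * 32, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(maps, maps, epochs=10, batch_size=32, verbose=0)

# Timesteps the autoencoder reconstructs poorly are flagged as anomalies.
errors = np.mean((autoencoder.predict(maps) - maps) ** 2, axis=1)
anomalies = np.where(errors > errors.mean() + 3 * errors.std())[0]
```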

To keep these anomalies from affecting the results, days 1-19 were ignored during cross-validation. Additionally, an extra regressor was added to the FBProphet model to handle them.
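This extra regressor can be as simple as a 0/1 flag over the suspected outage window, as sketched below (the flag construction is an assumption, not necessarily the notebook's exact approach):

```python
import numpy as np
import pandas as pd
from fbprophet import Prophet

ts = pd.date_range("2019-01-01", periods=96 * 21, freq="15min")
df = pd.DataFrame({"ds": ts, "y": np.random.rand(len(ts))})
# 0/1 flag marking the suspected outage window: day 17, 9:30am-12:45pm.
df["anomaly"] = ((ts.dayofyear == 17)
                 & (ts.time >= pd.Timestamp("09:30").time())
                 & (ts.time <= pd.Timestamp("12:45").time())).astype(int)

m = Prophet(daily_seasonality=True)
m.add_regressor("anomaly")   # the extra regressor absorbs the anomaly's effect
m.fit(df)

future = m.make_future_dataframe(periods=96, freq="15min")
future["anomaly"] = 0        # assume no anomaly in the forecast horizon
forecast = m.predict(future)
```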

On the use of simple weighting for ensembling

Apparently, the Mahalanobis distance of the frequencies, the demand mean, and the time of day of the timeseries don't affect the RMSEs that much. See the following plots (sorry for the misleading labels!):

  • RMSE vs. Mahalanobis Distance of the Frequencies
  • RMSE vs. Demand Mean
  • RMSE vs. Time of the Day

Thus, no external variables were used. For robustness, Huber regression was used to determine the weights of the predictions.

Simple averaging also works well; in fact, it performs 0.36% better than the weights found via Huber regression. The fitted weights were used anyway since they are proportional to the models' performance.

Models        | RMSE                 | Weight
------------- | -------------------- | ----------
FBProphet     | 0.03973981575859182  | 0.29395973
Theta Method  | 0.035792962736291206 | 0.38132017
Kalman Filter | 0.03900183400744959  | 0.309658
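A minimal sketch of the robust weighting, assuming each base model's holdout predictions are available as columns (the arrays here are synthetic stand-ins, so the weights won't reproduce the table above):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
y_true = rng.random(500)
preds = np.column_stack([                # holdout predictions from:
    y_true + rng.normal(0, 0.040, 500),  #   FBProphet
    y_true + rng.normal(0, 0.036, 500),  #   Theta Method
    y_true + rng.normal(0, 0.039, 500),  #   Kalman Filter
])

huber = HuberRegressor(fit_intercept=False).fit(preds, y_true)
weights = huber.coef_        # one weight per base model
ensemble = preds @ weights   # weighted ensemble prediction
```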

For more information, please check out 9 - Ensembling.ipynb.
