Data Analytics Pipeline for Taxi Fare Prediction

The industry today relies heavily on data analytics to make predictions, and many successful business models benefit directly from machine learning. Popular taxi services such as Uber and Lyft show their users a predicted fare before the customer is matched with a driver. We try to provide a similar solution using the open dataset provided by the NYC Taxi and Limousine Commission (NYC-TLC). The intention is to process large volumes of data as streams from NYC-TLC’s public data repository, perform feature engineering in parallel, and deploy a prediction engine on top of it.

In this project, we implemented a data analytics pipeline that processes over 100 million records of NYC-TLC historical data from a public S3 repository and predicts taxi fares. We preprocessed the data in parallel on AWS EMR using PySpark and Pandas and trained machine learning models on the result. We also implemented a Flask web application as the serving layer, an interface through which users query the trained models.
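As a rough illustration of the feature engineering step, the sketch below derives a few features that fare models commonly use from a single trip record: great-circle trip distance, pickup hour, and day of week. It is not the project's actual code. It uses only the Python standard library instead of PySpark or Pandas, and the field names (`pickup_datetime`, `pickup_latitude`, and so on) and the helper `engineer_features` are assumptions made for this example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def engineer_features(record):
    """Turn one raw trip record (hypothetical schema) into model features."""
    pickup = datetime.strptime(record["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    return {
        "distance_km": haversine_km(
            record["pickup_latitude"], record["pickup_longitude"],
            record["dropoff_latitude"], record["dropoff_longitude"],
        ),
        "hour": pickup.hour,          # 0-23, captures rush-hour effects
        "weekday": pickup.weekday(),  # 0 = Monday ... 6 = Sunday
        "passenger_count": int(record["passenger_count"]),
    }


# Made-up record for illustration only; not taken from the NYC-TLC data.
example = {
    "pickup_datetime": "2016-01-04 08:15:00",
    "pickup_latitude": 40.7580, "pickup_longitude": -73.9855,
    "dropoff_latitude": 40.7128, "dropoff_longitude": -74.0060,
    "passenger_count": "2",
}
features = engineer_features(example)
print(features)
```

Because `engineer_features` looks at one record at a time and keeps no shared state, the same per-record logic could in principle be applied in parallel across partitions of a distributed dataset, which is the general idea behind running the preprocessing on a cluster.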

