RideAnalytics is an ETL (Extract, Transform, Load) pipeline designed to process and analyze Uber ride data, providing valuable insights into the rides. This project demonstrates the power of data engineering and analytics in the transportation domain.
The pipeline extracts raw ride data, transforms it into a structured format, and loads it into a data store where it can be easily analyzed and visualized.
- Features
- Architecture
- Technology Used
- Installation
- Data Model
- Uber Dashboard
- Data Pipeline and Dashboard Setup Guide
- Data Extraction: Utilizes the Uber API to fetch ride data.
- Data Transformation: Cleans, aggregates, and enriches the raw data for analysis.
- Data Loading: Stores the processed data in a database or data warehouse.
- Insights Generation: Provides a set of predefined queries or scripts to extract valuable insights.
- Visualization: Creates visual representations of the data for better understanding.
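The extract/transform/load stages above can be sketched end to end in a few lines. This is an illustration only, not the project's actual code: the function names, inline sample rows, and the derived `tip_pct` column are invented for the example, and the real pipeline reads from Cloud Storage and writes to BigQuery rather than working in memory.

```python
import io
import pandas as pd

# Illustrative raw extract; the real pipeline pulls ride data from Cloud Storage.
RAW_CSV = """trip_distance,fare_amount,tip_amount
2.5,9.0,1.0
0.0,5.0,0.0
3.1,12.5,2.0
"""

def extract() -> pd.DataFrame:
    """Extract: read the raw ride data into a DataFrame."""
    return pd.read_csv(io.StringIO(RAW_CSV))

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: clean (drop zero-distance trips) and enrich (derive tip percentage)."""
    df = df[df["trip_distance"] > 0].copy()
    df["tip_pct"] = 100 * df["tip_amount"] / df["fare_amount"]
    return df

def load(df: pd.DataFrame) -> int:
    """Load: the real pipeline writes to BigQuery; here we just report rows loaded."""
    return len(df)

rows_loaded = load(transform(extract()))
print(rows_loaded)  # 2 (the zero-distance trip was dropped)
```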
- Programming Language - Python
- Google Cloud Platform
  - Cloud Storage
  - Compute Engine
  - BigQuery
  - Looker Studio
Modern Data Pipeline Tool - https://www.mage.ai/
Contribute to this open source project - https://github.com/mage-ai/mage-ai
- Clone the repository:
git clone https://github.com/yourusername/RideAnalytics-ETL-Pipeline-for-Uber-Data-Insights.git
- Navigate to the project directory:
cd RideAnalytics-ETL-Pipeline-for-Uber-Data-Insights
This guide will walk you through setting up a data pipeline and creating a dashboard using various tools, including Google Cloud Platform, Mage AI, BigQuery, and Looker Studio.
- Create a new project and give it a name.
- Cloud Storage Setup:
  - Create a bucket.
  - Upload the `uber_data.csv` file and make it publicly accessible so the pipeline can read it.
- Compute Engine Setup:
  - Create an instance.
  - Connect to the instance using SSH.
- Install Necessary Packages and Libraries:

sudo apt-get update
sudo apt-get install python3-distutils python3-apt wget
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
sudo pip3 install pandas google-cloud google-cloud-bigquery mage-ai
- Create a Mage AI Project:

mage start demo_project

Note down the instance's external IP address and the port for later use.
- Configure a firewall rule to access Mage AI:
  - Go to Firewall settings and create a new rule.
  - Set the source IPv4 range to your IP address (or 0.0.0.0/0 for open access).
  - Select TCP and specify the port number used by Mage AI.
- Create a pipeline in Mage AI to organize your data processing tasks.
- Run `data_loader.py` and input the URL of the uploaded dataset in Cloud Storage.
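As a sketch of what this step does (the function name and sample rows below are illustrative, not the project's actual code; inside Mage the equivalent function carries the `@data_loader` decorator and reads from the URL you enter):

```python
import io
import pandas as pd

# Illustrative sample standing in for the public Cloud Storage CSV
# (column names follow typical trip data and are an assumption here).
SAMPLE_CSV = """VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,fare_amount
1,2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,9.0
2,2016-03-01 00:00:00,2016-03-01 00:11:06,2,2.90,11.0
"""

def load_ride_data(source) -> pd.DataFrame:
    """Read the raw ride CSV into a DataFrame; pd.read_csv also accepts a URL directly."""
    return pd.read_csv(source)

df = load_ride_data(io.StringIO(SAMPLE_CSV))
print(df.shape)  # (2, 6)
```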
- Run `transformer.py` to perform the necessary data transformations.
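The exact transformations live in `transformer.py`; purely as an illustration (the column names here follow typical trip data and are an assumption, not necessarily this project's schema), a cleaning-and-enrichment step might look like:

```python
import pandas as pd

# Illustrative raw rows; the real block receives the DataFrame from the loader step.
raw = pd.DataFrame({
    "tpep_pickup_datetime": ["2016-03-01 00:00:00", "2016-03-01 00:00:00"],
    "tpep_dropoff_datetime": ["2016-03-01 00:07:55", "2016-03-01 00:11:06"],
    "trip_distance": [2.5, 2.9],
})

def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Parse timestamps and derive trip duration in minutes.
    df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])
    df["tpep_dropoff_datetime"] = pd.to_datetime(df["tpep_dropoff_datetime"])
    delta = df["tpep_dropoff_datetime"] - df["tpep_pickup_datetime"]
    df["trip_duration_min"] = delta.dt.total_seconds() / 60
    # Deduplicate and give rows a stable index for downstream keys.
    return df.drop_duplicates().reset_index(drop=True)

tidy = transform(raw)
print(round(tidy["trip_duration_min"].iloc[0], 2))  # 7.92
```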
- Configure the `io_config.yaml` file for `data_exporter.py` as follows:
  - Provide the Google service account key details (obtainable from the Google Cloud Console).
  - Configure the dataset and table ID in BigQuery.
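For reference, the Google-related section of the generated `io_config.yaml` looks roughly like the sketch below; the file path and location are placeholders, and your generated file may expose the key as inline fields instead, so check it for the exact keys:

```yaml
# Sketch of the relevant io_config.yaml entries (placeholder values).
default:
  # Path to the service account key JSON downloaded from the Cloud Console.
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/service_account_key.json"
  GOOGLE_LOCATION: US
```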
- Access Google BigQuery and enter the SQL query provided in the BigQuery file.
- Replace `gothic-sled-395917.uber_data_engineering_yt.tbl_analytics` in the SQL query with your own project, dataset, and table names.
- Navigate to Looker Studio to begin creating your dashboard.
- Create a new blank report and select BigQuery as the data source.
- Use Looker Studio's visualization tools to design a personalized Uber dashboard.