RideAnalytics ETL Pipeline of Uber Data Insights

Overview

RideAnalytics is an ETL (Extract, Transform, Load) pipeline that processes and analyzes Uber ride data. The project demonstrates how data engineering and analytics can be applied in the transportation domain.

The pipeline extracts raw ride data, transforms it into a structured format, and loads it into a data store where it can be easily analyzed and visualized.
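
The three stages can be sketched end to end in a few lines. This is a minimal illustration, not the project's own code: the column names (`pickup_time`, `fare`) and the sample rows are made up, and an in-memory SQLite database stands in for the real data store.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real pipeline reads uber_data.csv.
RAW_CSV = """pickup_time,fare
2016-03-01 00:05:00,9.5
2016-03-01 00:20:00,
2016-03-01 01:10:00,12.0
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a fare and cast fares to float."""
    return [
        (row["pickup_time"], float(row["fare"]))
        for row in rows
        if row["fare"]
    ]


def load(records, conn):
    """Load: write the cleaned records into a table that can be queried."""
    conn.execute("CREATE TABLE IF NOT EXISTS rides (pickup_time TEXT, fare REAL)")
    conn.executemany("INSERT INTO rides VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Chain the three stages: extract, then transform, then load."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")  # stand-in for the real data store
run_pipeline(RAW_CSV, conn)
total_rides, total_fare = conn.execute(
    "SELECT COUNT(*), SUM(fare) FROM rides"
).fetchone()
print(total_rides, total_fare)
```

Keeping each stage as its own function mirrors how the pipeline is later split into separate loader, transformer, and exporter steps.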

Features

Data Extraction: Utilizes the Uber API to fetch ride data.
Data Transformation: Cleans, aggregates, and enriches the raw data for analysis.
Data Loading: Stores the processed data in a database or data warehouse.
Insights Generation: Provides a set of predefined queries or scripts to extract valuable insights.
Visualization: Creates visual representations of the data for better understanding.

Architecture

Technology Used

Programming Language - Python
Google Cloud Platform
- Google Storage
- Compute Instance
- BigQuery
- Looker Studio

Modern Data Pipeline Tool - https://www.mage.ai/

Contribute to this open source project - https://github.com/mage-ai/mage-ai

Installation

Clone the repository:

git clone https://github.com/yourusername/RideAnalytics-ETL-Pipeline-of-Uber-Data-Insights.git

Navigate to the project directory:

cd RideAnalytics-ETL-Pipeline-of-Uber-Data-Insights

Data Model

Uber Dashboard


Data Pipeline and Dashboard Setup Guide

This guide walks you through setting up the data pipeline and creating a dashboard with Google Cloud Platform, Mage AI, BigQuery, and Looker Studio.

Google Cloud Platform Setup

Create a new project and give it a name.
Cloud Storage Setup:
- Create a bucket.
- Upload the uber_data.csv file. Make the file publicly readable through its Edit access settings so that the pipeline can fetch it by URL.
Compute Engine Setup:
- Create an instance.
- Connect to the instance using SSH.
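
Once the uploaded file is public, it can be addressed by a plain HTTPS URL, which is what the pipeline's loader step later needs. The helper below is a small sketch of how that URL is formed; the bucket name `my-uber-bucket` is a placeholder, not a value from this project.

```python
from urllib.parse import quote


def public_object_url(bucket, object_name):
    """Build the public HTTPS URL for an object in a Cloud Storage bucket."""
    # Percent-encode the object name but keep "/" so folder-like prefixes survive.
    encoded_name = quote(object_name, safe="/")
    return f"https://storage.googleapis.com/{bucket}/{encoded_name}"


# "my-uber-bucket" is a placeholder bucket name.
print(public_object_url("my-uber-bucket", "uber_data.csv"))
```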

Install Necessary Packages and Libraries:

sudo apt-get update
sudo apt-get install python3-distutils python3-apt wget
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
sudo pip3 install pandas google-cloud google-cloud-bigquery mage-ai
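
After the installation it can be useful to confirm that the packages are importable on the instance. The check below is only a sketch; the import names in `REQUIRED` are assumed to correspond to the packages installed above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


# Assumed import names for pandas, google-cloud-bigquery, and mage-ai.
REQUIRED = ["pandas", "google.cloud.bigquery", "mage_ai"]

missing = [name for name in REQUIRED if not is_installed(name)]
print("Missing modules:", missing if missing else "none")
```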

Create a Mage AI Project:
```
mage start demo_project
```
Note the instance's external IP address and the port Mage AI is running on for later use.
Configure a firewall rule to access Mage AI:
- Go to Firewall settings and create a new rule.
- Set the source IPv4 range to your own IP address (preferred) or to 0.0.0.0/0, which allows connections from any address.
- Select TCP and specify the port number used by Mage AI.
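
To confirm that the firewall rule took effect, a quick TCP reachability check from your own machine is usually enough. This is a generic sketch using only the standard library; the host and port are whatever you noted when starting Mage AI.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("203.0.113.10", 6789)` would test a placeholder address on port 6789, which is assumed here to be the port Mage AI reported at startup. A result of `False` usually means the rule is missing, the port is wrong, or Mage AI is not running.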

Setting Up Mage AI

Create a pipeline in Mage AI to organize your data processing tasks.
Run data_loader.py and input the URL of the uploaded dataset in Cloud Storage.
Run transformer.py to perform necessary data transformations.
Configure the io_config.yaml file for data_exporter.py as follows:
- Provide Google service account key details (obtainable from Google Cloud Console).
- Configure the dataset and table ID in BigQuery.
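
The contents of data_loader.py and transformer.py are not reproduced in this README, so the following is only a dependency-free sketch of what the two steps do conceptually: fetch the CSV by URL, then clean and enrich the rows. The column names `pickup_datetime` and `trip_distance` are assumptions for illustration. In a Mage AI project the equivalent functions would normally be decorated as loader and transformer blocks and operate on pandas DataFrames.

```python
import csv
import io
from datetime import datetime
from urllib.request import urlopen


def load_rows(url):
    """Download a CSV file from a URL and return its rows as dicts."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def transform_rows(rows):
    """Drop duplicate rows, parse timestamps, and add hour/weekday fields."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        # "pickup_datetime" and "trip_distance" are assumed column names.
        pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        cleaned.append({
            "pickup_datetime": pickup,
            "pickup_hour": pickup.hour,
            "pickup_weekday": pickup.strftime("%A"),
            "trip_distance": float(row["trip_distance"]),
        })
    return cleaned
```

Calling `transform_rows(load_rows(url))` with the public URL of the uploaded file would yield rows ready for export, provided the file actually has the assumed columns.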

BigQuery Data Analysis

Open Google BigQuery and run the SQL query provided in the BigQuery file.
In the query, replace 'gothic-sled-395917.uber_data_engineering_yt.tbl_analytics' with your own project ID, dataset name, and table name.
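
If the query references the table in several places, the substitution can be scripted rather than done by hand. The sketch below assumes placeholder values (`my-project`, `my_dataset`) and uses a one-line illustrative query, not the project's actual SQL.

```python
def qualify_table(project_id, dataset, table):
    """Return a fully qualified BigQuery table reference."""
    return f"{project_id}.{dataset}.{table}"


def retarget_query(sql, old_table, new_table):
    """Replace every occurrence of one table reference with another."""
    return sql.replace(old_table, new_table)


OLD = "gothic-sled-395917.uber_data_engineering_yt.tbl_analytics"
# "my-project" and "my_dataset" are placeholders for your own project and dataset.
NEW = qualify_table("my-project", "my_dataset", "tbl_analytics")

# Illustrative query only; the project's real SQL is not reproduced here.
sql = f"SELECT COUNT(*) AS trips FROM `{OLD}`"
retargeted = retarget_query(sql, OLD, NEW)
print(retargeted)
```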

Creating a Looker Studio Dashboard

Open Looker Studio to begin creating your dashboard.
Create a new blank report and select BigQuery as the data source.
Use Looker Studio's visualization tools to design your own Uber dashboard.

SahilChowkekar/RideAnalytics-ETL-Pipeline-of-Uber-Data-Insights
