Azure-based solution for ingesting and analyzing Formula 1 data using Azure Data Lake Storage Gen2 and Databricks

salineroa/formula-1-racing-data-pipeline

Formula 1 Racing Data Pipeline 🏎️

Summary

This project uses Azure cloud services to design and orchestrate a data pipeline that performs data engineering operations (ingestion, transformation, analysis, and load) on a Formula 1 racing dataset.

Data Source

Data for all Formula 1 races from 1950 onwards is obtained from an open-source API, the Ergast Developer API. The structure of the database is shown in the following ER diagram and explained in the Database User Guide.

ERDiagram
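To make the data source more concrete, here is a minimal sketch of how a race-results request could be addressed and how its nested JSON could be flattened into rows. It assumes the Ergast URL pattern (`/api/f1/<season>/<round>/results.json`) and response nesting (`MRData`, `RaceTable`, `Races`, `Results`); the function names, the selected columns, and the sample payload are hypothetical and are not taken from this repository.

```python
from urllib.parse import urlencode

# Assumed base URL of the Ergast API; adjust if you use a mirror or local files.
ERGAST_BASE = "http://ergast.com/api/f1"


def results_url(season, round_no, limit=30, offset=0):
    """Build the URL for one race's results in JSON form."""
    query = urlencode({"limit": limit, "offset": offset})
    return f"{ERGAST_BASE}/{season}/{round_no}/results.json?{query}"


def flatten_results(payload):
    """Turn a nested results payload into one flat row per driver."""
    rows = []
    for race in payload["MRData"]["RaceTable"]["Races"]:
        for result in race.get("Results", []):
            rows.append({
                "season": int(race["season"]),
                "round": int(race["round"]),
                "race_name": race["raceName"],
                "driver_id": result["Driver"]["driverId"],
                "constructor_id": result["Constructor"]["constructorId"],
                "position": int(result["position"]),
                "points": float(result["points"]),
            })
    return rows


# Made-up payload with the assumed nesting, used only to exercise the function.
sample = {
    "MRData": {
        "RaceTable": {
            "Races": [{
                "season": "2021",
                "round": "1",
                "raceName": "Example Grand Prix",
                "Results": [
                    {"position": "1", "points": "25",
                     "Driver": {"driverId": "driver_a"},
                     "Constructor": {"constructorId": "team_x"}},
                    {"position": "2", "points": "18",
                     "Driver": {"driverId": "driver_b"},
                     "Constructor": {"constructorId": "team_y"}},
                ],
            }]
        }
    }
}

rows = flatten_results(sample)
```

Flattening early keeps the later layers simple: each row already has scalar columns that map directly onto a table.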

Tools

Architecture

The solution is based on the "Modern analytics architecture with Azure Databricks" reference architecture from the Azure Architecture Center.

Project Structure

  1. data - contains sample raw data from the Ergast API.
  2. set-up - contains notebooks that mount the ADLS storage containers (raw, ingested, presentation) in Databricks.
  3. raw - contains a SQL file that creates the ingested tables using Spark SQL.
  4. ingestion - contains notebooks that ingest all data files from the raw layer into the ingested layer, including incremental loads for the results, pit stops, lap times, and qualifying files.
  5. trans - contains notebooks that transform data from the ingested layer into the presentation layer, preparing it for analysis.
  6. analysis - contains SQL files that find the dominant drivers and teams and prepare the results for visualization.
  7. includes - contains notebooks with helper functions used in the transformations.
  8. utils - contains a SQL file that drops all databases to reset the incremental load.
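The incremental handling mentioned for the ingestion notebooks can be pictured as a partition overwrite: every race present in an incoming batch replaces whatever was stored for that race, and all other races are left alone. The README does not say how the notebooks implement this, so the following plain-Python sketch is only an illustration of the idea; the function name, the `race_id` partition key, and the sample rows are assumptions. In Spark, a similar effect is commonly achieved with dynamic partition overwrite or a Delta Lake merge.

```python
def incremental_overwrite(existing, incoming, partition_key="race_id"):
    """Return the table after replacing every partition present in `incoming`."""
    # Partitions delivered by this batch are the only ones that get rewritten.
    touched = {row[partition_key] for row in incoming}
    kept = [row for row in existing if row[partition_key] not in touched]
    return kept + list(incoming)


# Hypothetical rows: race 1 is already ingested, then a batch re-delivers
# race 1 (with one corrected value) together with the new race 2.
ingested = [
    {"race_id": 1, "driver_id": "driver_a", "points": 25.0},
    {"race_id": 1, "driver_id": "driver_b", "points": 18.0},
]
batch = [
    {"race_id": 1, "driver_id": "driver_a", "points": 25.0},
    {"race_id": 1, "driver_id": "driver_b", "points": 15.0},
    {"race_id": 2, "driver_id": "driver_a", "points": 18.0},
]

after_first_run = incremental_overwrite(ingested, batch)
# Re-running the same batch leaves the table unchanged.
after_second_run = incremental_overwrite(after_first_run, batch)
```

Because a batch always replaces whole partitions, re-running an ingestion notebook with the same files does not create duplicate rows.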
