
FPL Classic League Data Pipeline

Overview

This project utilises the FPL API endpoints to extract player and manager data for a specified Invitational Classic League. This data is then visualised and showcased on my Dash web app.

  • Provisions the required GCP cloud infrastructure using Terraform.
  • Docker Compose runs a multi-container Dagster deployment, which orchestrates the whole data pipeline.
  • A sensor checks whether a gameweek has been completed; if so, the data pipeline is initiated for the newly completed gameweek (see the sketch after this list).
  • Data is extracted using the FPL API endpoints and loaded into a GCS bucket as Parquet files.
  • Parquet files are loaded into BigQuery tables for ad-hoc analysis, scheduled queries and, optionally, visualisation using Looker Studio.
  • Pre-defined queries are run on the BigQuery tables and the results are stored back in GCS to serve the Dash web app.
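
For illustration, here is a minimal sketch of such a sensor in Dagster. All names (gameweek_partitions, fpl_pipeline_job, gameweek_sensor) are hypothetical rather than the repo's actual code; it assumes the public FPL bootstrap-static endpoint, whose events array flags each gameweek as finished and data_checked.

```python
import requests
from dagster import (
    DynamicPartitionsDefinition,
    RunRequest,
    SensorEvaluationContext,
    SensorResult,
    define_asset_job,
    sensor,
)

# One dynamic partition per completed gameweek.
gameweek_partitions = DynamicPartitionsDefinition(name="gameweek")

# Hypothetical job that materialises the pipeline's assets for one partition.
fpl_pipeline_job = define_asset_job(
    "fpl_pipeline_job", partitions_def=gameweek_partitions
)

BOOTSTRAP_URL = "https://fantasy.premierleague.com/api/bootstrap-static/"


@sensor(job=fpl_pipeline_job, minimum_interval_seconds=12 * 60 * 60)
def gameweek_sensor(context: SensorEvaluationContext) -> SensorResult:
    events = requests.get(BOOTSTRAP_URL, timeout=30).json()["events"]
    # Gameweeks whose results FPL has finalised.
    finished = [str(e["id"]) for e in events if e["finished"] and e["data_checked"]]

    # Only gameweeks not already registered as partitions.
    existing = set(context.instance.get_dynamic_partitions("gameweek"))
    new = [gw for gw in finished if gw not in existing]

    return SensorResult(
        run_requests=[RunRequest(partition_key=gw) for gw in new],
        dynamic_partitions_requests=[gameweek_partitions.build_add_request(new)],
    )
```

Registering each gameweek as a dynamic partition is also what makes per-gameweek backfills possible from Dagit.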

Check out the Wiki to find out how to get started yourself and dive deeper into the inner workings of the pipeline.

Data Flow Diagram

[Data flow diagram image]

Pipeline functionality

  1. Run Terraform to deploy a Google Compute Engine VM and other GCP services, and initiate the Docker deployment using a start-up script.
  2. Four Docker containers are created for the Dagster deployment:
    • Postgres - Used for Dagster storage such as event logs and sensor ticks.
    • User code - Hosts the pipeline code, enabling easy redeployment of code changes.
    • Dagster daemon - Long-running process that runs the sensors and queues asset materializations.
    • Dagit - Dagster's UI for manually inspecting the pipeline and running backfills.
  3. The sensor runs every 12 hours to check whether a new gameweek has been completed. If so, a new partition is added for that gameweek and extraction is initiated for the given partition.
  4. Gameweek data such as player stats and manager picks is extracted, transformed and loaded into a GCS bucket as Parquet files (see the first sketch after this list).
  5. Data is loaded from the bucket into three BigQuery tables:
    • player_gameweek - Stats for all players in the Premier League for all completed gameweeks.
    • manager_gameweek - Stats for the players picked by managers in the specified invitational league for all completed gameweeks; each row represents one player picked by a single manager.
    • manager_gameweek_performance - Per-gameweek stats for each manager in the specified invitational league, such as points gained in that gameweek.
  6. Analytical queries are run against the three tables and the output is saved back to a GCS bucket (see the second sketch after this list).
  7. Data is showcased via a Dash web app.
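
The first sketch below illustrates step 4 under assumed names (load_player_gameweek, the bucket layout player_gameweek/{gw}.parquet): it pulls per-player stats for one gameweek from the public FPL event/{gw}/live/ endpoint, flattens them into a DataFrame and lands the result in GCS as Parquet. The repo's actual asset code may differ.

```python
import io

import pandas as pd
import requests
from google.cloud import storage

LIVE_URL = "https://fantasy.premierleague.com/api/event/{gw}/live/"


def load_player_gameweek(gw: int, bucket_name: str) -> None:
    """Extract one gameweek's player stats and land them in GCS as Parquet."""
    elements = requests.get(LIVE_URL.format(gw=gw), timeout=30).json()["elements"]

    # One row per player: id, gameweek, plus the nested per-gameweek stats.
    df = pd.DataFrame(
        [{"player_id": e["id"], "gameweek": gw, **e["stats"]} for e in elements]
    )

    # Serialise to Parquet in memory (requires pyarrow) and upload to the bucket.
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)
    buffer.seek(0)
    blob = storage.Client().bucket(bucket_name).blob(f"player_gameweek/{gw}.parquet")
    blob.upload_from_file(buffer)
```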
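
The second sketch covers steps 5 and 6 under the same assumptions (the dataset and column names, such as fpl.manager_gameweek_performance and points, are illustrative): the Parquet file is appended to a BigQuery table, then an analytical query is run and its result written back to GCS for the Dash app to read.

```python
from google.cloud import bigquery

client = bigquery.Client()


def load_into_bigquery(gw: int, bucket_name: str, dataset: str = "fpl") -> None:
    """Append one gameweek's Parquet file to the player_gameweek table."""
    job = client.load_table_from_uri(
        f"gs://{bucket_name}/player_gameweek/{gw}.parquet",
        f"{dataset}.player_gameweek",
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.PARQUET,  # schema inferred from Parquet
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        ),
    )
    job.result()  # block until the load job completes


def export_for_dash(bucket_name: str, dataset: str = "fpl") -> None:
    """Run an analytical query and write the result to GCS for the Dash app."""
    sql = f"""
        SELECT manager_id, SUM(points) AS total_points
        FROM `{dataset}.manager_gameweek_performance`
        GROUP BY manager_id
        ORDER BY total_points DESC
    """
    df = client.query(sql).to_dataframe()
    # Writing directly to a gs:// path requires the gcsfs package.
    df.to_csv(f"gs://{bucket_name}/dash/manager_totals.csv", index=False)
```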
