
San Diego COVID-19 Dashboard

A self-updating data application on Kubernetes in the Cloud

“How are COVID-19 cases evolving in my San Diego neighborhood?”

[Screen recording of the dashboard]

This project primarily serves as an example of how to deploy a self-updating data application on Kubernetes in the cloud, using automated continuous integration workflows. Secondarily, it offers a refined view of San Diego County COVID-19 data through a dashboard that breaks case data down by ZIP code.

Dashboard URL: http://35.225.51.112

Pipeline Status

This section tracks the status of the various production continuous integration pipelines.

Cloud Foundation Layer

Deploy Infrastructure on GCP

Kubernetes Foundation Layer

Deploy Infrastructure on GKE

App Layer

            Docker Build            Built Image                    K8s Deployment
Crawler     Build Docker Crawler    Built Docker Crawler Image     Deploy Infrastructure on GKE
Dashboard   Build Docker Dashboard  Built Docker Dashboard Image   Deploy Infrastructure on GKE

Infrastructure

This section describes the project's infrastructure. The infrastructure is built with the following requirements in mind:

  • Have a module to automatically download new data (crawler)
  • Have a module to store new data (database)
  • Have a module to display analyses of those data (dashboard)
  • Be small-scale, since very little traffic is expected
  • Be relatively affordable to run
  • Be easy to update and maintain, so experiments can be run at low cost

As a direct result of these requirements, the project is heavily automated. Since cloud platforms are geared primarily toward large-scale deployments, not small-scale ones (and can come with a bill to match), the project uses a custom cloud infrastructure that minimizes cost.

The chart below gives an overview of the repository's components:

.github/ <-- Continuous integration pipelines
crawler/ <-- Component: Crawler for getting COVID-19 case data
dashboard/ <-- Component: Dashboard for displaying case data 
docs/ <-- Documentation
k8s/ <-- Kubernetes manifests
  base/ <-- Common manifests for dev and production stages
  overlays/
    dev/ <-- Manifests for development stage (local)
    prod/ <-- Manifests for production stage (in cloud)
notebooks/ <-- Jupyter notebooks for experimenting with data
terraform/ <-- Terraform configurations for Kubernetes and 
               database deployment on Google Cloud
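
The k8s/ directory above follows the Kustomize base/overlay pattern. As a rough illustration of how an overlay builds on the shared base, a production kustomization could look like the sketch below; the patch file name is an assumption, not the repository's actual manifest.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Hypothetical k8s/overlays/prod/kustomization.yaml: reuse the shared base
# and apply production-specific patches on top.
resources:
  - ../../base
patchesStrategicMerge:
  - dashboard-prod-patch.yaml   # assumed patch file name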

Continuous Deployment Pipelines

A good starting point for understanding how this repository works is the set of continuous deployment pipelines. They connect development to deployment. All continuous deployment pipelines are implemented as GitHub Actions workflows. Here are the typical use cases the pipelines cover:

  • Deploy Infrastructure on GCP (GCP: Google Cloud Platform) At the beginning, your Google Cloud account includes no resources related to this project whatsoever. The deploy_infrastructure_gcp workflow uses Terraform to provision a Kubernetes cluster with a pre-defined set of nodes, a Postgres database, and related roles and permissions on your Google Cloud account. These components are the foundation the app needs to run in the cloud.

  • Deploy App on GKE (GKE: Google Kubernetes Engine) The deploy_app_gke workflow deploys all components of the app into the Kubernetes cluster created on GCP.

    The workflow also takes care of updating the Kubernetes deployment when any of the components change. For example, the workflow will roll out dashboard updates on Kubernetes when a new Docker image for the dashboard is available.

  • Build Docker Crawler/Build Docker Dashboard The build_docker_crawler and build_docker_dashboard workflows build the crawler and dashboard Docker images and push them to Docker Hub. The workflows are triggered whenever resources belonging to one of these app components, located in the crawler or dashboard directories respectively, are changed (see the workflow sketch after this list).

  • Destroy Infrastructure on GCP The manually triggered destroy_infrastructure_gcp action uses Terraform to remove all resources created on Google Cloud. This is useful during development or when the project has reached its end of life.
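
As a minimal sketch of the path-triggered build workflows, a workflow like build_docker_dashboard could look roughly as follows; the branch name, secret names, and image tag are assumptions, not the repository's actual configuration.

name: Build Docker Dashboard
# Hypothetical sketch of a path-triggered image build: it runs only when
# files under dashboard/ change.
on:
  push:
    branches: [main]          # assumed branch name
    paths:
      - 'dashboard/**'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}   # assumed secret names
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push image
        uses: docker/build-push-action@v4
        with:
          context: dashboard
          push: true
          tags: example/sd-covid-dashboard:latest       # assumed image tag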

Cloud Foundation Layer: Google Cloud + Terraform

Terraform creates several resources on Google Cloud to make this project work. The Terraform templates are based on Niko Kosonen's tutorial and the associated gke-tutorial GitHub repository.

Niko's Terraform templates help build a small-scale Kubernetes cluster on Google Cloud that circumvents Google's default load balancer. For very small clusters, the default load balancer often adds significant cost. Niko's solution is much less costly, and its performance is well matched to the modest demands of a small cluster. The next section describes the Kubernetes infrastructure in more detail.

For this project, Niko's Terraform templates have been extended with a Google Cloud SQL Postgres database and some output variables to ease continuous deployment. The deploy_infrastructure_gcp workflow uses the Terraform outputs to write, encrypt, and commit secrets to this repo, so that they can be picked up in the deploy_app_gke workflow.
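
For illustration, a workflow step along these lines could capture a Terraform output and encrypt it before it is committed; the output name, secret name, and symmetric-key encryption are assumptions about the mechanism, not the repository's exact steps.

# Hypothetical workflow step: export a Terraform output and encrypt it so
# it can be committed and later decrypted by the deploy_app_gke workflow.
- name: Export and encrypt Terraform outputs
  run: |
    terraform output -raw db_connection_string > secrets/db_connection   # assumed output name
    gpg --batch --pinentry-mode loopback --symmetric \
        --passphrase "$SECRETS_KEY" secrets/db_connection
  env:
    SECRETS_KEY: ${{ secrets.SECRETS_ENCRYPTION_KEY }}   # assumed secret name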

Kubernetes Foundation Layer

After Terraform has laid the cloud foundation, the Kubernetes infrastructure can be built on top of it. Kubernetes manifests for the production deployment are stored in the k8s/overlays/prod directory.

Once again, a good part of the Kubernetes infrastructure comes from Niko Kosonen's tutorial. The Kubernetes cluster has two nodes. One node serves as an ingress to the cluster and has a static IP that can be accessed by the public. Tying a static IP to the ingress node is the task of KubeIP. The other node is ephemeral and hosts all pods needed for running the apps. Traefik is deployed as an internal load balancer in the cluster and routes incoming traffic to the apps.
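
As a sketch of this layout, the Traefik deployment could be pinned to the static-IP ingress node roughly as follows; the node label, image version, and the use of a Deployment with hostNetwork are assumptions, not the repository's exact manifest.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: traefik
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      nodeSelector:
        role: ingress          # hypothetical label on the static-IP node
      hostNetwork: true        # serve directly on the node's ports
      containers:
        - name: traefik
          image: traefik:v2.4  # assumed version
          args:
            - --entrypoints.web.address=:80
            - --providers.kubernetesingress=true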

The app's crawler is triggered by a CronJob. Native Kubernetes CronJobs are not timezone-aware: they run in the timezone of the hosting node, which is typically UTC. New COVID-19 data from San Diego County becomes available at a fixed time in Pacific Time, which observes Daylight Saving Time, so its offset to UTC changes throughout the year. The Kubernetes-native CronJob does not support changing offsets. As an alternative, this project uses Hidde Beydals' timezone-aware CronJobber and deploys it alongside the other manifests.
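
A CronJobber schedule for the crawler could look roughly like the sketch below. The TZCronJob kind and the timezone field come from CronJobber itself, while the schedule, names, and image are assumptions.

apiVersion: cronjobber.hidde.co/v1alpha1
kind: TZCronJob
metadata:
  name: crawler
  namespace: dashboard
spec:
  schedule: "0 18 * * *"            # assumed time, interpreted in the timezone below
  timezone: "America/Los_Angeles"   # tracks Pacific Time across DST changes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: crawler
              image: example/sd-covid-crawler   # assumed image name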

The resulting Kubernetes deployment has four namespaces: default, kube-system, traefik, and dashboard.

[Screenshots: workloads in the Kubernetes default namespace and traefik namespace]

Local Development

This repository also contains a Skaffold pipeline that helps to develop and deploy the project on a local machine. The k8s/overlays/dev directory contains the associated Kubernetes manifests.
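
A minimal skaffold.yaml for this setup might look as follows; the artifact names are assumptions, while the Kustomize path matches the directory layout shown earlier.

apiVersion: skaffold/v2beta12
kind: Config
build:
  artifacts:
    - image: sd-covid-crawler       # assumed image names
      context: crawler
    - image: sd-covid-dashboard
      context: dashboard
deploy:
  kustomize:
    paths:
      - k8s/overlays/dev            # dev-stage manifests from this repo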

App Layer

The app layer consists of two components: the crawler and the dashboard.

Both components use Kubernetes' dashboard namespace:

[Screenshot: workloads in the Kubernetes dashboard namespace]

Crawler

The crawler's job is to acquire San Diego County's COVID-19 data and store it in the project's database, so that it can be read by the dashboard. The crawler is implemented as a Python script.

It is triggered by a timezone-aware CronJob as described above. In addition to this CronJob, the Kubernetes deployment also includes a job that seeds the database with previously crawled data, saving traffic. This job runs once, when the app layer is initialized.
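
A one-shot seed job could be declared roughly like this; the image, command, and names are assumptions.

apiVersion: batch/v1
kind: Job
metadata:
  name: seed-database
  namespace: dashboard
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never            # run once; do not restart on success
      containers:
        - name: seed
          image: example/sd-covid-crawler           # assumed image name
          command: ["python", "seed_database.py"]   # hypothetical entry point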

Dashboard

The dashboard is the user-facing part of the application. It is implemented as a Voila application and uses voila-vuetify to render a Material UI-style user interface.

In comparison with other dashboarding tools, Voila is relatively slow. However, great user-facing performance is not a requirement for this project. Voila was chosen because it makes it easy to quickly create working data applications from Jupyter notebooks.
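
For illustration, serving the notebook with Voila inside the cluster could look roughly like this; the image, notebook name, and port are assumptions.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dashboard
  namespace: dashboard
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dashboard
  template:
    metadata:
      labels:
        app: dashboard
    spec:
      containers:
        - name: dashboard
          image: example/sd-covid-dashboard   # assumed image name
          command:
            - voila
            - dashboard.ipynb                 # assumed notebook name
            - --template=vuetify-default      # voila-vuetify template
            - --port=8866
            - --no-browser
          ports:
            - containerPort: 8866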

Contributing

For questions or feedback, please file an issue here on GitHub. You may also use the feedback form from within the app.

Feel free to open pull requests for any changes you think the project could benefit from.

Credits

The Terraform templates and cluster setup build on Niko Kosonen's tutorial and his gke-tutorial repository. Timezone-aware scheduling is provided by Hidde Beydals' CronJobber.

License

This project is published under the MIT License.