Understanding the Codalab Architecture

This page explains to organizers and developers the architecture of Codalab V1.5 and the third-party software it relies on. Organizers can limit themselves to reading the overview.

[Figure: Codalab architecture]

Overview

Codalab consists of a front end and a number of back-end services (see figure). The front end uses the Django framework and interfaces with users via a web server. Django communicates with the back-end services:

  • Database
  • Task broker (RabbitMQ), which dispatches code to be executed to a number of compute workers:
    • Site worker: executes tasks such as unzipping large files.
    • Default compute worker: available to everyone; executes participants' submissions and scoring programs.
    • Organizer-supplied compute workers: private workers.

Organizers can:

  • Create their own queue and attach workers, i.e. have their own private "back-end". When you create a queue, you can redirect the submissions of your competitions to compute workers of your choice. You can attach several workers to the same queue, and multiple competitions to the same queue, but each competition has a single queue. You can share your queue with other organizers or keep it private.
  • Run an entire Codalab instance of their own, either from a preconfigured AMI or from scratch. This should not be necessary for most users, particularly because they can customize their front end without running their own front-end server.

Compute workers

Internal and external compute workers can be linked to CodaLab competitions. Queues dispatch jobs between the compute workers: a queue can receive jobs (submissions) from several competitions and can send them to several compute workers.

[Figure: workers scheme, showing queues dispatching submissions from competitions to compute workers]
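To make this concrete, here is a minimal sketch of a queue consumer built with the pika client. The queue name, connection details, and handler are hypothetical, and CodaLab's actual workers consume jobs through Celery (described below); the point is only to show how RabbitMQ hands each job from one queue to exactly one of the attached workers.

```python
# Minimal sketch of a RabbitMQ queue consumer, using the pika client.
# Queue name, host, and handler are hypothetical; CodaLab's real
# workers consume jobs through Celery instead.
import pika

def handle_submission(channel, method, properties, body):
    # Each worker attached to the queue receives jobs round-robin.
    print(f"Processing job: {body!r}")
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="submissions", durable=True)
channel.basic_qos(prefetch_count=1)  # hand out one job at a time per worker
channel.basic_consume(queue="submissions", on_message_callback=handle_submission)
channel.start_consuming()
```

Starting several copies of this consumer against the same queue reproduces the fan-out in the figure: each submission goes to exactly one of the attached workers.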

Docker

We use Docker to manage our local development and deployment environments because it provides an increased level of reproducibility. It used to take hours to set up each piece of Codalab; now the process is much simpler. Setting up Azure or S3 storage is still a bit tedious, but this may change in the future.

Django

Django is the bread and butter of the Codalab Competitions side. We use it to interact with our database, run database migrations, and fire off our asynchronous tasks.
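As an illustration of that pattern, here is a sketch of a Django model paired with a Celery task; the model, fields, and task are invented for this example and are not CodaLab's actual code:

```python
# Hypothetical sketch of the Django + Celery pattern described above;
# the Submission model and evaluate_submission task are invented.
from celery import shared_task
from django.db import models

class Submission(models.Model):
    participant = models.CharField(max_length=255)
    status = models.CharField(max_length=32, default="pending")
    created = models.DateTimeField(auto_now_add=True)

@shared_task
def evaluate_submission(submission_id):
    # Runs asynchronously on a Celery worker, not in the web process.
    submission = Submission.objects.get(pk=submission_id)
    submission.status = "running"
    submission.save()

# In a view, queue the work and return to the user immediately:
#   evaluate_submission.delay(submission.pk)
```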

PostgreSQL

Our relational database. Not the ideal choice, but it is what we currently use to store data.

A diagram of the architecture of the PostgreSQL database is available in this wiki.
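Pointing Django at PostgreSQL is a standard settings entry; the database name, user, and password below are placeholders, not CodaLab's actual configuration:

```python
# settings.py: standard Django configuration for a PostgreSQL backend.
# All names and credentials below are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "codalab",
        "USER": "codalab",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```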

RabbitMQ

We use RabbitMQ as our task message broker. It is simple and resilient, and it comes with nice management tools to assist with debugging, such as the RabbitMQ Management web interface and Flower.
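Wiring Celery to RabbitMQ amounts to a single line of broker configuration. A minimal sketch, with a placeholder AMQP URL:

```python
# Minimal Celery application using RabbitMQ as its message broker.
# The AMQP URL is a placeholder, not CodaLab's actual broker address.
from celery import Celery

app = Celery("codalab", broker="amqp://guest:guest@localhost:5672//")

@app.task
def ping():
    # Trivial task; calling ping.delay() exercises the broker round-trip.
    return "pong"
```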

Celery

This is our task queue, where we execute long-running tasks such as:

  • Creating a competition
  • Evaluating a submission
  • Sending mass emails
  • Re-running all submissions in a phase
  • Scheduling tasks

Also, by decoupling the worker from the project as much as we have, we can let competition organizers run their own Celery workers to consume submissions. They no longer need access to storage keys, because we pass around signed URLs that grant read/write access.

Celery and RabbitMQ basically replaced what the Service Bus provided in our project a few months ago (as of March 2017).
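That decoupling boils down to publishing a task to a named queue and letting the organizer run a worker that consumes only that queue. A sketch, with hypothetical module, task, and queue names:

```python
# Sketch of routing a job to an organizer's private queue.
# The module, task, and queue names are hypothetical.
from myapp.tasks import evaluate_submission

# The platform publishes the job to the organizer's queue...
evaluate_submission.apply_async(args=[42], queue="organizer-queue")

# ...and the organizer runs a worker on their own machine that
# listens only on that queue:
#   celery -A myapp worker -Q organizer-queue
```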

Nginx

A simple HTTP server to handle web requests. We can use this to cache static pages and handle huge influxes of traffic if we need to.

Storage

All competitions, submissions, input data, submission results, logos, etc. are stored remotely either on Amazon S3 or Microsoft Azure Storage.

Data is typically passed around via signed URLs instead of streaming the actual data. This makes it simpler for a competition organizer to, for example, download competition data without having to share Codalab's storage keys.
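With Amazon S3, for example, such a time-limited signed URL can be generated with boto3; the bucket and key below are placeholders:

```python
# Sketch: generating a time-limited signed URL for an S3 object.
# Bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "codalab-public", "Key": "competitions/42/data.zip"},
    ExpiresIn=3600,  # the link expires after one hour
)
# Anyone holding this URL can download the object until it expires,
# without ever seeing the storage account's keys.
print(url)
```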

Instructions for configuring storage are available in this wiki.
