
Architecture

peder2911 edited this page Jan 24, 2022 · 10 revisions

This article gives an overview of the architectural decisions made for ViEWS 3, as well as the infrastructure running the resulting system.

Principles

ViEWS 3 has a service-oriented architecture (SOA): each reasonably delineable piece of functionality is served by a discrete service. In ViEWS 3, a service is defined as a Docker container running a web server process, such as uvicorn.
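The following is a minimal sketch of what such a service process might look like. It is a bare ASGI application, which is the interface uvicorn serves, and it answers every HTTP request with a small JSON document. The payload and the `call_app` helper are hypothetical. A real ViEWS service would more likely use a framework built on ASGI. The helper drives the app in-process without a server, so the example runs on the standard library alone.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Hypothetical minimal ASGI service: every HTTP request gets a JSON reply."""
    if scope["type"] == "lifespan":
        # uvicorn sends startup/shutdown events; acknowledge them, exit on shutdown.
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                await send({"type": "lifespan.shutdown.complete"})
                return
    if scope["type"] != "http":
        return
    body = json.dumps({"service": "example", "status": "ok"}).encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_app(path="/"):
    """Drive the app in-process with a fake HTTP scope and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent


if __name__ == "__main__":
    print(asyncio.run(call_app()))
```

Inside a container, the same module could be served with a command such as `uvicorn example_service:app --host 0.0.0.0 --port 8000`. Here `example_service` is a hypothetical module name.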

A useful analogy for a service is the biological cell. Like a cell, a service has a clearly delineated border and a set of functions. Also like a cell, a service interacts with its environment and forms systems with other services. Just as cells acting in concert form organisms, services acting in concert form systems that yield much more functionality than the sum of their parts.

Services become systems through communication. In ViEWS, this communication takes the form of JSON passed over HTTP. This means that services talk to other services through HTTP client libraries such as requests, and that service processes are web servers capable of handling such requests. The resulting system is therefore a network, similar to a collection of networked machines.
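The sketch below illustrates this pattern using only the standard library. A background server plays an upstream service that answers with JSON over HTTP, and `fetch_json` plays the downstream service that calls it. The handler, URL path and payload are hypothetical stand-ins. With requests, `fetch_json` would reduce to `requests.get(url, timeout=5).json()`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ColumnHandler(BaseHTTPRequestHandler):
    """Hypothetical upstream service: answers every GET with a JSON document."""

    def do_GET(self):
        payload = json.dumps({"column": "example_column", "values": [0, 3, 1]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server() -> HTTPServer:
    """Start the upstream service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ColumnHandler)  # port 0: OS picks a port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_json(url: str) -> dict:
    """Downstream side: call another service over HTTP and decode its JSON reply."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```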

Services as cells

With recent developments in deployment technologies and practices, the analogy between a biological cell and a service can be extended further. Services can now be scaled up and down, much as a superorganism creates and destroys cells to fit its needs and heals itself by replacing damaged or malfunctioning cells.

A central principle of SOA is that each service is responsible for a set of related functionalities, and that this set does not grow beyond what is practical. If it would grow further, a separate service is warranted. This encourages strict separation of code, functionality and information between services, which makes it easier to follow modular programming principles. As a result, it becomes possible to write large, complex systems that remain maintainable, simple to reason about, and easy to extend and rewrite. Each service becomes an interchangeable module that can be replaced by any other module offering the same API.
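In essence, the interchangeability described above is programming against an interface. Within a single Python process, the same idea can be sketched with `typing.Protocol`, where any class exposing a matching `get_column` method can stand in for another. Between ViEWS services, the contract is the HTTP/JSON API rather than a Python type. `ColumnSource` and both implementations are hypothetical illustrations.

```python
from typing import Protocol


class ColumnSource(Protocol):
    """The contract a data-providing module must satisfy (hypothetical)."""

    def get_column(self, name: str) -> list[float]: ...


class InMemorySource:
    """One implementation: serves columns from a dictionary."""

    def __init__(self, table: dict[str, list[float]]):
        self._table = table

    def get_column(self, name: str) -> list[float]:
        return list(self._table[name])


class ConstantSource:
    """A drop-in replacement: every column is a run of one constant value."""

    def __init__(self, value: float, length: int):
        self._value = value
        self._length = length

    def get_column(self, name: str) -> list[float]:
        return [self._value] * self._length


def column_mean(source: ColumnSource, name: str) -> float:
    """Consumer code depends only on the contract, not on a concrete class."""
    values = source.get_column(name)
    return sum(values) / len(values)
```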

The choice of Docker for deployment makes ViEWS 3 highly portable. In addition, an SOA makes it possible to partition services across several hosts, which increases scalability, that is, the potential performance of the system.

ViEWS 3 architecture

[Figure: schematic of the ViEWS 3 architecture]

The services that make up ViEWS 3 are:

  • Storefront: An Nginx server that proxies traffic to the Queryset Manager and Docs.
  • Queryset Manager: A service responsible for managing queryset definitions and retrieving querysets.
  • Job Manager: A service responsible for managing the execution of compute jobs upstream.
  • Router: A service that routes traffic to the cache, the Base Data Retriever or the Data Transformer.
  • Base Data Retriever: A service responsible for fetching data from the database.
  • Data Transformer: A service that runs data transformation operations.
  • Docs: A service that exposes documentation and system introspection information.
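The Router's decision can be sketched as a small function. Purely for illustration, the sketch assumes that a column request consists of a base column name plus an ordered tuple of transformations. It also assumes that the cache is a mapping keyed by such requests. `ColumnRequest`, `route` and the returned labels are hypothetical and do not come from the ViEWS code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes requests hashable, so they can key a cache
class ColumnRequest:
    """Hypothetical request: a base column plus transformations to apply in order."""
    base: str
    transforms: tuple[str, ...] = ()


def route(request: ColumnRequest, cache: dict) -> str:
    """Pick the destination that should answer a request."""
    if request in cache:
        return "cache"                  # result already computed
    if not request.transforms:
        return "base_data_retriever"    # plain column: go to the database
    return "data_transformer"           # derived column: needs a transformation
```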

The purpose of ViEWS 3 is to allow flexible data retrieval. Data flows from the database to the end user through several services. To give users a high degree of flexibility and expressiveness, several intermediate operations may be performed between the request for a data column and the database call. Because these operations can be expensive in time and compute resources, the system relies on extensive caching. It also uses job management to ensure that the same job is not run more than once in parallel when the same resource is requested twice.
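The guarantee that the same job never runs twice in parallel is commonly called request coalescing, or "single-flight". The following is a minimal asyncio sketch of that technique. It assumes that jobs are identified by a hashable key and run as zero-argument coroutine functions. The first caller for a key starts the job, concurrent callers await the same task, and finished results are cached. `JobDeduplicator` is a hypothetical name, and this is not the Job Manager's actual implementation.

```python
import asyncio


class JobDeduplicator:
    """Run each distinct job key at most once at a time, and cache its result."""

    def __init__(self):
        self._cache = {}
        self._in_flight: dict = {}
        self.executions = 0  # how many times a job actually ran (for demonstration)

    async def run(self, key, job):
        if key in self._cache:
            return self._cache[key]          # finished earlier: reuse result
        task = self._in_flight.get(key)
        if task is None:                     # nobody is computing this yet
            task = asyncio.create_task(self._execute(key, job))
            self._in_flight[key] = task
        return await task                    # concurrent callers share one task

    async def _execute(self, key, job):
        try:
            self.executions += 1
            result = await job()
            self._cache[key] = result
            return result
        finally:
            self._in_flight.pop(key, None)   # failed jobs are retried next time
```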

A typical data request begins with an HTTP call to the Storefront, which forwards it to the Queryset Manager. For each column in the requested queryset, the Queryset Manager sends a data request to the Job Manager. Each of these requests is broken down into a series of dependent subjobs, which ultimately depend on a database retrieval operation. The Job Manager then traverses each chain of jobs. It forwards the database retrieval job to the Router, or fetches its result from the cache if available, while making sure not to request the same operation twice.
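The chain traversal can be sketched recursively. In this hypothetical representation, a job chain is a tuple. Its first element names a base column, and its remaining elements name transformations applied in order. Every prefix of the chain is itself a job whose result can be cached. `fetch_base` stands in for forwarding the retrieval job to the Router. The local `TRANSFORMS` table stands in for calls to the Data Transformer. Both are assumptions made for illustration.

```python
from typing import Callable

Values = list[float]

# Hypothetical transformation table, standing in for the Data Transformer service.
TRANSFORMS: dict[str, Callable[[Values], Values]] = {
    "double": lambda xs: [2 * x for x in xs],
    "increment": lambda xs: [x + 1 for x in xs],
}


def resolve(chain: tuple[str, ...], cache: dict,
            fetch_base: Callable[[str], Values]) -> Values:
    """Resolve a job chain, reusing cached results for any prefix of it.

    chain[0] names a base column; each later element names a transformation
    applied to the result of the chain before it.
    """
    if chain in cache:
        return cache[chain]                 # already computed: no new job
    if len(chain) == 1:
        result = fetch_base(chain[0])       # the database retrieval job
    else:
        parent = resolve(chain[:-1], cache, fetch_base)  # resolve dependency first
        result = TRANSFORMS[chain[-1]](parent)
    cache[chain] = result
    return result
```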

Each request sent by the Queryset Manager results in a data column. The columns are then merged into a dataset, which is returned to the user. To avoid timeouts, the Queryset Manager returns HTTP 204 (No Content) on the first request for a queryset if not all of its data columns are ready.
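Both sides of the 204 convention can be sketched under stated assumptions. Each column is represented as a mapping from a row index, such as a hypothetical `(time, unit)` pair, to a value. `None` marks a column that is not ready yet, and the client polls until it receives a 200. The function names, polling interval and attempt limit are illustrative choices. The merge shown here is an outer join on the row index.

```python
import time
from typing import Callable, Hashable, Optional

Column = dict[Hashable, float]


def merge_columns(columns: dict[str, Column]) -> dict[Hashable, dict[str, float]]:
    """Outer-join columns on their row index into one row-oriented dataset."""
    dataset: dict[Hashable, dict[str, float]] = {}
    for name, column in columns.items():
        for index, value in column.items():
            dataset.setdefault(index, {})[name] = value
    return dataset


def queryset_response(columns: dict[str, Optional[Column]]):
    """Server side (hypothetical): 204 while any column is pending, else 200 + dataset."""
    if any(column is None for column in columns.values()):
        return 204, None
    return 200, merge_columns(columns)


def poll(fetch: Callable[[], tuple], interval: float = 1.0, max_attempts: int = 30):
    """Client side: repeat the request until the dataset is ready."""
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 204:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(interval)
    raise TimeoutError("queryset was not ready in time")
```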