
Overview of PhenoMeNal Architecture and Components

namratakale edited this page Apr 18, 2018 · 26 revisions

The PhenoMeNal Cloud Research Environment (CRE) can run on public cloud resources or on local servers, and enables users to carry out scalable, integrated metabolomics data analysis.

The main components of the PhenoMeNal CRE are as follows:

  • Standardised software tools wrapped as containers
  • Standardised and interoperable data formats
  • Interfaces such as the Galaxy workflow engine and Jupyter Notebooks for data analysis

Introduction and architecture

The PhenoMeNal CRE is designed as a microservice architecture, with the services implemented as Virtual Machine Images (VMIs) and software containers. It uses Docker as the software container platform and Kubernetes (K8s) as the container orchestrator. The PhenoMeNal tools are available as containers and can be deployed easily, without any manual installation or management of dependencies. In addition, in an elastic IT environment these containers can be scaled to run parallel analyses on multiple compute nodes.
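As a concrete illustration, a containerised tool can be described to Kubernetes with a short manifest and scheduled onto the cluster. The sketch below is illustrative only; the image name, command and labels are assumptions, not an actual PhenoMeNal container:

```yaml
# Minimal, hypothetical Kubernetes Job wrapping a containerised analysis tool.
# Kubernetes pulls the image, runs it to completion and cleans up the pod.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-analysis
spec:
  template:
    spec:
      containers:
      - name: example-tool
        image: example/metabolomics-tool:latest   # illustrative image name
        command: ["run-analysis", "--input", "/data/input.mzML"]
      restartPolicy: Never
```

Because each tool is described this way, the orchestrator can run many such jobs in parallel across the nodes of the cluster.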
For an overview of the architecture, see Figure 1 below.

PhenoMeNal Architecture

Figure 1: The PhenoMeNal architecture (right) with selected implementations, depicted as a stack diagram and aligned to a general microservice-based architecture (left).

A general microservice-based stack

The lowest level is the hardware, e.g. a computer or a virtual cluster running in a cloud. Provisioning software is then used to prepare and equip the virtual cluster with the necessary software layers. These start with a kernel, which acts as an intermediary between the (possibly virtual) hardware and the OS and handles resource management, load balancing, runtime scheduling and several other functions. Every node runs its own kernel and OS, with a cluster OS on top as an abstraction layer, making it appear as if all the nodes are part of one big computer. Combining the fundamental functions provided by the kernel with a cluster OS results in a virtual cluster with pooled resources and the ability to split workloads between nodes as if they were all part of the same physical machine. The operating system then handles most of the communication, allowing the desired services to be installed. The next level is the container engine, which mounts and runs the containers holding the microservices; its main function is to support launching, scaling, managing and terminating its auxiliary containers. The container orchestration software operates through the container engine's API.

Containers are pieces of a program running within a closed virtual environment that contains only the files needed for the program to function. This makes a container entirely independent of the surrounding software environment, so it can be moved to and run on any operating system that has the required container engine. In this use case, the microservices are wrapped in software containers, which makes them easy to add, remove and rearrange for the desired workflow.
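For example, running a containerised tool requires nothing but a container engine on the host; everything else travels with the image. The session below is a sketch, and the image name and tool arguments are illustrative assumptions:

```shell
# Pull and run a (hypothetical) containerised tool. Only Docker is needed
# on the host -- the container brings all of its own dependencies.
$ docker pull example/metabolomics-tool:latest
$ docker run --rm -v "$PWD/data:/data" example/metabolomics-tool:latest \
    run-analysis --input /data/input.mzML
```

The same commands work unchanged on any machine with Docker installed, which is what makes the containers portable between laptops, local servers and cloud nodes.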

The microservices running within these containers are all independent functions, usually taken from existing software packages. Containerising these functions brings several benefits, of which fast start-up is one of the most important. It enables fast and simple scaling on demand: additional virtual nodes can be added to the virtual cluster, provisioned with all the software needed and then supplied with the necessary containers. In a fraction of the time it would take to build, configure and install additional physical machines, a virtual cluster can accommodate larger workloads.

The PhenoMeNal stack

PhenoMeNal is built to run on private machines as well as with any Infrastructure-as-a-Service provider. Kubernetes gathers a cluster of nodes into a single workspace and functions as its kernel and OS. Docker was chosen as the container environment. The required analysis functions are downloaded on the fly as small, independent Docker containers and launched through Kubernetes' orchestration tools.

Deployment

PhenoMeNal has developed and uses KubeNow to allow rapid deployment, scaling, and tear-down of Kubernetes clusters on public and private cloud systems (e.g., AWS, GCE and OpenStack), as well as on local clusters (via Vagrant). KubeNow is a thin layer on top of well-established software (Terraform, Packer, Ansible and kubeadm); see figure 2 below. By deploying a KubeNow cluster, the user gets:

  • A Kubernetes cluster up and running in less than 10 minutes (provisioned with kubeadm);
  • Weave networking;
  • Traefik HTTP reverse proxy and load balancer;
  • Cloudflare dynamic DNS integration;
  • GlusterFS distributed file system.
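The deployment workflow can be sketched with KubeNow's `kn` helper commands. The provider name, flags and file names below follow the pattern documented for KubeNow, but exact details may differ by version and cloud, so treat this as an assumed outline rather than a verbatim recipe:

```shell
# Sketch of a KubeNow deployment on OpenStack (provider and flags are
# assumptions -- consult the KubeNow documentation for your cloud).
$ kn init openstack my-deployment   # create a deployment directory
$ cd my-deployment
$ # edit config.tfvars: credentials, node counts, flavors, domain name ...
$ kn apply                          # provision the cluster (Terraform, Ansible, kubeadm)
$ kn kubectl get nodes              # interact with the newly created cluster
$ kn destroy                        # tear everything down when finished
```

Because Terraform tracks the provisioned resources, `kn destroy` can reliably remove the whole cluster, which keeps short-lived analysis deployments cheap.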

Figure 2: KubeNow delivers Kubernetes clusters with dynamic DNS integration, networking, load balancing, and a distributed file system. KubeNow wraps existing industry-standard tools and is used to deploy PhenoMeNal CREs.

Workflow orchestration

Apart from establishing a complete virtual infrastructure and setting up a Kubernetes cluster, users of the PhenoMeNal CRE need interfaces and tools to work with the containerised software applications developed in WP9. PhenoMeNal uses the Galaxy workflow environment and contributed support for scheduling Galaxy jobs as Docker containers on a Kubernetes cluster (a contribution that has since been adopted by the Galaxy project). Galaxy is integrated into the main deployment process of the PhenoMeNal CRE, which means that users deploying a private PhenoMeNal CRE immediately get a running instance of Galaxy that is secured for their own private use. The Luigi workflow engine (developed by Spotify) has also been extended to support scheduling containerised jobs on Kubernetes clusters, and is likewise available in the PhenoMeNal CRE; this contribution was also pushed upstream to Spotify. For examples of workflows, see the corresponding section in the PhenoMeNal main Wiki page.
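Galaxy's Kubernetes support is enabled through its job configuration. The fragment below is a hedged sketch based on the Galaxy Kubernetes job runner; the plugin load path is the runner's documented entry point, while the namespace and parameter values are illustrative assumptions:

```xml
<!-- Sketch of a Galaxy job_conf.xml fragment enabling the Kubernetes
     runner. Parameter names follow the Galaxy k8s runner; the values
     shown here are illustrative. -->
<job_conf>
    <plugins>
        <plugin id="k8s" type="runner"
                load="galaxy.jobs.runners.kubernetes:KubernetesJobRunner"/>
    </plugins>
    <destinations default="k8s_default">
        <destination id="k8s_default" runner="k8s">
            <param id="k8s_namespace">galaxy</param>
            <param id="docker_enabled">true</param>
        </destination>
    </destinations>
</job_conf>
```

With a configuration of this shape, each Galaxy tool invocation is submitted to the cluster as a containerised Kubernetes job rather than run on the Galaxy host itself.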

Provisioning of services

PhenoMeNal provisions services as containers in a scalable infrastructure and makes them easily available through workflow environments. Containers are written by individual tool developers, with their source published in code repositories (e.g. GitHub). The PhenoMeNal Continuous Integration system pulls the source code, builds the containers, tests them and, if the tests pass, pushes them to container repositories such as Docker Hub and the PhenoMeNal private container repository (see figure 3 below). From the CRE, these containers can be downloaded and used from within a workflow engine (such as Galaxy) and scheduled inside the Kubernetes cluster.

Figure 3: Overview of the continuous development and operation in PhenoMeNal.

Related materials

See the webinar from Ola on Challenges and Opportunities with Virtual Research Environments.
