
de.NBI summer school 2017 on cloud computing


As part of the de.NBI Summer School on Cloud Computing for Bioinformatics, held 25th June - 1st July 2017 in Giessen (see also http://www.denbi.de/22-training-cat/training-courses/256-cloud1), PhenoMeNalist Christoph Ruttkies (IPB Halle) shows how to lift a PhenoMeNal CRE on the local de.NBI cloud (see also the earlier blog post http://phenomenal-h2020.eu/home/2017/01/13/phenomenal-meets-denbi/).

OpenStack PhenoMeNal Installation Tutorial

Part I: Set up Deploy Node in OpenStack

The Deploy node will be used to deploy PhenoMeNal in OpenStack, as it provides all required prerequisites (Docker, Git).

  • set up an instance in OpenStack using the provided "css_ubuntu_git_docker" image
  • assign a floating IP for external access
  • assign an SSH key pair of your local machine

After the creation of the Deploy node:

  • log in to the running instance via
ssh ubuntu@FLOATING_IP_OF_DEPLOY_NODE
# root permissions are required as we need to use docker
sudo su -
# create a new folder in which you will put all necessary files in the next steps
mkdir css2017 && cd css2017
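  • optionally verify that the prerequisites are available (a quick sanity check; both tools ship with the "css_ubuntu_git_docker" image)
docker --version
git --version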

Part II: Install prerequisites for cloud deployment

  • Check out the cloud deployment from GitHub
git clone --recursive https://github.com/phnmnl/cloud-deploy-kubenow.git
cd cloud-deploy-kubenow
git checkout development/bucetin
git submodule update
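  • optionally confirm the branch and submodule state (an illustrative sanity check)
# should print development/bucetin
git rev-parse --abbrev-ref HEAD
# lists the recorded commit of each submodule
git submodule status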

Part III: Create the KubeNow configuration for your OpenStack

  • Download the “Download OpenStack RC File v3” (Compute -> Access & Security), a small shell script that sets the environment variables required for access to the OpenStack API (it will be sourced in Part IV)
  • Copy cloud-deploy-kubenow/config.ostack.sh-template to cloud-deploy-kubenow/config.ostack.sh, which contains the settings for the cloud deployment
  • Next, we need to modify cloud-deploy-kubenow/config.ostack.sh to match our OpenStack installation. The required information is spread across several OpenStack Horizon pages, or available through the OpenStack command line client (see the lookup example after the configuration file below)
#!/usr/bin/env bash

# Cloud Prefix Name
export TF_VAR_cluster_prefix="your_unique_prefix"
# Path to "Download OpenStack RC File v3" from https://cloud.computational.bio.uni-giessen.de/horizon/project/access_and_security/?tab=access_security_tabs__api_access_tab
export OS_CREDENTIALS_FILE="/path/to/phenomenal-giessen-openrcv3.sh"

# Specific for de.NBI Giessen openstack
export TF_VAR_floating_ip_pool="NETWORK_NAME"
export TF_VAR_external_network_uuid="NETWORK_ID"

# If your cloud provider does not allow external nameservers, specify them here, or
# uncomment and leave empty for the provider's automatic configuration
# export TF_VAR_dns_nameservers=""

# Master configuration
# Note: flavors that are too small can cause obscure errors during installation
export TF_VAR_master_as_edge="true"
export TF_VAR_master_flavor="de.NBI.default"

# Node configuration
export TF_VAR_node_count="3"
export TF_VAR_node_flavor="de.NBI.default"

# Gluster configuration
export TF_VAR_glusternode_count="1" # 1 - 3 depending on preferred replication factor
export TF_VAR_glusternode_flavor="de.NBI.default"
export TF_VAR_glusternode_extra_disk_size="100" # Size in GB

# Galaxy
export TF_VAR_galaxy_admin_email="yourname@domain.org"

# create password with e.g. head /dev/urandom | sha256sum | head -c 16 ; echo
export TF_VAR_galaxy_admin_password="yournewpassword"

# Jupyter 
# create password with e.g. head /dev/urandom | sha256sum | head -c 16 ; echo
export TF_VAR_jupyter_password="yournewpassword"

# Kubernetes dashboard
export TF_VAR_dashboard_username="admin"
export TF_VAR_dashboard_password="password"
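  • the values for TF_VAR_floating_ip_pool and TF_VAR_external_network_uuid can be looked up in Horizon or, assuming the python-openstackclient is installed and the RC file has been sourced, via the OpenStack command line client:
# list external networks; use the Name for TF_VAR_floating_ip_pool
# and the ID for TF_VAR_external_network_uuid
openstack network list --external
# list the available flavors (e.g. de.NBI.default)
openstack flavor list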

Part IV: Lifting up PhenoMeNal

  • source the downloaded OpenStack RC File v3 (you will be asked for your password)
source /path/to/phenomenal-giessen-openrcv3.sh
  • deploy the cluster to OpenStack (CONFIG_FILE is the path to your config.ostack.sh)
./phenomenal.sh deploy ostack -c CONFIG_FILE
  • At this stage you should see the progress under "Network -> Network Topology -> Graph" in your OpenStack project (a simple reachability check is sketched after this list). Behind the scenes:
  • Terraform talks to OpenStack
  • Ansible talks to the virtual hosts
  • Kubernetes and Helm (packaging for Kubernetes) are launched
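  • once the deployment has finished, you can check that the main services respond (an illustrative check using curl; FLOATING_IP_ADDRESS_OF_MASTER is the master's floating IP)
# both endpoints are created by the deployment and should return an HTTP response
curl -I http://galaxy.FLOATING_IP_ADDRESS_OF_MASTER.nip.io/
curl -I http://dashboard.FLOATING_IP_ADDRESS_OF_MASTER.nip.io/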

(Optional) Part V: Uninstalling PhenoMeNal

The PhenoMeNal deployment can be uninstalled with a single command (please don't use this now :)).

./phenomenal.sh destroy ostack -c CONFIG_FILE

PhenoMeNal Cloud Administration

You can manage the Kubernetes (PhenoMeNal) cluster via the command line or by using the Kubernetes Dashboard.

Dashboard Administration

  • Access the Kubernetes Dashboard in your browser
http://dashboard.FLOATING_IP_ADDRESS_OF_MASTER.nip.io/
  • Check the available settings, e.g. Nodes

(Optional) Command line Administration

  • Log in to the master node using the private key file vre.key in the deployments folder
ssh -i ~/css2017/cloud-deploy-kubenow/deployments/id-phnmnl-config.ostack/vre.key ubuntu@FLOATING_IP_ADDRESS_OF_MASTER
  • some useful commands (just for your information)
# show all resources (pods, services, deployments, ...)
kubectl get all
kubectl get all --all-namespaces

# get more detailed information about a pod ($pod is a pod name from 'kubectl get pods')
kubectl describe pod $pod

# restart galaxy (try not to use this for the moment ;))
con=$(kubectl get pods | grep Running | cut -d" " -f1 | grep galaxy)
kubectl delete pods/$con
sleep 4
con=$(kubectl get pods | grep Running | cut -d" " -f1 | grep galaxy)
kubectl exec -i -t $con -- /galaxy/run.sh --restart

# delete all pods (try not to use this for the moment ;))
kubectl delete pods --all
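
# follow the log output of a pod (an additional illustrative command; $pod as above)
kubectl logs -f $pod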

Usage of PhenoMeNal Cloud e-Infrastructure via Galaxy

All PhenoMeNal tools are accessible through the Galaxy environment shipped with the cloud installation. You can access Galaxy at the provided URL:

http://galaxy.FLOATING_IP_ADDRESS_OF_MASTER.nip.io/
  • log in with the Galaxy admin email address and password configured in cloud-deploy-kubenow/config.ostack.sh

Part I: Use Case - Statistical (Sacurine) Workflow

Introduction

Characterization of the physiological variations of the metabolome in biofluids is critical to understand human physiology and to avoid confounding effects in cohort studies aiming at biomarker discovery. In this study, conducted by the MetaboHUB French Infrastructure for Metabolomics, urine samples from 183 adults were analyzed by reversed phase (C18) ultra-high performance liquid chromatography (UPLC) coupled to high-resolution mass spectrometry (LTQ-Orbitrap). After preprocessing of the raw files, a total of 109 metabolites were identified in the negative ionization mode at confidence levels 1 or 2 of the Metabolomics Standards Initiative (MSI) (Roux et al, 2012).

The physiological variations of the identified metabolites with age, body mass index, and gender were analyzed by univariate hypothesis testing (with control of the False Discovery Rate; univariate module) and multivariate OPLS modeling (Thevenot et al, 2015; multivariate module), and a metabolite signature for significant gender classification by OPLS-DA, Random Forest or SVM was further identified (Rinaudo et al, 2016; biosigner module).

The history (workflow and input/output data) is publicly available on the Workflow4Metabolomics e-infrastructure (Giacomoni et al, 2015) with reference W4M00001_Sacurine-statistics (credentials can be obtained by requesting an account).

Raw files (in both Thermo proprietary and mzML open formats) are publicly available on the MetaboLights repository (MTBLS404).

Source: https://github.com/phnmnl/phenomenal-h2020/wiki/Sacurine-statistical-workflow

Steps

In the following, we will run the statistical workflow in Galaxy on the original MetaboLights study (MTBLS404). With this workflow we try to find differences between two patient groups at the metabolome level. The two groups are defined by gender (male, female).

  • Access the workflow file and input files for the Statistical Workflow and download them to your local machine
  • Upload all input files to your Galaxy history (Analyze Data -> Get Data -> Upload File)
DataMatrix-Input.tsv # contains metabolite intensities for each patient
SampleMetadataInput-Input.tsv # contains metadata for each patient
VariableMetadataInput-Input.tsv # contains metabolite annotations
  • Upload the workflow file (Workflow -> Upload or import workflow -> Browse -> Import)
w4m-sacurine-statistics.ga
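  • optionally inspect the downloaded tab-separated files locally before uploading (illustrative)
head -n 3 DataMatrix-Input.tsv SampleMetadataInput-Input.tsv VariableMetadataInput-Input.tsv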

Part II: Use Case - MetFrag Workflow

A next step following the Statistical Workflow, in which we found "interesting" metabolite features, is the annotation/identification of the underlying metabolites. A powerful technique is tandem mass spectrometry (MS/MS), in which molecules are fragmented. The acquired masses of these fragments can give hints about the underlying molecular (metabolite) structure.

One tool that makes use of this information is MetFrag, which performs a database search to retrieve candidate molecules. These candidates are fragmented in silico, matched to the MS/MS data and scored accordingly. The output is a ranked list of candidate molecules, where the top-ranked positions give hints for the correct molecular structure.

By running the next workflow, we process several MS/MS spectra from different metabolites. The data is stored in mzML format. It will be pre-processed using XCMS, which includes peak picking to extract informative peaks. Then the data will be annotated using the R package CAMERA, which tries to find isotopes and adducts. Finally, this information is used to create MetFrag parameter files, which are passed to the MetFrag tool.

Steps

  • Access the workflow file and input file for the MetFrag Workflow and download them to your local machine
  • Upload all input files to your Galaxy history (Analyze Data -> Get Data -> Upload File)
example_data_ms2.mzml # contains MS and MS/MS data
  • Upload the workflow file (Workflow -> Upload or import workflow -> Browse -> Import)
Galaxy-Workflow-MetFragWorkflow.ga
  • Create a new history in Galaxy
  • Visualize the uploaded workflow by clicking "edit"
  • Check the different settings of each tool (node) on the right side by clicking on the specific node
  • Check that the database for the tool 'msms2metfrag' is set to PubChem
  • Click run, set the proper input file and finally run the workflow
  • You should see the progress of the workflow on the right side

After the workflow has finished, we have an output CSV file for each metabolite spectrum.

  • Visualize the results by using the Galaxy tool 'metfrag-vis' which you can find in the 'Tools' section of Galaxy
  • Use the parameter and result collection as input and run the tool

As output, a PDF summary file will be created containing information about the Top1 candidates of the MS/MS spectra.
