
de.NBI summer school 2017 on cloud computing


As part of the de.NBI Summer School on Cloud Computing for Bioinformatics, held 25th June - 1st July 2017 in Giessen (see also http://www.denbi.de/22-training-cat/training-courses/256-cloud1), PhenoMeNalist Christoph Ruttkies (IPB Halle) shows how to lift a PhenoMeNal CRE on the local de.NBI cloud (see also the earlier blog post http://phenomenal-h2020.eu/home/2017/01/13/phenomenal-meets-denbi/).

OpenStack PhenoMeNal Installation Tutorial

Part I: Set up Deploy Node in OpenStack

The Deploy node will be used to deploy PhenoMeNal in OpenStack, as it provides all required prerequisites (Docker, Git).

  • set up an instance in OpenStack using the provided "css_ubuntu_git_docker" image
  • assign a floating IP for external access
  • assign an SSH key pair of your local machine

After the creation of the Deploy node:

  • log in to the running instance via
ssh ubuntu@FLOATING_IP_OF_DEPLOY_NODE
# root permissions are required as we need to use docker
sudo su -
# create a new folder in which you will put all necessary files in the next steps
mkdir css2017 && cd css2017
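  • optionally verify that the prerequisites are available (a quick sanity check; both tools ship with the "css_ubuntu_git_docker" image)
docker --version
git --version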

Part II: Install prerequisites for cloud deployment

  • Check out the cloud deployment from GitHub
git clone --recursive https://github.com/phnmnl/cloud-deploy-kubenow.git
cd cloud-deploy-kubenow
git checkout development/bucetin
git submodule update
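  • optionally confirm the branch and submodule state (an illustrative sanity check)
# should print development/bucetin
git rev-parse --abbrev-ref HEAD
# lists the recorded commit of each submodule
git submodule status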

Part III: Create the KubeNow configuration for your OpenStack

  • Download the “Download OpenStack RC File v3” (Compute -> Access & Security), a small shell script that sets the environment variables required for access to the OpenStack API (it will be sourced in Part IV)
  • Copy cloud-deploy-kubenow/config.ostack.sh-template to cloud-deploy-kubenow/config.ostack.sh, which contains the settings for the cloud deployment
  • Next, we need to modify cloud-deploy-kubenow/config.ostack.sh to match our OpenStack installation. The required information is spread across several OpenStack Horizon pages, or available through the OpenStack command line client (see the lookup example after the configuration file below)
#!/usr/bin/env bash

# Cloud Prefix Name
export TF_VAR_cluster_prefix="your_unique_prefix"
# Path to "Download OpenStack RC File v3" from https://cloud.computational.bio.uni-giessen.de/horizon/project/access_and_security/?tab=access_security_tabs__api_access_tab
export OS_CREDENTIALS_FILE="/path/to/phenomenal-giessen-openrcv3.sh"

# Specific for de.NBI Giessen openstack
export TF_VAR_floating_ip_pool="NETWORK_NAME"
export TF_VAR_external_network_uuid="NETWORK_ID"

# If your cloud provider does not allow external nameservers, specify them here, or
# uncomment and leave empty for the provider's automatic configuration
# export TF_VAR_dns_nameservers=""

# Master configuration
# Note: flavors that are too small can cause obscure errors during installation
export TF_VAR_master_as_edge="true"
export TF_VAR_master_flavor="de.NBI.default"

# Node configuration
export TF_VAR_node_count="3"
export TF_VAR_node_flavor="de.NBI.default"

# Gluster configuration
export TF_VAR_glusternode_count="1" # 1 - 3 depending on preferred replication factor
export TF_VAR_glusternode_flavor="de.NBI.default"
export TF_VAR_glusternode_extra_disk_size="100" # Size in GB

# Galaxy
export TF_VAR_galaxy_admin_email="yourname@domain.org"

# create password with e.g. head /dev/urandom | sha256sum | head -c 16 ; echo
export TF_VAR_galaxy_admin_password="yournewpassword"

# Jupyter 
# create password with e.g. head /dev/urandom | sha256sum | head -c 16 ; echo
export TF_VAR_jupyter_password="yournewpassword"

# Kubernetes dashboard
export TF_VAR_dashboard_username="admin"
export TF_VAR_dashboard_password="password"
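  • the values for TF_VAR_floating_ip_pool and TF_VAR_external_network_uuid can be looked up in Horizon or, assuming the python-openstackclient is installed and the RC file has been sourced, via the OpenStack command line client:
# list external networks; use the Name for TF_VAR_floating_ip_pool
# and the ID for TF_VAR_external_network_uuid
openstack network list --external
# list the available flavors (e.g. de.NBI.default)
openstack flavor list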

Part IV: Lifting up PhenoMeNal

  • source the downloaded OpenStack RC File v3 (you will be asked for your password)
source /path/to/phenomenal-giessen-openrcv3.sh
  • deploy the cluster to OpenStack (CONFIG_FILE is the path to your config.ostack.sh)
./phenomenal.sh deploy ostack -c CONFIG_FILE
  • At this stage you should see the progress under "Network -> Network Topology -> Graph" in your OpenStack project (a simple reachability check is sketched after this list). Behind the scenes:
  • Terraform talks to OpenStack
  • Ansible talks to the virtual hosts
  • Kubernetes and Helm (packaging for Kubernetes) are launched
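  • once the deployment has finished, you can check that the main services respond (an illustrative check using curl; FLOATING_IP_ADDRESS_OF_MASTER is the master's floating IP)
# both endpoints are created by the deployment and should return an HTTP response
curl -I http://galaxy.FLOATING_IP_ADDRESS_OF_MASTER.nip.io/
curl -I http://dashboard.FLOATING_IP_ADDRESS_OF_MASTER.nip.io/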

(Optional) Part V: Uninstalling PhenoMeNal

The PhenoMeNal deployment can be uninstalled with a single command (please don't use this now :)).

./phenomenal.sh destroy ostack -c CONFIG_FILE

PhenoMeNal Cloud Administration

You can manage the Kubernetes (PhenoMeNal) cluster via the command line or by using the Kubernetes Dashboard.

Dashboard Administration

  • Access the Kubernetes Dashboard in your browser
http://dashboard.FLOATING_IP_ADDRESS_OF_MASTER.nip.io/
  • Check the available settings, e.g. Nodes

(Optional) Command line Administration

  • Log in to the master node using the private key file vre.key in the deployments folder
ssh -i ~/css2017/cloud-deploy-kubenow/deployments/id-phnmnl-config.ostack/vre.key ubuntu@FLOATING_IP_ADDRESS_OF_MASTER
  • some useful commands (just for your information)
# show all resources (pods, services, deployments, ...)
kubectl get all
kubectl get all --all-namespaces

# get more detailed information about a pod ($pod is a pod name from 'kubectl get pods')
kubectl describe pod $pod

# restart galaxy (try not to use this for the moment ;))
con=$(kubectl get pods | grep Running | cut -d" " -f1 | grep galaxy)
kubectl delete pods/$con
sleep 4
con=$(kubectl get pods | grep Running | cut -d" " -f1 | grep galaxy)
kubectl exec -i -t $con -- /galaxy/run.sh --restart

# delete all pods (try not to use this for the moment ;))
kubectl delete pods --all
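
# follow the log output of a pod (an additional illustrative command; $pod as above)
kubectl logs -f $pod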

Usage of PhenoMeNal Cloud e-Infrastructure via Galaxy

All PhenoMeNal tools are accessible through the Galaxy environment shipped with the cloud installation. You can access Galaxy at the provided URL:

http://galaxy.FLOATING_IP_ADDRESS_OF_MASTER.nip.io/
  • log in with the Galaxy admin email address and password configured in cloud-deploy-kubenow/config.ostack.sh

Part I: Use Case - Statistical (Sacurine) Workflow

Introduction

Characterization of the physiological variations of the metabolome in biofluids is critical to understand human physiology and to avoid confounding effects in cohort studies aiming at biomarker discovery. In this study, conducted by the MetaboHUB French Infrastructure for Metabolomics, urine samples from 183 adults were analyzed by reversed phase (C18) ultra-high performance liquid chromatography (UPLC) coupled to high-resolution mass spectrometry (LTQ-Orbitrap). After preprocessing of the raw files, a total of 109 metabolites were identified in the negative ionization mode at confidence levels 1 or 2 of the Metabolomics Standards Initiative (MSI) (Roux et al, 2012).

The physiological variations of the identified metabolites with age, body mass index, and gender were analyzed by univariate hypothesis testing (with control of the False Discovery Rate; univariate module) and multivariate OPLS modeling (Thevenot et al, 2015; multivariate module), and a metabolite signature for significant gender classification by OPLS-DA, Random Forest or SVM was further identified (Rinaudo et al, 2016; biosigner module).

The history (workflow and input/output data) is publicly available on the Workflow4Metabolomics e-infrastructure (Giacomoni et al, 2015) with reference W4M00001_Sacurine-statistics (credentials can be obtained by requesting an account).

Raw files (in both Thermo proprietary and mzML open formats) are publicly available on the MetaboLights repository (MTBLS404).

Source: https://github.com/phnmnl/phenomenal-h2020/wiki/Sacurine-statistical-workflow

Steps

In the following, we will run the statistical workflow in Galaxy on the original MetaboLights study (MTBLS404). With this workflow we try to find differences between two patient groups at the metabolome level. The two groups are defined by gender (male, female).

  • Access the workflow file and input files for the Statistical Workflow and download them to your local machine
  • Upload all input files to your Galaxy history (Analyze Data -> Get Data -> Upload File)
DataMatrix-Input.tsv # contains metabolite intensities for each patient
SampleMetadataInput-Input.tsv # contains metadata for each patient
VariableMetadataInput-Input.tsv # contains metabolite annotations
  • Upload the workflow file (Workflow -> Upload or import workflow -> Browse -> Import)
w4m-sacurine-statistics.ga
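  • optionally inspect the downloaded tab-separated files locally before uploading (illustrative)
head -n 3 DataMatrix-Input.tsv SampleMetadataInput-Input.tsv VariableMetadataInput-Input.tsv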

Part II: Use Case - MetFrag Workflow

A next step following the Statistical Workflow, in which we found "interesting" metabolite features, is the annotation/identification of the underlying metabolites. A powerful technique is tandem mass spectrometry (MS/MS), in which molecules are fragmented. The acquired masses of these fragments can give hints about the underlying molecular (metabolite) structure.

One tool that makes use of this information is MetFrag, which performs a database search to retrieve candidate molecules. These candidates are fragmented in silico, matched to the MS/MS data and scored accordingly. The output is a ranked list of candidate molecules, where the top-ranked positions give hints for the correct molecular structure.

By running the next workflow, we process several MS/MS spectra from different metabolites. The data is stored in mzML format. It will be pre-processed using XCMS, which includes peak picking to extract informative peaks. Then the data will be annotated using the R package CAMERA, which tries to find isotopes and adducts. Finally, this information is used to create MetFrag parameter files, which are passed to the MetFrag tool.

Steps

  • Access the workflow file and input file for the MetFrag Workflow and download them to your local machine
  • Upload all input files to your Galaxy history (Analyze Data -> Get Data -> Upload File)
example_data_ms2.mzml # contains MS and MS/MS data
  • Upload the workflow file (Workflow -> Upload or import workflow -> Browse -> Import)
Galaxy-Workflow-MetFragWorkflow.ga
  • Create a new history in Galaxy
  • Visualize the uploaded workflow by clicking "edit"
  • Check the different settings of each tool (node) on the right side by clicking on the specific node
  • Check that the database for the tool 'msms2metfrag' is set to PubChem
  • Click run, set the proper input file and finally run the workflow
  • You should see the progress of the workflow on the right side

After the workflow has finished, we have an output CSV file for each metabolite spectrum.

  • Visualize the results by using the Galaxy tool 'metfrag-vis' which you can find in the 'Tools' section of Galaxy
  • Use the parameter and result collection as input and run the tool

As output, a PDF summary file will be created containing information about the Top1 candidates of the MS/MS spectra.
