
MTBLS233 workflow

Marco Capuccini edited this page Feb 24, 2017 · 2 revisions

MTBLS233 with PhenoMeNal Jupyter

On this page we introduce an OpenMS preprocessing workflow and an R downstream analysis that you can run using the Jupyter front end provided by PhenoMeNal.

Introduction

The aim of the MTBLS233 study was to obtain quantitative information on the highest possible number of reliable features in untargeted metabolomics. Three different approaches to tuning the mass spectrometric acquisition parameters were tested to see which gave the highest number of spectral features.

In this proof-of-principle workflow we recreate the workflow used in the MTBLS233 study in a distributed manner, so that it runs on the PhenoMeNal platform. The workflow was originally implemented in OpenMS v. 1.1.1, followed by the downstream analysis in KNIME. Here we launch and control the pipeline from Jupyter: the OpenMS preprocessing has been wrapped in Docker containers to facilitate scaling, and the downstream analysis, written in R, has been extracted and implemented directly in Jupyter.

Run the preprocessing workflow

Start by opening Jupyter in your browser at:

http://notebook.<deployment-id>.phenomenal.cloud/

Ingest the MTBLS233 dataset from MetaboLights

MetaboLights offers an FTP service, so we can ingest the MTBLS233 dataset with Linux commands.

  1. First open a Jupyter terminal: New > Terminal
  2. Ingest the dataset using wget:
wget "ftp://anonymous@ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS233/*.mzML" -P MTBLS233/data/
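
Before starting the heavy preprocessing run, it may be worth confirming that the download landed where expected. The following is a minimal Python sketch, assuming the files were saved under MTBLS233/data/ as in the wget command above. The helper name list_mzml is hypothetical and is not part of the workflow repository.

```python
from pathlib import Path


def list_mzml(data_dir="MTBLS233/data"):
    """Return the non-empty .mzML files found in data_dir, sorted by name.

    Hypothetical helper for illustration only; an empty file usually
    points to an interrupted transfer.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.glob("*.mzML") if p.stat().st_size > 0)


if __name__ == "__main__":
    files = list_mzml()
    print(f"{len(files)} non-empty mzML files found in MTBLS233/data")
```

You can run it from the same Jupyter terminal, in the directory from which you ran wget. If the count looks too low, re-running the wget command is the simplest fix.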

Run the preprocessing workflow with Luigi

To run the preprocessing analysis we use the Luigi workflow system. Please note that this is a heavy analysis: to run it successfully you will have to deploy a moderately large number of fat nodes with your cloud provider. To start the preprocessing workflow, run:

cd MTBLS233 
export PYTHONPATH=./ 
luigi --module preprocessing_workflow AllGroups \
  --scheduler-host luigi.default \
  --workers <parallelism-level>

Warning: Remember to replace <parallelism-level> with the number of parallel processes that you aim to spawn in the cluster.
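
If you prefer to launch the run from a notebook cell or a script, the same command line can be assembled in Python. This is only a sketch: the function name build_luigi_command and the example value of 4 workers are assumptions made for illustration, while the module name, task name, and flags are copied from the command above.

```python
import shlex


def build_luigi_command(workers, scheduler_host="luigi.default"):
    """Assemble the luigi invocation shown above as an argument list.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    workers = int(workers)
    if workers < 1:
        raise ValueError("workers must be a positive integer")
    return [
        "luigi",
        "--module", "preprocessing_workflow", "AllGroups",
        "--scheduler-host", scheduler_host,
        "--workers", str(workers),
    ]


if __name__ == "__main__":
    # 4 is an arbitrary example; pick a value that suits your cluster.
    command = build_luigi_command(workers=4)
    print(shlex.join(command))
```

To actually start the run you could pass the list to subprocess.run with cwd set to MTBLS233 and PYTHONPATH set to ./ in the environment, which mirrors the cd and export lines above.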

If everything goes well, you will be able to monitor the progress of your analysis at:

http://luigi.<deployment-id>.phenomenal.cloud/

Run the downstream analysis

To open the downstream analysis notebook, please go to:

MTBLS233 > downstream-analysis > downstream-analysis.ipynb

The CSV files generated by the OpenMS TextExporter are saved in the MTBLS233/results directory, and each one is already set as the notebook input for its respective mass range.
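
Before running the notebook you may want to check that the preprocessing step actually produced its output. The sketch below prints the first few lines of each result file without assuming anything about the column layout. It does assume that the files carry a .csv extension, and the helper name preview_results is hypothetical.

```python
from itertools import islice
from pathlib import Path


def preview_results(results_dir="MTBLS233/results", n_lines=3):
    """Map each CSV file name in results_dir to its first n_lines lines.

    Hypothetical helper; the .csv extension is an assumption.
    """
    previews = {}
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            previews[path.name] = [
                line.rstrip("\n") for line in islice(handle, n_lines)
            ]
    return previews


if __name__ == "__main__":
    for name, lines in preview_results().items():
        print(name)
        for line in lines:
            print("   ", line)
```

An empty listing suggests that the Luigi run has not finished yet or that it wrote its output elsewhere.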

To run the workflow click Cell > Run All.

After successfully running the whole workflow, you may change the parameters to see their impact on the results.
