xcmsrocker: Rocker image for metabolomics data analysis

Both software and data are required for reproducible research, but detailed workflows connecting the two are the key to reproducibility in metabolomics studies. xcmsrocker is a Linux-based Rocker/Docker image that hosts workflows for R-based metabolomics software. It includes multiple mainstream R packages used in metabolomics studies, with RStudio as the IDE. The image can be deployed on a single machine or on a cluster (HPC or cloud computing).

In addition, the rmwf package is included in this image to provide detailed workflow templates (File - New File - R Markdown - From Template - select a template marked {rmwf}) and to help users perform metabolomics data analysis and/or comparisons. Specifically, paired mass distance dependent analysis (PMDDA) and reactomics analysis templates can be found here.

If you prefer to run Python code within RStudio through the reticulate package, you might try metaborocker.

You are welcome to contribute your new algorithm/software/workflow! Just PR!

Click here to view the ASMS 2022 poster and the related video.

Citation

  • Yu, M., Dolios, G., Petrick, L., 2022. Reproducible untargeted metabolomics workflow for exhaustive MS2 data acquisition of MS1 features. Journal of Cheminformatics 14, 6. https://doi.org/10.1186/s13321-022-00586-8

Workflow template usage

  1. Install Docker and start it on your system

  2. Pull the Rocker image: docker pull yufree/xcmsrocker:latest

2.1 If you don't use RStudio and only run R scripts on an HPC, you can use the sif version: docker pull yufree/xcmsrocker:sif

2.2 If you prefer to run the image on a computer with an ARM processor (M1 or Raspberry Pi), you can use the arm version: docker pull yufree/xcmsrocker:arm

  3. Use docker run -e PASSWORD=xcmsrocker -p 8787:8787 yufree/xcmsrocker to start the image

3.1 If you need to access local data in the current directory, you can use docker run -v $(pwd):/home/rstudio/$USER -e PASSWORD=xcmsrocker -p 8787:8787 yufree/xcmsrocker

  4. Open a browser and visit http://localhost:8787 or http://[your-ip-address]:8787 to open RStudio Server

  5. The default user name is rstudio and the password is xcmsrocker

  6. Enjoy your data analysis! If you would like to try the PMDDA workflow, do the following in RStudio:

  • Go to File - New File - R Markdown...
  • Click 'From Template'
  • Choose 'PMDDA Metabolomics Workflow' and click OK
  • You will see an Rmd file with the PMDDA data analysis script.

Steps 2-6 are illustrated below:

pmdda
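The pull and run steps above can also be wrapped in a small script. The following is a minimal sketch, not part of the project: the variable names (IMAGE, HOST_PORT, RSTUDIO_PASSWORD, DRY_RUN), the function name start_xcmsrocker, and the fallback directory name "data" used when $USER is unset are all assumptions made for illustration. By default it only prints the docker command so it can be checked before anything is started.

```shell
#!/bin/sh
# Sketch: start xcmsrocker with the current directory mounted (step 3.1).
set -eu

IMAGE="${IMAGE:-yufree/xcmsrocker:latest}"           # image tag from step 2
HOST_PORT="${HOST_PORT:-8787}"                       # host port mapped to RStudio Server (hypothetical override)
RSTUDIO_PASSWORD="${RSTUDIO_PASSWORD:-xcmsrocker}"   # default password from step 5
DRY_RUN="${DRY_RUN:-1}"                              # 1 = only print the command (assumption of this sketch)

start_xcmsrocker() {
  # Assemble the same arguments as the docker run command in step 3.1.
  set -- run \
    -v "$PWD:/home/rstudio/${USER:-data}" \
    -e "PASSWORD=$RSTUDIO_PASSWORD" \
    -p "$HOST_PORT:8787" \
    "$IMAGE"
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker $*"
  else
    docker pull "$IMAGE"
    docker "$@"
  fi
}

start_xcmsrocker
```

Running the script with DRY_RUN=0 should pull the image and start the container; RStudio Server would then be reachable at http://localhost:8787 (or the chosen HOST_PORT) as described in step 4.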

Packages

Peak picking

  • xcms Generate peaks list/EIC/diffreport
  • x13cms global tracking of isotopic labels in untargeted metabolomics

Improved Peak picking

  • IPO For xcms peak picking optimization
  • Autotuner Automated parameter selection for untargeted metabolomics data processing

Comparison

  • IPO/Autotuner/default setting of xcms

  • Template

rmarkdown::draft("peakpicking.Rmd", template = "peakpicking", package = "rmwf")
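For use without the RStudio interface, the same template call could be run from the host through the container. This is a sketch under several assumptions: that the image accepts Rscript as an overriding command, that mounting the current directory at /work is acceptable (the /work path, the variable names, and the function names draft_expr and draft_template are hypothetical), and that files written by the container may end up owned by root. The edit = FALSE argument of rmarkdown::draft keeps R from trying to open the new file in an editor. By default the script only prints the command.

```shell
#!/bin/sh
# Sketch: create an rmwf workflow template from the command line.
set -eu

IMAGE="${IMAGE:-yufree/xcmsrocker:latest}"   # image tag from the usage steps
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the command (assumption of this sketch)

# Build the R expression shown above for a given template name.
draft_expr() {
  printf 'rmarkdown::draft("%s.Rmd", template = "%s", package = "rmwf", edit = FALSE)\n' "$1" "$1"
}

# Run the expression inside the container, writing into the current directory.
draft_template() {
  r_expr="$(draft_expr "$1")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker run --rm -v $PWD:/work -w /work $IMAGE Rscript -e '$r_expr'"
  else
    docker run --rm -v "$PWD:/work" -w /work "$IMAGE" Rscript -e "$r_expr"
  fi
}

draft_template peakpicking
```

The template name is passed as an argument, so the same function could be called with any other template name shipped in rmwf.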

For MS/MS

  • msPurity Automated Evaluation of Precursor Ion Purity for Mass Spectrometry-Based Fragmentation in Metabolomics
  • MetDIA Targeted Metabolite Extraction of Multiplexed MS/MS Spectra Generated by Data-Independent Acquisition for SWATH

Peak filter/visualization/workflow

  • enviGCMS Filter peaks based on experimental design
  • metaMS An open-source pipeline for GC–MS-based untargeted metabolomics
  • ChemoSpec Exploratory Chemometrics for Spectroscopy
  • UpSetR A More Scalable Alternative to Venn and Euler Diagrams for Visualizing Intersecting Sets

Peak annotation/group/selection

  • pmd Select the independent peaks based on paired mass distance analysis
  • CAMERA Annotation of peaklists generated by xcms, rule based annotation of isotopes and adducts, isotope validation, EIC correlation based tagging of unknown adducts and fragments
  • RAMClustR A feature clustering algorithm for non-targeted mass spectrometric metabolomics data
  • Rdisop Decomposition of Isotopic Patterns
  • InterpretMSSpectrum Annotate and interpret deconvoluted mass spectra (mass*intensity pairs) from high resolution mass spectrometry devices
  • classyfireR Retrieve existing entity classifications from SMILES or InChIs.

Comparison

  • CAMERA, RAMClustR, pmd, xMSannotator

  • Template

rmarkdown::draft("annotation.Rmd", template = "annotation", package = "rmwf")

Batch correction

  • MSPrep Summarization, normalization and diagnostics for processing of mass spectrometry–based metabolomic data by Median, Quantile, Cross-Contribution Compensating Multiple Standard Normalization (CRMN), Surrogate Variable Analysis (SVA) and Removal of Unwanted Variation (RUV).
  • BatchCorrMetabolomics Improved batch correction in untargeted MS-based metabolomics by pool QC.

Comparison

  • Data with QCs/run order/batch information: loess, spline, ComBat

  • Data without QCs/run order/batch information: normalize to zero mean and unit variance, normalize to zero mean and square root variance, normalize to zero mean but variance/SE, vast scaling, level scaling, total sum row, Median row, Mean row, PQN, VSN, Quantile, lumi rsn, Limma CyclicLoess, AFFA CUBICSpline, SVA, iSVA, PCR

  • Template

rmarkdown::draft("normalization.Rmd", template = "normalization", package = "rmwf")

Peaks identification

  • CompoundDb Creating and Using (Chemical) Compound Annotation Databases
  • MetaboAnnotation MS2 annotation
  • xMSannotator MS1 annotation
  • MetFragr The R package enables functionalities from the MetFrag Commandline tool to be used within the R programming language.

Omics

  • xMWAS a data-driven integration and differential network analysis tool.
  • MetabNet An R Package for Metabolic Association Analysis of High-Resolution Metabolomics Data.
  • metapone Conducts pathway test of metabolomics data using a weighted permutation test

Statistical analysis

  • caret general machine learning workflow for more than 200 models
  • caretEnsemble Functions for creating ensembles of caret models
  • pROC Tools for visualizing, smoothing and comparing receiver operating characteristic (ROC curves). (Partial) area under the curve (AUC) can be compared with statistical tests based on U-statistics or bootstrap. Confidence intervals can be computed for (p)AUC or ROC curves.
  • gWQS Fits Weighted Quantile Sum (WQS) regressions for continuous, binomial, multinomial and count outcomes.
  • multcomp Simultaneous Inference in General Parametric Models to solve multiple comparisons issues
  • h2o Automating the machine learning workflow
  • table1 Tables of Descriptive Statistics in HTML

Chemometrics

  • rcdk Interface to the 'CDK' Libraries
  • ChemmineR Cheminformatics Toolkit for R
  • webchem Chemical Information from the Web

Reproducible research

  • Risa Converting experimental metadata from ISA-tab into Bioconductor data structures
  • rmzTab-m R implementation for mzTab-M
  • rmwf Reproducible Metabolomics WorkFlow (RMWF) is an R package for xcmsrocker. It provides workflow templates and demo data for different R-based metabolomics software.

Similar projects

R

Here is a nice review of R packages for metabolomics.

  • patRoon open source software platform for environmental mass spectrometry based non-target screening

  • MetaboAnalystR R functions for MetaboAnalyst; an official Docker image is also maintained.

  • tidymass A complete workflow for processing and analyzing LC-MS-based untargeted metabolomics data using tidyverse principles; the official Docker image can be found here.

  • R for Mass Spectrometry R software for the analysis and interpretation of high throughput mass spectrometry assays.