@idslme

Integrated Data Science Laboratory for Metabolomics and Exposomics

Exposome and Human Health

The exposome is the totality of all exposures that we encounter during our lives. Chemicals are its main component, and we can be exposed to thousands of them in daily life. Some, such as hormones, are generated by our own bodies; others, such as phthalates or PFAS, are introduced through consumer products. To monitor these exposures, biological samples such as blood, urine, stool, hair, nails, and saliva can be analyzed with advanced instruments such as mass spectrometers, which can measure thousands of chemicals in a single snapshot. These instruments can generate mountains of data in a matter of hours from one person's samples. A key challenge is to interpret these data in the context of human diseases and to prioritize novel chemical exposures.

The IDSL.ME team develops software, databases, and novel approaches to analyze and interpret metabolomics and exposomics datasets in population-scale studies. More details about the group's research and resources can be found at IDSL.ME. The group is part of the Department of Environmental Medicine and Public Health (EMPH) and the Institute of Exposomics Research at the Icahn School of Medicine at Mount Sinai (ISMMS), New York, USA.

Principal Investigator: Dinesh Barupal

Key resources:

LC/GC-HRMS data processing:

  • IDSL.MXP : A lightweight parser for mzML, netCDF and mzXML files
  • IDSL.IPA : To generate comprehensive data matrices from an untargeted LC/GC-HRMS dataset
  • IDSL.UFA : To annotate MS1-level data with molecular formulas using isotope profile similarity
  • IDSL.CSA : To annotate peaks using composite spectra created from MS1-only data
  • IDSL.FSA : To annotate peaks using fragmentation data generated by DIA and DDA methods
  • IDSL_MINT : A Python workflow for training transformer models to predict molecular fingerprints from MS/MS spectra
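
To give a feel for what "isotope profile similarity" means in the IDSL.UFA entry above, here is a minimal sketch that compares an observed isotope profile with a theoretical one using cosine similarity. This is a generic similarity measure chosen for illustration only; it is not taken from IDSL.UFA, whose actual scoring may differ. The function name and the intensity values are hypothetical.

```python
import math


def cosine_similarity(observed, theoretical):
    """Cosine similarity between two equal-length intensity vectors."""
    if len(observed) != len(theoretical):
        raise ValueError("profiles must have the same number of peaks")
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_t = math.sqrt(sum(t * t for t in theoretical))
    if norm_o == 0 or norm_t == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_o * norm_t)


# Hypothetical relative intensities for the M, M+1 and M+2 isotopologues.
# These numbers are illustrative and were not computed from a real formula.
theoretical_profile = [100.0, 6.8, 1.4]
observed_profile = [100.0, 7.1, 1.2]

score = cosine_similarity(observed_profile, theoretical_profile)
print(f"isotope profile similarity: {score:.4f}")
```

A candidate formula whose theoretical profile scores close to 1 against the observed peaks would be a plausible annotation under this simplified view; a real workflow would also need to account for mass accuracy and peak alignment.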

Metabolic Bioinformatics:

  • IDSL.GOA : Queries the Gene Ontology database for multi-omics data interpretation
  • ChemRICH : Metabolite set analysis that is independent of a background database
  • MetaMapp : Metabolic network mapping using atomic mapping of reactions and chemical similarity

Databases:

  • ECID : Exposome Correlation and Interpretation Database (NIEHS U24ES035386 Biomedical Knowledgebase)
  • CCDB : A database of inter-chemical correlations
  • Blood Exposome DB : A text-mining-driven catalogue of chemicals found in blood samples
  • Cancer Hazard Prioritization : To prioritize cancer hazards for the IARC Monographs programme
  • PubMed-FT : NLP-guided queries of PubMed abstracts
  • PMC-FT : NLP-guided queries of full-text data in the PMC database
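
As an illustration of the "inter-chemical correlations" that CCDB catalogues, the sketch below computes pairwise Pearson correlations between chemicals measured across the same samples. It is a simplified example under stated assumptions: the chemical names and values are made up, and the choice of Pearson correlation is ours, not a description of how CCDB is actually built.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


# Hypothetical levels of three chemicals measured in the same five samples.
levels = {
    "chemical_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "chemical_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "chemical_C": [5.0, 4.0, 3.0, 2.0, 1.0],
}

# Correlate every unordered pair of chemicals once.
names = sorted(levels)
pairs = {
    (a, b): pearson(levels[a], levels[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for (a, b), r in pairs.items():
    print(f"{a} vs {b}: r = {r:+.3f}")
```

In a population-scale dataset the same idea applies to thousands of chemicals, where a rank-based measure or a correction for multiple testing may be more appropriate than this plain Pearson coefficient.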

Key Publications

  • IDSL_MINT: a deep learning framework to predict molecular fingerprints from mass spectra. Baygi SF, Barupal DK. J Cheminform. 2024 Jan 18;16(1):8. doi: 10.1186/s13321-024-00804-5. PMID: 38238779
  • IDSL.GOA: gene ontology analysis for interpreting metabolomic datasets. Mahajan P, Fiehn O, Barupal D. Sci Rep. 2024 Jan 14;14(1):1299. doi: 10.1038/s41598-024-51992-x.
  • IDSL.CSA: Composite Spectra Analysis for Chemical Annotation of Untargeted Metabolomics Datasets. Baygi SF, Kumar Y, Barupal DK. Anal Chem. 2023 Jun 27;95(25):9480-9487. doi: 10.1021/acs.analchem.3c00376. Epub 2023 Jun 13.
  • IDSL.UFA Assigns High-Confidence Molecular Formula Annotations for Untargeted LC/HRMS Data Sets in Metabolomics and Exposomics. Baygi SF, Banerjee SK, Chakraborty P, Kumar Y, Barupal DK. Anal Chem. 2022 Oct 4;94(39):13315-13322. doi: 10.1021/acs.analchem.2c00563. Epub 2022 Sep 22.
  • IDSL.IPA Characterizes the Organic Chemical Space in Untargeted LC/HRMS Data Sets. Fakouri Baygi S, Kumar Y, Barupal DK. J Proteome Res. 2022 Jun 3;21(6):1485-1494. doi: 10.1021/acs.jproteome.2c00120.
  • CCDB: A database for exploring inter-chemical correlations in metabolomics and exposomics datasets. Barupal DK, Mahajan P, Fakouri-Baygi S, Wright RO, Arora M, Teitelbaum SL. Environ Int. 2022 Jun;164:107240. doi: 10.1016/j.envint.2022.107240. Epub 2022 Apr 18.
  • Prioritizing cancer hazard assessments for IARC Monographs using an integrated approach of database fusion and text mining. Barupal DK, Schubauer-Berigan MK, Korenjak M, Zavadil J, Guyton KZ. Environ Int. 2021 Nov;156:106624. doi: 10.1016/j.envint.2021.106624. Epub 2021 May 10.
  • Generating the Blood Exposome Database Using a Comprehensive Text Mining and Database Fusion Approach. Barupal DK, Fiehn O. Environ Health Perspect. 2019 Sep;127(9):97008. doi: 10.1289/EHP4713. Epub 2019 Sep 26.
  • Chemical Similarity Enrichment Analysis (ChemRICH) as alternative to biochemical pathway mapping for metabolomic datasets. Barupal DK, Fiehn O. Sci Rep. 2017 Nov 6;7(1):14567. doi: 10.1038/s41598-017-15231-w.
  • MetaMapp: mapping and visualizing metabolomic data by integrating information from biochemical pathways and chemical and mass spectral similarity. Barupal DK, Haldiya PK, Wohlgemuth G, Kind T, Kothari SL, Pinkerton KE, Fiehn O. BMC Bioinformatics. 2012 May 16;13:99. doi: 10.1186/1471-2105-13-99.

Funding

  • NIEHS (U24ES035386) for the Exposome Correlation and Interpretation Database (ECID) [2023-2028]. PIs: Dinesh Barupal and Susan Teitelbaum.
  • The lab also contributes to several other NIH-funded projects (P30ES023515, U2CES026561, U2CES026555, U2CES030859, R01ES033688, UL1TR004419, R35ES030435, R01ES032831, UH3OD023337).

Contribution guidelines:

  • Most of our software is written in the R and Python programming languages. For online tools, we use the ReactJS framework. To contribute to the IDSL.ME codebase, send a request to dinesh.barupal@mssm.edu. Significant contributions will be credited with authorship in future manuscripts.

Positions

  • We are always looking for bioinformatics programmers, postdoctoral fellows in metabolomics, exposomics, or toxicology, data curators (omics, literature, biomonitoring), and data science analysts. Reach out to dinesh.barupal@mssm.edu with your CV.

Pinned

  1. ChemRICH Public

    Forked from barupal/ChemRICH

    Chemical Similarity Enrichment analysis of metabolomics datasets

    HTML

  2. ECID Public

    Exposome Correlation and Interpretation Database (ECID)

  3. IDSL.IPA Public

    Intrinsic Peak Analysis (IPA) pipeline for peak-picking in large-scale untargeted small molecule analysis including metabolomics, lipidomics, exposomics, and environmental studies.

    R 14 1

  4. IDSL.GOA Public

    Gene Ontology Analysis for Metabolomics

    R 3 2

  5. IDSL_MINT Public

    A Deep Learning Framework to Interpret Raw Mass Spectrometry (m/z) Data

    Python 11 1

Repositories

Showing 10 of 18 repositories
  • ECID Public

    Exposome Correlation and Interpretation Database (ECID)

    0 0 0 0 Updated May 1, 2024
  • .github Public
    0 0 0 0 Updated Apr 30, 2024
  • IDSL_MINT Public

    A Deep Learning Framework to Interpret Raw Mass Spectrometry (m/z) Data

    Python 11 1 0 0 Updated Jan 21, 2024
  • CCDB Public

    Chemical Correlation Database (CCDB)

    R 4 0 0 0 Updated Jan 18, 2024
  • IDSL.GOA Public

    Gene Ontology Analysis for Metabolomics

    R 3 Apache-2.0 2 0 0 Updated Jan 4, 2024
  • IDSL.CSA Public

    Composite Spectra Analysis

    R 5 MIT 0 0 0 Updated Jun 27, 2023
  • IDSL.NPA Public

    A pipeline for processing nominal mass spectrometry data to create .msp files for untargeted MS/MS workflows.

    R 1 MIT 0 0 0 Updated Jun 27, 2023
  • IDSL.FSA Public

    Fragmentation Spectra Analysis

    R 1 MIT 0 0 0 Updated Jun 27, 2023
  • IDSL.IPA Public

    Intrinsic Peak Analysis (IPA) pipeline for peak-picking in large-scale untargeted small molecule analysis including metabolomics, lipidomics, exposomics, and environmental studies.

    R 14 1 0 0 Updated Jun 1, 2023
  • IDSL.UFA Public

    United Formula Annotation (UFA) for LC-HRMS data

    R 7 MIT 1 0 0 Updated May 18, 2023
