sajfb/README.md


I'm an award-winning data scientist bridging cheminformatics and metabolomics, with a focus on small molecule discovery and mass spectrometry data science (see my award news from the Metabolomics Association of North America (MANA) and my presentation details here).

I've built multiple computational pipelines for untargeted mass spectrometry data processing across diverse research domains including metabolomics, lipidomics, exposomics, and environmental studies. My software development philosophy emphasizes maximal automation, highest precision, multi-platform compatibility, and user-friendly interfaces to minimize lab-based experiments.

I am always driven to advance next-generation AI for chemistry and biological applications through cutting-edge research.

Completed projects

Mass Spectrometry Data Processing Workflows at the Integrated Data Science Laboratory for Metabolomics and Exposomics


  • IDSL_MINT (Mass INTerpretator) is a deep learning framework to further interpret unannotated mass spectrometry data using deep cheminformatics analyses.

  • IDSL.ICA (Integrated Compound Annotation) is a full-scale annotation workflow to facilitate integration of metabolomics data for multi-omics analyses. (pending release ... )

  • IDSL.IPA (Intrinsic Peak Analysis) is a chromatographic peak-picking software package that can screen peaks at very low signal intensities (S/N > 2). IDSL.IPA can pair isotopologues separated by a fixed mass difference (e.g. ΔC = 13C - 12C = 1.003354835336 Da), detect chromatographic peaks via geometric analysis, correct retention time drifts using endogenous index markers, align peaks (m/z-RT pairs) across population-scale studies (N > 200), fill gaps in the aligned peak tables, annotate peaks, and visualize extracted ion chromatograms (EICs) and total ion chromatograms (TICs).

  • IDSL.FSA (Fragmentation Spectra Analysis) is a computational workflow that rapidly annotates .msp (mass spectra format) and .mgf (Mascot generic format) fragmentation data files by measuring spectral entropy and/or cosine similarity, even when precursor values are unavailable or unreliable. IDSL.FSA may also be used to process bottom-up proteomics data.

  • IDSL.CSA (Composite Spectra Analysis) is a pipeline to deconvolute fragmentation spectra from Data-Dependent Acquisition (DDA) and various Data-Independent Acquisition (DIA) methods such as SWATH-MS, MSE, and All-Ion Fragmentation (AIF) analyses.

  • IDSL.UFA (United Formula Annotation) is a computationally enhanced pipeline to annotate chromatographic peaks with molecular formulas using an isotopic profile matching approach. IDSL.UFA only requires MS1-level data, which is especially beneficial when MS/MS data are not available. The IDSL.UFA pipeline can screen for isotopic profiles of up to 10^8 molecular formulas using a computationally efficient algorithm without any memory complications.

  • IDSL.UFAx (exhaustive UFA) was developed to annotate chromatographic peaks using an exhaustive chemical enumeration-based approach. This package can perform elemental composition calculations using the following 15 elements: C, B, Br, Cl, K, S, Si, N, H, As, F, I, Na, O, and P. IDSL.UFAx is also able to screen for isotopic profiles of 10^27 molecular formulas without any memory complications; however, IDSL.UFAx is not as computationally fast as IDSL.UFA.

  • IDSL.SUFA is a simplified version of the IDSL.UFA package that calculates isotopic profiles and adduct formulas from molecular formulas. It has no dependency on other R packages, which makes it suitable for online tools such as an isotopic profile calculator. The IDSL.SUFA package also provides functions to process user-defined adduct formulas.

  • IDSL.NPA (Nominal Peak Analysis) is a pipeline for processing nominal mass spectrometry data to create and annotate .msp files for untargeted MS/MS workflows.

  • IDSL.MXP (Mass Spectrometry Parser) is a light and fast parser for mzML/mzXML/netCDF mass spectrometry data files. IDSL.MXP has proven especially useful for reading corrupted mass spectrometry files.
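
Two of the operations named in the list above, pairing isotopologues at a fixed mass difference and scoring fragmentation spectra by cosine similarity, can be illustrated with a small Python sketch. This is not the IDSL.IPA or IDSL.FSA implementation (those are R packages with their own APIs). The function names, the tolerance values, and the greedy peak-matching rule are all assumptions made for illustration; only the ΔC constant comes from the text above.

```python
from math import sqrt

# 13C - 12C mass difference in Da, as quoted in the IDSL.IPA description.
DELTA_C = 1.003354835336


def pair_isotopologues(mz_values, tol=0.005):
    """Return (i, j) index pairs whose m/z difference is within tol of DELTA_C.

    Hypothetical helper: tol (in Da) is an assumed default, not a package setting.
    """
    order = sorted(range(len(mz_values)), key=lambda k: mz_values[k])
    pairs = []
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            diff = mz_values[j] - mz_values[i]
            if diff > DELTA_C + tol:
                break  # values are sorted, so later peaks are even farther away
            if abs(diff - DELTA_C) <= tol:
                pairs.append((i, j))
    return pairs


def cosine_similarity(spec_a, spec_b, tol=0.01):
    """Cosine score between two spectra given as lists of (mz, intensity).

    Hypothetical helper: each peak in spec_a is greedily matched to the
    closest unused peak in spec_b within tol Da. No precursor m/z is needed.
    """
    used = set()
    dot = 0.0
    for mz_a, int_a in spec_a:
        best, best_diff = None, tol
        for k, (mz_b, _) in enumerate(spec_b):
            d = abs(mz_a - mz_b)
            if k not in used and d <= best_diff:
                best, best_diff = k, d
        if best is not None:
            used.add(best)
            dot += int_a * spec_b[best][1]
    norm_a = sqrt(sum(i * i for _, i in spec_a))
    norm_b = sqrt(sum(i * i for _, i in spec_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For example, `pair_isotopologues([180.0634, 181.0668, 250.1000])` pairs the first two peaks, and `cosine_similarity` scores two identical spectra at approximately 1 and two spectra with no shared peaks at 0. A production workflow would additionally check retention time and intensity ratios before accepting a pair.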

Computational mass spectrometry pipelines for environmental cheminformatics projects as part of my doctoral research

  • An IPDC (Isotopic Profile Deconvoluted Chromatogram) algorithm to screen biologically complex environmental matrices for unknown contaminants using chemometric methods. The IPDC algorithm was successfully employed in five different projects during my PhD.

Pinned

  1. idslme/IDSL_MINT (Public)

    A Deep Learning Framework to Interpret Raw Mass Spectrometry (m/z) Data

    Python 12 1

  2. idslme/IDSL.IPA (Public)

    Intrinsic Peak Analysis (IPA) pipeline for peak-picking in large-scale untargeted small molecule analysis including metabolomics, lipidomics, exposomics, and environmental studies.

    R 14 1

  3. idslme/IDSL.UFA (Public)

    United Formula Annotation (UFA) for LC-HRMS data

    R 7 1

  4. idslme/IDSL.CSA (Public)

    Composite Spectra Analysis

    R 5