Yuling Dai edited this page Apr 5, 2024 · 32 revisions

MetaMorpheus: Free, Open-Source PTM Discovery

  • Calibration Task: Calibrate raw data to obtain more and better identifications.
  • G-PTM-D Task: Find as many PTMs as possible.
  • Search Task: Search raw data with classic search, ion-indexed modern search, semi-specific search, or non-specific search.
  • Crosslink Search Task: Search crosslinked peptides with cleavable or non-cleavable crosslinkers.
  • [O-glycopeptide Search Task](https://github.com/smith-chem-wisc/MetaMorpheus/wiki/O-Glyco-Search-Task): O-Pair Search identifies O-glycopeptides using an ion-indexed open modification search and localizes O-glycosites using graph theory and probability-based localization.
  • Hybrid Search Task: Hybrid Search combines protein database search and spectral library search. The database search will be performed first and then the results will be further evaluated by an imported spectral library.

The concept behind MetaMorpheus is simple. A significant percentage of peptides analyzed in bottom-up experiments contain post-translational modifications (PTMs) or sequence variants. Many search programs ignore these peptides and look only for unmodified peptides. We created a way to discover these modified peptides while maintaining quality control (i.e., false discovery rate control).

How do I get started?

The easiest way to try out MetaMorpheus is to run through the bottom-up mouse vignette. The bottom-up CAST mouse vignette consists of two spectra files of bottom-up mouse samples (.mzML and/or .raw format). Protein databases (.xml.gz format) and MetaMorpheus tasks (.toml format) are also provided for you. Download the whole package, drag the .raw or .mzML files, the .xml.gz files, and the .toml files into MetaMorpheus, and click "Run All Tasks".

How do I perform my own search?

  • Download a protein database in .xml or .fasta format from UniProt and drag it into MetaMorpheus. There is no need to unzip the database. MetaMorpheus reads .gz compressed databases, in addition to uncompressed .xml and .fasta.
  • Next, drag your .raw or .mzML spectra files into MetaMorpheus.
  • Click on the "Search" button. Make appropriate settings adjustments for your data.
  • Under "Modifications", choose the variable/fixed mods you want. NOTE: This is not G-PTM-D! Use the G-PTM-D task to discover low-abundance PTMs.
  • Click "Add the Search Task".
  • Finally, click "Run all tasks!".

By default, search results are written to the folder that contains your spectra files. PSM and peptide outputs are automatically generated. If "Construct protein groups" was enabled (this is enabled by default), then you will also have protein results to look at.

How do I find PTMs?

There are several methods of PTM discovery in MetaMorpheus.

  1. G-PTM search: You can search with a database that already contains the location of known PTMs. You can get databases which contain PTMs from UniProt. When downloading a list of proteins, select the "XML" format option. If you search with this database, MetaMorpheus will automatically interpret the PTMs found in the UniProt database, and PTM-containing peptides will appear in the search results. We call this a G-PTM search.
  2. G-PTM-D search: MetaMorpheus can also find PTMs that are not annotated in a UniProt database. We perform a two-pass search to do this; we refer to this strategy as Global PTM Discovery (G-PTM-D). The first search finds high-scoring matches between an experimental MS/MS spectrum and a theoretical MS/MS spectrum where the difference in mass corresponds to a known PTM (e.g. 79.97 Da for phosphorylation). MetaMorpheus then annotates the PTM in the protein database so that the peptide can carry a phosphorylation at every possible location (e.g. S, T and Y). The second search, run against the new database, uses these modified peptides as theoretical peptides, and they are reported in the results.
  3. Variable modification search: This is the "old school" way of searching for modifications that most search programs use. Generally, it is very slow and prone to high FDRs, which are often underestimated. Variable modification-type searching is not recommended unless the PTMs are very common in the sample (e.g., acetylation on protein N-term, or oxidation on M).
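
The mass-difference idea behind the first G-PTM-D pass can be sketched in a few lines of Python. This is a minimal illustration, not MetaMorpheus's implementation: the function names, the 0.003 Da default tolerance, and the residue table are assumptions made for the example, and the PTM masses are the ones quoted elsewhere on this page.

```python
# Monoisotopic mass shifts (Da) for a few PTMs, as quoted on this page.
KNOWN_PTM_SHIFTS = {
    "phosphorylation": 79.966331,
    "sulfonation": 79.956815,
    "acetylation": 42.010565,
    "trimethylation": 42.046950,
}

# Hypothetical residue table for illustration; only the S/T/Y rule for
# phosphorylation comes from the text above.
CANDIDATE_RESIDUES = {"phosphorylation": "STY"}


def match_ptms(observed_mass, theoretical_mass, tolerance_da=0.003):
    """Return the PTMs whose mass shift explains observed - theoretical."""
    delta = observed_mass - theoretical_mass
    return [
        name
        for name, shift in KNOWN_PTM_SHIFTS.items()
        if abs(delta - shift) <= tolerance_da
    ]


def candidate_sites(peptide, ptm):
    """Return 1-based positions in the peptide where the PTM could be placed."""
    residues = CANDIDATE_RESIDUES.get(ptm, "")
    return [i + 1 for i, aa in enumerate(peptide) if aa in residues]


if __name__ == "__main__":
    # A precursor that is about 79.9665 Da heavier than the unmodified peptide.
    for ptm in match_ptms(1079.9665, 1000.0):
        print(ptm, candidate_sites("ASTYK", ptm))
```

With a looser tolerance such as 0.02 Da, the same mass difference also matches sulfonation, which is one reason tight, well-calibrated tolerances matter for this kind of search.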

What are the main features of MetaMorpheus?

MetaMorpheus's workflow is composed of "tasks". A task is one step of a workflow that MetaMorpheus can perform. Four core tasks are described here:

  1. Mass calibration ("Calibrate"): Mass calibration corrects systematic drift in the mass measurements of mass spectrometry data. This is very useful when trying to discriminate between very similar theoretical peptides. For example, several PTMs have very similar masses: sulfonation (79.956815 Da) and phosphorylation (79.966331 Da) are only 0.009516 Da apart, and acetylation (42.010565 Da) and trimethylation (42.046950 Da) are only 0.036385 Da apart. High-quality calibration can make it possible to identify these PTMs accurately. Additionally, calibration creates recommended precursor and fragment mass tolerances for subsequent tasks.
  2. PTM discovery ("Discover PTMs"): This is the task that performs G-PTM-D (described above). It takes in a database (either a .fasta or .xml) and annotates plausible PTM locations in it. Subsequent tasks use this annotated database.
  3. Search ("Search"): This is the basic search function of MetaMorpheus. It performs a search of the mass spectrometry proteomics data using the settings you select and the protein database you input. It can also perform MS1-intensity based label-free quantification.
  4. Crosslink search ("XL Search"): This is MetaMorpheusXL, an algorithm that identifies crosslinked peptides.
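
To see why calibration matters for the near-isobaric PTM pairs mentioned under mass calibration, it helps to express their mass differences in ppm of the peptide mass. The sketch below is illustrative only: the 2000 Da peptide mass is an arbitrary example, and the "windows must not overlap" rule is a simplifying assumption, not how MetaMorpheus scores candidates.

```python
def ppm_separation(shift_a, shift_b, peptide_mass):
    """Difference between two PTM mass shifts, in ppm of the peptide mass."""
    return abs(shift_a - shift_b) / peptide_mass * 1e6


def distinguishable(shift_a, shift_b, peptide_mass, tolerance_ppm):
    """True if +/- tolerance windows around the two candidates do not overlap."""
    return ppm_separation(shift_a, shift_b, peptide_mass) > 2 * tolerance_ppm


if __name__ == "__main__":
    SULFONATION, PHOSPHORYLATION = 79.956815, 79.966331
    ACETYLATION, TRIMETHYLATION = 42.010565, 42.046950
    mass = 2000.0  # example peptide mass in Da (assumption)

    print(round(ppm_separation(SULFONATION, PHOSPHORYLATION, mass), 3))
    print(round(ppm_separation(ACETYLATION, TRIMETHYLATION, mass), 2))
    print(distinguishable(SULFONATION, PHOSPHORYLATION, mass, tolerance_ppm=5))
    print(distinguishable(SULFONATION, PHOSPHORYLATION, mass, tolerance_ppm=2))
```

For a 2000 Da peptide, sulfonation and phosphorylation are separated by under 5 ppm, so under this simplified rule a 5 ppm tolerance cannot tell them apart while a 2 ppm tolerance can.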

What about quantification?

Label-free quantification in MetaMorpheus is performed with FlashLFQ. You can read about it here and here. We recently enabled the software to perform normalization across conditions, samples, fractions and replicates. To perform intensity normalization, you need to define the experimental design, create a search task, and then check the box for "Normalize quantification results" in the quantification options.

How does MetaMorpheus deal with contaminants?

MetaMorpheus can search with multiple database files in a single search. These can be any combination of .fasta and .xml. You can use a list of contaminants from the Max Planck Institute by clicking "Add Default Contaminants" within MetaMorpheus, download an existing database of contaminants, or create your own contaminant database based on the type of sample you are analyzing. Once the contaminants database has been dragged into MetaMorpheus, check the box marked "Contaminant". During the search, any peptide matching a contaminant protein gets marked as a contaminant.
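
Conceptually, the contaminant flag travels from the database to its proteins and then to any peptide that matches one of them. The following Python sketch illustrates that idea under loud assumptions: the `Protein` class, the accessions, and the plain substring match (no enzymatic digestion) are all hypothetical, and how MetaMorpheus treats a peptide shared by a target and a contaminant protein is not specified here; this sketch simply flags it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str        # hypothetical identifiers, for illustration only
    sequence: str
    is_contaminant: bool  # inherited from the database the protein came from


def flag_contaminant_peptides(peptides, proteins):
    """Map each peptide to True if it occurs in any contaminant protein."""
    return {
        pep: any(p.is_contaminant and pep in p.sequence for p in proteins)
        for pep in peptides
    }


if __name__ == "__main__":
    proteins = [
        Protein("TARGET_1", "MKTAYIAKQRQISFVK", is_contaminant=False),
        Protein("CONTAM_1", "MSLLEGGKVSDFGLAR", is_contaminant=True),
    ]
    print(flag_contaminant_peptides(["AYIAK", "VSDFGLAR"], proteins))
```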

I want to know more about how MetaMorpheus works.

MetaMorpheus's Wiki explains the inner workings of the software. We are gradually expanding the documentation it contains. Additionally, you may browse the codebase if you're familiar with the C# programming language.

I have a suggestion or need help.

If you have an issue or a question, please click on the "Issues" tab of this GitHub repository and create a new issue. You may also email the developers at mm_support@chem.wisc.edu. The latter option is better if you want to keep your communication with us private. Creating an issue is helpful, though, because it helps the developers keep a to-do list and creates a community around MetaMorpheus.