Annotated sequence peptide string #24

jspaezp · 2021-08-18T23:42:00Z

[feature request]

I was wondering whether there is any interest in annotating the matched fragment ions directly on the peptide sequence string, along the lines of "P]E]P]TIDEP[I[N[K".
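To make the bracket notation concrete, here is a minimal sketch of how such a string could be generated. It assumes "]" marks an observed N-terminal (b-type) fragment ending at that residue and "[" marks an observed C-terminal (y-type) fragment starting at the next residue. The function name annotate_sequence and its arguments are hypothetical and not part of spectrum_utils.

```python
def annotate_sequence(sequence, b_ions, y_ions):
    """Insert ']' / '[' markers between residues for observed b / y ions.

    b_ions and y_ions are sets of fragment ordinals (b2 -> 2, y3 -> 3).
    This is an illustrative helper, not a spectrum_utils function.
    """
    n = len(sequence)
    parts = []
    for i, aa in enumerate(sequence, start=1):
        parts.append(aa)
        if i == n:
            break  # no cleavage site after the last residue
        if i in b_ions:
            parts.append("]")  # b_i ends after residue i
        if (n - i) in y_ions:
            parts.append("[")  # y_(n-i) starts at residue i + 1
    return "".join(parts)


annotated = annotate_sequence("PEPTIDEPINK", {1, 2, 3}, {1, 2, 3})
print(annotated)  # P]E]P]TIDEP[I[N[K
```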

Something like what is shown in the following image:

(in addition it would be great to have precursor ion annotations, let me know if you would like to make a new issue for it/try to implement it)

my_spectrum_top.annotate_peptide_fragments(0.5, 'Da', ion_types='p')

Thanks for the great package!
Sebastian

EDIT: Adding precursor ions to annotation is already supported!

bittremieux · 2021-08-19T16:39:13Z

I've thought previously about trying to add such a peptide string with the fragments indicated, but it's not trivial with matplotlib. I'll have to see how hard it would be to actually implement it.

I'm currently working on some extra peak annotations (neutral losses, modifications via ProForma). Highlighting the precursor peak should definitely be possible, I'll try to add that functionality relatively shortly.

bittremieux · 2021-12-21T23:44:30Z

To be honest, I don't really have a good idea how to highlight the fragments in the peptide sequence using matplotlib. Suggestions on how to properly combine text with graphical elements are welcome.

pwilmart · 2021-12-22T00:23:38Z

I don’t know if this might have some useful information: http://www.aosabook.org/en/matplotlib.html

Happy Holidays! -Phil


Seb-Leb · 2022-03-02T12:32:54Z

I have implemented something like this by generating an SVG with drawSvg and sticking it onto the plot. It's not always the most elegant, but it works.

bittremieux · 2022-03-03T00:02:25Z

Thanks for the tip. A matplotlib option would be preferable, but I don't know how to properly do it that way. If drawSvg works it could be worth looking into.

hkmoon · 2022-05-16T09:13:03Z

The https://github.com/Rappsilber-Laboratory/xiSPEC_spectrumViewer/ project has a way to draw fragments. It additionally requires XiAnnotator (https://github.com/Rappsilber-Laboratory/pyXiAnnotator) to do so, and it is based on d3.js (JavaScript).

e.g. https://github.com/Rappsilber-Laboratory/xiSPEC_spectrumViewer/blob/master/example_linear.html

This part might be useful as a reference for how to draw the peptide letters:
https://github.com/Rappsilber-Laboratory/xiSPEC_spectrumViewer/blob/cceb202b13d9361fe5f208fbf7f7985d5c3a7277/src/FragmentationKeyView.js#L293

mobiusklein · 2022-10-11T03:16:42Z

I did not want to work on my own work and saw this thread and thought I hadn't abused matplotlib in a while. Here's a Python implementation that is scale invariant while using the spectrum's own coordinate system. I sparingly use ms_deisotope's functionality as that is what I implemented this with, but these should be easily converted to use the spectrum representation that spectrum_utils uses.

from typing import Optional, Dict, Any, NamedTuple, Sequence, List
from ms_deisotope import spectrum_graph

import numpy as np

from matplotlib import path as mpath, patches as mpatch, transforms as mtransform, text as mtext, axes as maxes


class Peak(NamedTuple):
    mz: float
    intensity: float
    charge: int


class PeakNode(NamedTuple):
    peak: Peak

class PeakPathEdge(NamedTuple):
    start: PeakNode
    end: PeakNode
    annotation: str

    
class PeakPath(Sequence[PeakPathEdge]):
    pass

def bbox_path(path):
    nodes = path.vertices
    xmin = nodes[:, 0].min()
    xmax = nodes[:, 0].max()
    ymin = nodes[:, 1].min()
    ymax = nodes[:, 1].max()
    return (xmin, ymin, xmax, ymax)


def shift(path, x=0, y=0):
    return path.transformed(mtransform.Affine2D().translate(x, y))


def draw_ladder(ax: maxes.Axes,
                peak_path: PeakPath,
                scan,
                peak_line_options: Optional[Dict[str, Any]]=None,
                seq_line_options: Optional[Dict[str, Any]]=None,
                text_prop: Optional[mtext.FontProperties]=None,
                vertical_shift: float=0.0):
    
    if peak_line_options is None:
        peak_line_options = {}
    if seq_line_options is None:
        seq_line_options = {}
        
    peak_line_options.setdefault('linewidth', 1)
    peak_line_options.setdefault('color', 'red')
    seq_line_options.setdefault('linewidth', 2)
    seq_line_options.setdefault('color', 'red')
    
    # Compute the maximum height of peaks in the region to be annotated
    # so that there is no overlap with existing peaks
    upper = (
        max(
            [
                p.intensity
                for p in scan.deconvoluted_peak_set.between(
                    peak_path[0].start.peak.mz, peak_path[-1].end.peak.mz, use_mz=True
                )
            ]
        )
        * 1.2
    ) + vertical_shift
    
    # Compute the x-axis dimension aspect
    xlim = (min(p.mz for p in scan.deconvoluted_peak_set),
            max(p.mz for p in scan.deconvoluted_peak_set))
    # Compute the y-axis dimension aspect
    ylim = (0, scan.base_peak.deconvoluted().intensity)

    # Create a baseline scaling transformation for the text
    base_trans = mtransform.Affine2D()
    base_trans.scale((xlim[1] - xlim[0]) / 75, (ylim[1] - ylim[0]) / 25)

    # Don't try to annotate a ladder that changes charge state that
    # would involve back-tracking on the x-axis
    start_charge = None
    for i, edge in enumerate(peak_path):
        if start_charge is None:
            start_charge = edge.start.peak.charge
            
        # We'll write the annotation glyph in the middle of the gap
        mid = (edge.start.peak.mz + edge.end.peak.mz) / 2
        
        # Create the glyph(s) for the peak pair annotation at unit-scale
        # and then scale it up using the base transformation, then center it
        # at 0 again.
        tpath = mtext.TextPath((0, 0), edge.annotation, 1, prop=text_prop)
        tpath = (tpath.transformed(base_trans))

        if (
            edge.start.peak.charge != start_charge
            or edge.end.peak.charge != start_charge
        ):
            continue

        # Move the annotation glyph(s) to the midpoint between the two peaks
        # at the sequence line.
        tpath = shift(tpath, mid, upper * 0.99)

        # Check whether our annotation glyph(s) is too wide for the gap between
        # the two peaks. If it is too large, draw it above the main line to avoid
        # over-plotting.
        xmin, ymin, xmax, ymax = bbox_path(tpath)
        shift_up = (xmax - xmin) / (edge.end.peak.mz - edge.start.peak.mz) > 0.3
        if shift_up:
            tpath = shift(tpath, 0, ymax - ymin)

        ax.add_patch(mpatch.PathPatch(tpath, color="black"))

        # If this is the first peak pair, draw the starting point vertical
        # peak line.
        if i == 0:
            ax.plot(
                [edge.start.peak.mz, edge.start.peak.mz],
                [upper, edge.start.peak.intensity + ylim[1] * 0.05],
                **peak_line_options
            )

        # Draw the next vertical peak line
        ax.plot(
            [edge.end.peak.mz, edge.end.peak.mz],
            [upper, edge.end.peak.intensity + ylim[1] * 0.05],
            **peak_line_options
        )
        
        # Draw the horizontal line between the two peaks. If the annotation
        # was shifted up, draw a single horizontal line connecting the two
        # peaks.
        if shift_up:
            ax.plot(
                [edge.start.peak.mz, edge.end.peak.mz],
                [upper, upper],
                **seq_line_options
            )
        else:
            # Otherwise, draw a line from the starting peak to the annotation glyph
            # with some padding.
            ax.plot(
                [edge.start.peak.mz, max(xmin - xlim[1] * 0.01, edge.start.peak.mz)],
                [upper, upper],
                **seq_line_options
            )
            # And then draw another line from the other side of the glyph with some
            # padding to the second peak.
            ax.plot(
                [min(xmax + xlim[1] * 0.01, edge.end.peak.mz), edge.end.peak.mz],
                [upper, upper],
                **seq_line_options
            )

Here are some examples. I have a match object that knows how to draw a (glyco)peptide spectrum match, and a list of PeakPath objects called paths which I'll draw a random selection from:

art = match.plot()
draw_ladder(
    art.ax,
    paths[0],
    match.scan,
    peak_line_options={"color": "blue", "linestyle": "--"},
    seq_line_options={"color": "blue"},
)

draw_ladder(
    art.ax,
    paths[7],
    match.scan,
    peak_line_options={"color": "red", "linestyle": "--"},
    seq_line_options={"color": "red"},
)

Another, different, spectrum to show it's not magic-numbered for the first one:

art = match2.plot()
draw_ladder(art.ax, paths2[0], match2.scan, peak_line_options={"linestyle": '--'})

And then normalizing the intensity scale for the second spectrum:

Some notes on the implementation:

- I do not attempt to deal with paths which change charge states partway through. This type of annotation is simply unsuited to it.
- I still use some pseudo-magic numbers when setting up the base scaling transform. Increase the x-portion of the transform to make the letters wider, but you probably won't want to do that because if they are too wide they don't fit in the space provided, especially for the low-mass amino acids like G and A.
- I use an object model that needs to be told about peaks, which doesn't quite match the original request. I do not think adding control characters to a string is the ideal way to specify this when you might have anything in the peptide sequence specification.
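For readers without ms_deisotope, here is a rough sketch of how a path could be assembled from plain peaks. It assumes that a plain list of PeakPathEdge objects is an acceptable stand-in for PeakPath (draw_ladder above only indexes and iterates over it), and that consecutive peaks of the same charge separated by a residue mass form an edge. The build_path and match_residue helpers and the abbreviated mass table are illustrative only.

```python
from typing import List, NamedTuple, Optional


class Peak(NamedTuple):
    mz: float
    intensity: float
    charge: int


class PeakNode(NamedTuple):
    peak: Peak


class PeakPathEdge(NamedTuple):
    start: PeakNode
    end: PeakNode
    annotation: str


# Monoisotopic residue masses (Da); deliberately abbreviated for the sketch.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "P": 97.05276, "V": 99.06841, "T": 101.04768,
}


def match_residue(delta_mass: float, tol: float = 0.02) -> Optional[str]:
    """Return the residue whose mass is within tol of delta_mass, if any."""
    for aa, mass in RESIDUE_MASSES.items():
        if abs(delta_mass - mass) <= tol:
            return aa
    return None


def build_path(peaks: List[Peak], tol: float = 0.02) -> List[PeakPathEdge]:
    """Link consecutive same-charge peaks whose gap matches a residue mass."""
    ordered = sorted(peaks, key=lambda p: p.mz)
    edges = []
    for start, end in zip(ordered, ordered[1:]):
        if start.charge != end.charge:
            continue  # mirror draw_ladder: no charge changes within a path
        aa = match_residue((end.mz - start.mz) * start.charge, tol)
        if aa is not None:
            edges.append(PeakPathEdge(PeakNode(start), PeakNode(end), aa))
    return edges


example_path = build_path([
    Peak(200.0, 10.0, 1),
    Peak(257.02146, 20.0, 1),
    Peak(328.05857, 5.0, 1),
])
print([edge.annotation for edge in example_path])
```

A real path finder would also need to handle gaps, competing candidates, and multi-residue jumps; this only shows the shape of the input.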

Since I didn't directly implement this against spectrum_utils and we disagree over input signatures, does this motivate anyone to talk about what the most convenient way would be to specify the ladder path you want to draw, so this could go from an extra-long comment to a PR?

bittremieux · 2022-10-11T17:14:14Z

Thanks Joshua! This is a great start to the discussion. I'll be out of town until the end of the month, but if you're up for it, we should discuss this in more detail in November.

mobiusklein · 2022-10-11T22:49:10Z

Sure. I may have refined this a bit by then for figure making.

bittremieux · 2023-07-04T19:31:19Z

I'm trying to come up with something and would like to get your opinions.

I think this is more or less what was requested. (As a next step we can try to add the ladder as well, but first I want to get this functionality figured out.)

A few questions:

- The sequence is the amino acids only, no modifications because those would lead to random residue widths. This could be considered a bit misleading/confusing though. Is this acceptable or do people have recommendations on how to optimize this?
- Currently I only indicate fragments for singly-charged and canonical fragments. Should additional charge states, losses, etc., be considered as well and should that somehow be reflected in the annotated peptide sequence?
- I suggest allowing only a single one of the a/b/c (downward facing) and a single one of the x/y/z (upward facing) fragment indications, otherwise it would get pretty messy. Thoughts?

@jspaezp @mobiusklein @pwilmart @wfondrie @RalfG

RalfG · 2023-07-05T11:47:06Z

This looks great!

- Modifications: Perhaps the amino acid letters could be colored to indicate a modification, potentially using different colors for different modifications?
- Other ion types: I think ideally it should be an option for additional backbone ions to be considered. Not sure if a distinction should be made between simple singly charged ions and other ion types for visualization. Perhaps a dotted/dashed line instead of a solid line if only 'more exotic' ions are present?
- Indications: If I understand correctly, you mean adding multiple lines between amino acids indicating different ion types (a/b/c or x/y/z)? Here I would opt for visual simplicity and 'summarize' the presence of multiple ion types for a single ion number (e.g. a2 and y2) into a single line between the amino acids. In the end, the goal of this visualization is to know for which amino acid ranges there is evidence in the spectrum, regardless of ion types.
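One possible reading of "summarize" is to collapse all matched fragments onto the backbone bond they provide evidence for. The sketch below assumes fragments are given as (ion_type, ordinal) pairs and numbers bonds from the N-terminus, so bond i sits between residues i and i + 1. The observed_bonds helper is hypothetical.

```python
N_TERMINAL = {"a", "b", "c"}
C_TERMINAL = {"x", "y", "z"}


def observed_bonds(sequence_length, fragments):
    """Map (ion_type, ordinal) pairs to the set of cleaved backbone bonds.

    Bond i lies between residue i and residue i + 1 (1-based), so valid
    bonds run from 1 to sequence_length - 1.
    """
    bonds = set()
    for ion_type, ordinal in fragments:
        if not 1 <= ordinal < sequence_length:
            continue  # not a backbone cleavage within this peptide
        if ion_type in N_TERMINAL:
            bonds.add(ordinal)
        elif ion_type in C_TERMINAL:
            bonds.add(sequence_length - ordinal)
    return bonds


# For an 11-residue peptide, a2, b2 and y9 all support the same bond.
bonds = observed_bonds(11, [("a", 2), ("b", 2), ("y", 9), ("y", 3)])
print(sorted(bonds))  # [2, 8]
```

Each bond in the result would then get one line in the sequence graphic, regardless of how many ion types support it.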

jspaezp · 2023-07-05T14:41:34Z

The sequence is the amino acids only, no modifications because those would lead to random residue widths. This could be considered a bit misleading/confusing though. Is this acceptable or do people have recommendations on how to optimize this?

I agree that explicit mods might not be a first priority, and I agree with Ralf's suggestion that color-coding modified AAs would be a good way to convey mod information without changing the 'monospaced' nature of the annotation.

Currently I only indicate fragments for singly-charged and canonical fragments. Should additional charge states, losses, etc., be considered as well and should that somehow be reflected in the annotated peptide sequence?

I would argue that it should annotate higher charge states, but I don't feel there is a need to show them separately. (It could use whatever maximum fragment charge is set by max_ion_charge; ref: https://spectrum-utils.readthedocs.io/en/latest/api.html#spectrum_utils.spectrum.MsmsSpectrum.annotate_proforma)

I suggest to only allow a single of a/b/c (downward facing) and x/y/z (upward facing) fragment indications, otherwise it would get pretty messy. Thoughts?

I think this would fit 95+% of use cases, so I do not feel that supporting more would be required in a first implementation. Just as a reference and inspiration, though: I have seen annotations offset along the y axis and color-coded as a way to annotate multiple fragment series (fig. 8, https://pubs.acs.org/doi/pdf/10.1021/jasms.2c00214).

I think it looks great as it is right now. Good job!

mobiusklein · 2023-07-05T21:31:26Z

It's very easy to make these graphics too noisy trying to include all that information, and choosing what to draw and when to draw it is going to be application specific.

The sequence is the amino acids only, no modifications because those would lead to random residue widths. This could be considered a bit misleading/confusing though. Is this acceptable or do people have recommendations on how to optimize this?

I agree, coloring the modified amino acid is the usual thing done. Sometimes it's also written lowercase, but color-coding is usually more useful, especially if you keep it uniform across multiple plots.
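As a rough illustration of the color-coding idea, the sketch below turns a sequence with bracketed residue modifications into one (letter, color) pair per residue, so every residue keeps the same width. It assumes modifications appear only as a bracketed name directly after a residue (terminal or labile modifications are not handled), and the palette and the residue_colors helper are hypothetical.

```python
import re

# Hypothetical, fixed palette so colors stay uniform across plots.
MOD_COLORS = {"Oxidation": "tab:red", "Phospho": "tab:blue"}
OTHER_MOD_COLOR = "tab:orange"

# One residue letter, optionally followed by a single bracketed modification.
RESIDUE_PATTERN = re.compile(r"([A-Z])(?:\[([^\]]*)\])?")


def residue_colors(sequence, unmodified_color="black"):
    """Return [(residue, color), ...] with one entry per residue."""
    result = []
    for residue, mod in RESIDUE_PATTERN.findall(sequence):
        if mod:
            color = MOD_COLORS.get(mod, OTHER_MOD_COLOR)
        else:
            color = unmodified_color
        result.append((residue, color))
    return result


colored = residue_colors("PEM[Oxidation]TIDES[Phospho]K")
print("".join(residue for residue, _ in colored))  # PEMTIDESK
```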

Currently I only indicate fragments for singly-charged and canonical fragments. Should additional charge states, losses, etc., be considered as well and should that somehow be reflected in the annotated peptide sequence?

This gets into interfaces. If you provide sane defaults for your high level interface (e.g. all charge states, no neutral losses), that's reasonable, and if you want to expose a lower level interface, that's where you might either add a bunch of flag combinators or flat out just accept a predicate function for (peak, annotation) -> bool on whether an annotation adds a bar and/or a predicate to transform (annotations_at: list) -> str to control how the ion series label is drawn, e.g. so you can abuse unicode or LaTeX to decorate the fragment name. This type of inversion of control might clash with your existing interface and complicate your code undesirably.
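A minimal sketch of what such a lower-level interface could look like, assuming matches arrive as (peak, annotation) pairs. The Annotation class, site_labels, and the two default callbacks are hypothetical names for illustration and do not reflect the spectrum_utils API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass(frozen=True)
class Annotation:
    ion_type: str
    ordinal: int
    charge: int = 1
    neutral_loss: str = ""


def default_keep(peak: Peak, annotation: Annotation) -> bool:
    """Sane default: all charge states, no neutral losses."""
    return annotation.neutral_loss == ""


def default_label(annotations_at: List[Annotation]) -> str:
    """Default label: the distinct ion types observed at this bond."""
    return ",".join(sorted({a.ion_type for a in annotations_at}))


def site_labels(
    matches: Iterable[Tuple[Peak, Annotation]],
    sequence_length: int,
    keep: Callable[[Peak, Annotation], bool] = default_keep,
    label: Callable[[List[Annotation]], str] = default_label,
) -> Dict[int, str]:
    """Group kept annotations by backbone bond and render one label each."""
    by_site: Dict[int, List[Annotation]] = {}
    for peak, ann in matches:
        if not keep(peak, ann):
            continue
        if ann.ion_type in {"a", "b", "c"}:
            site = ann.ordinal
        else:
            site = sequence_length - ann.ordinal
        by_site.setdefault(site, []).append(ann)
    return {site: label(anns) for site, anns in sorted(by_site.items())}


matches = [
    ((175.1, 0.4), Annotation("y", 1)),
    ((227.1, 0.2), Annotation("b", 2)),
    ((209.1, 0.1), Annotation("b", 2, neutral_loss="-H2O")),
    ((114.1, 0.3), Annotation("b", 2, charge=2)),
]
defaults = site_labels(matches, 11)
detailed = site_labels(
    matches, 11,
    keep=lambda peak, ann: ann.charge == 1,
    label=lambda anns: "/".join(f"{a.ion_type}{a.ordinal}{a.neutral_loss}" for a in anns),
)
print(defaults, detailed)
```

The callbacks keep the drawing code unaware of charge states and losses, at the cost of the extra indirection mentioned above.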

I suggest to only allow a single of a/b/c (downward facing) and x/y/z (upward facing) fragment indications, otherwise it would get pretty messy. Thoughts?

That's reasonable. Again, agreeing with Ralf that what we care about is "this bond was observed to break". If you're doing fancy modification localization, then you might want to draw all the ion ladders separately. For my implementation, I stack labile modified and unmodified annotations using different color slash marks over the same bond, but it only goes two deep because more is not helpful and may get too messy unless scale is carefully controlled.

Some implementation questions are:

- How does the sequence logo interact with the matplotlib.Axes your spectrum is plotted on? Is it the same Axes and you are exploiting transforms to keep scale under control, or did you use Figure.add_axes with a bounding box to carve out a sub-figure? Is that region a parameter?
- Is this drawn with Text or TextPath? They interact with axis limits differently.
- Does this depend upon a specific font family?
- What happens to the sequence graphic if the sequence is long? 20 AA? 30 AA?
  - What happens to the size per AA?
  - What happens to the spacing around/between annotations? Do they overlap each other?
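To make the long-sequence question concrete, here is a small sketch of one possible layout rule: residues are placed in axes-fraction coordinates with a capped spacing, so short peptides stay compact and long ones shrink to fit. The layout_sequence function, its default margins, and the max_step cap are assumptions for illustration, not how spectrum_utils lays out the sequence.

```python
def layout_sequence(n_residues, x_start=0.05, x_end=0.95, max_step=0.05):
    """Compute centered x positions for residues and the bonds between them.

    Returns (residue_x, bond_x, step), all in axes-fraction units. The
    spacing is capped at max_step and shrinks once the peptide no longer
    fits in the available span.
    """
    if n_residues < 1:
        raise ValueError("need at least one residue")
    span = x_end - x_start
    step = min(max_step, span / n_residues)
    # Center the block of residues within the available span.
    left = x_start + (span - step * n_residues) / 2 + step / 2
    residue_x = [left + i * step for i in range(n_residues)]
    bond_x = [left + (i + 0.5) * step for i in range(n_residues - 1)]
    return residue_x, bond_x, step


for n in (11, 20, 30):
    _, _, step = layout_sequence(n)
    print(n, round(step, 4))
```

With these defaults the per-residue spacing stays fixed up to 18 residues and shrinks beyond that, which is where the font size and the bond markers would also need to scale down.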

bittremieux · 2023-07-07T08:03:29Z

Very relevant questions, and I haven't settled on an optimal implementation yet.

Currently I provide additional parameters to the plot method to add the peptide sequence to an existing Axes/spectrum plot. Alternatively I could piggyback on the facet plot introduced in #46 to use an extra axis. I see advantages for both options. In the former situation, the peptide sequence can be moved everywhere in the figure, providing more customizable figures. In contrast, the latter situation might provide a cleaner API by extracting the peptide sequence plotting functionality as a separate method.

Then we get into specific implementation details. I think it will be pretty cumbersome or even impossible to find a "perfect" solution that works in all situations. It's probably much easier to have a good-enough implementation that works most of the time. And then try to provide relevant parameters to customize aspects of the plot so that advanced users can try to fix those edge cases themselves.

RalfG · 2023-07-12T14:30:18Z

Currently I provide additional parameters to the plot method to add the peptide sequence to an existing Axes/spectrum plot. Alternatively I could piggyback on the facet plot introduced in #46 to use an extra axis.

I guess the current method also allows it to be used in a facet if it receives an empty axis to plot onto? That would seem the most flexible approach to me: it can plot on top of an existing axis, potentially with a spectrum/mirror plot, or onto an empty axis, which could be part of a larger facet plot.
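A minimal sketch of that "accept any Axes" pattern, assuming the sequence is positioned in axes-fraction coordinates so it does not depend on the data limits of whatever is already plotted. The plot_sequence function and its parameters are hypothetical and only illustrate the calling convention, not the eventual spectrum_utils implementation.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


def plot_sequence(sequence, cleaved_bonds, ax=None, y=0.5,
                  x_start=0.1, x_end=0.9):
    """Draw residues and bond markers on ax in axes-fraction coordinates."""
    if ax is None:
        ax = plt.gca()
    step = (x_end - x_start) / max(len(sequence) - 1, 1)
    for i, residue in enumerate(sequence):
        ax.text(x_start + i * step, y, residue, transform=ax.transAxes,
                ha="center", va="center", family="monospace")
    for bond in sorted(cleaved_bonds):
        # Bond i sits halfway between residue i and residue i + 1 (1-based).
        x = x_start + (bond - 0.5) * step
        ax.add_artist(Line2D([x, x], [y - 0.05, y + 0.05],
                             transform=ax.transAxes, color="black"))
    return ax


fig, (ax_seq, ax_spec) = plt.subplots(
    2, 1, gridspec_kw={"height_ratios": [1, 4]})

# Option 1: an otherwise empty axis that is part of a larger facet plot.
ax_seq.axis("off")
plot_sequence("PEPTIDEPINK", {1, 2, 3, 8, 9, 10}, ax=ax_seq)

# Option 2: on top of an existing spectrum-like plot.
ax_spec.stem([100, 200, 300], [1.0, 0.5, 0.8])
plot_sequence("PEPTIDEPINK", {1, 2, 3}, ax=ax_spec, y=0.9)
```

Because everything is drawn relative to the Axes rather than the data, the same call works in both situations.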

bittremieux added the enhancement New feature or request label Aug 19, 2021

bittremieux added the help wanted Extra attention is needed label Dec 21, 2021

timosachsenberg mentioned this issue Dec 1, 2023

Plotting fix - matched peaks highlighting OpenMS/OpenMS#7243

Merged

5 tasks
