How to apply a protein change to a given amino acid sequence to produce a mutant amino acid sequence #641

nh13 · 2022-06-24T19:08:48Z

I was thinking it would be in one of the mapping or projection classes, but I couldn't find one. I don't want to have to rely on transcript sequence database lookups or the like, but simply provide the reference AA sequence and the change to apply.

reece · 2022-06-24T20:34:08Z

I'm reading this as: you'd like to create a p. expression for an arbitrary protein change. Right?

As you pointed out in #640, and discussed in #333, there's poor congruence between the nt and aa data models, so the method will look different for nt and aa variation (unfortunately).

>>> var_p = parse("NP_012345.6:p.Ala22Trp")
>>> var_p.posedit.pos.start
AAPosition(base=22, aa=A, uncertain=False)
>>> var_p.posedit.edit
AASub(ref='', alt='W', uncertain=False, init_met=False)
>>> var_p.posedit.edit.alt = "Q"
>>> str(var_p)
'NP_012345.6:p.Ala22Gln'

Is that what you wanted?

#333 would aim to create greater parity between the models, and therefore enable something more like alt = "" or alt = "C" or alt = "QQQQQQ", with the appropriate del or sub or repeat.

kchu02 · 2022-06-29T17:52:17Z

@reece I have a similar problem. My vision on such a function would be

mutant_sequence = var_p.myfunction(wildtype_sequence)  # or
mutant_sequence = myfunction(wildtype_sequence, variant_in_string)

Is there any function in the package that would offer the conversion from (wildtype sequence, variant) -> (mutant sequence)?

nh13 · 2022-06-29T22:45:15Z

Agreed with @kchu02 , I want a method where I can supply the wildtype sequence and it applies the protein or cdna change. It should do some validation if ref/alt are available, but it can't in many situations.

mbiokyle29 · 2023-04-07T21:43:44Z

I have a similar use case but expanded to include the use case of having multiple variants. For example if I have 2 SNP variants in DNA space (in which ever space g./t./c.), if we assume that these variants are all within a transcript and within a coding region, there are 2 cases:

The variants are in different codons
The variants are in the same codon

If case 1 is true, then you can just convert _to_p, and pass around both AA variants, as this defines the new AA sequence. However, if case 2 is true, you essentially need to collapse the 2 DNA level variants into a single AA variant. Ex:

Codon: AAG -> Lys
V one:   C -> AAC/Asn
V two:  T  -> ATG/Met
both :  TC -> ATC/Ile

So i'd like to apply multiple variants to a DNA sequence, making the full alt sequence and then convert to AA (and get the per AA variants back out).

Edit:
After reading through #333 I have a better grasp of the complexity here :)

reece · 2023-04-11T11:48:51Z

@mbiokyle29 : Your use case is actually the inverse of the topic of this issue, which is reverse translation from protein to nucleic acid sequence.

Your issue is actually about what HGVS alleles (see the "[ ]" bullet on https://varnomen.hgvs.org/recommendations/general/, the links from there).

We took a stab at alleles a long time ago, but it presents numerous complexities, including how to handle/warn about overlapping variants, shuffling (including shuffling variants into each other), phasing certain/uncertain, coordinate shifting due to upstream indels. All of these have solutions, but understanding the complexity of this implementation requires a pretty deep understanding of HGVS... and even then there are unknown dragons lurking, I suspect. Meanwhile, I rarely see HGVS alleles in the wild, so it was very unclear that the complexity would be worth the effort.

mbiokyle29 · 2023-04-11T15:44:29Z

@reece thanks for the additional context (and again for all the hard work on the library). I have come up with an approach which solves my current challenge. For reference our use case is pretty divergent from the primary focus of HGVS. We are working on protein engineering tools, for tracking changes made to proteins of interest (stored, expressed and sequenced from plasmids). We have variant calls in VCF format in DNA space. What I did was:

Implement a datasource Interface which contains the plasmid reference info, a very rudimentary transcript model, etc
Implement a very rudimentary VCF -> HGVS parser
Implement a "merger" which combines adjacent (or 1bp separated) DNA mismatches/delins into a single delins
Map the delins into the amino acid

This idea was based on the "Note" section of https://varnomen.hgvs.org/recommendations/DNA/variant/delins/. Specifically:

two variants that are separated by fewer than two intervening nucleotides (that is, not including the variants themselves) should be described as a single “delins” variant

The merge code (which only works currently for directly adjacent variants, not the base in the middle case yet):

def merge_dna_variant(left, right, hdp):

    if all([
        left.ac == right.ac,
        left.type == right.type,
        left.posedit.edit.type in {'delins', 'sub'},
        right.posedit.edit.type in {'delins', 'sub'},
        left.posedit.pos.end.base + 1 == right.posedit.pos.start.base
    ]):

        new_start = left.posedit.pos.start.base
        new_end = right.posedit.pos.end.base
        ref = hdp.get_seq(new_start - 1, new_end)

        return SequenceVariant(
            ac=left.ac,
            type=left.type,
            posedit=PosEdit(
                pos=Interval(
                    start=SimplePosition(base=new_start),
                    end=SimplePosition(base=new_end),
                ),
                edit=NARefAlt(
                    alt=left.posedit.edit.alt + right.posedit.edit.alt,
                    alt=ref,
                ),
            ),
        )

I am confident that this is fairly abnormal, but hopefully doesn't produce an incorrect result. In my limited testing it appears to have worked.

github-actions · 2023-12-08T05:05:22Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions · 2023-12-17T01:59:10Z

This issue was closed because it has been stalled for 7 days with no activity.

reece · 2024-02-19T17:45:01Z

This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.

github-actions · 2024-05-20T01:58:42Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale Issue is stale and subject to automatic closing label Dec 8, 2023

github-actions bot added the closed-by-stale label Dec 17, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 17, 2023

reece reopened this Feb 19, 2024

reece added resurrected and removed stale Issue is stale and subject to automatic closing labels Feb 19, 2024

github-actions bot added the stale Issue is stale and subject to automatic closing label May 20, 2024

jsstevenson added keep alive exempt issue from staleness checks and removed stale Issue is stale and subject to automatic closing resurrected labels May 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to apply a protein change to a given amino acid sequence to produce a mutant amino acid sequence #641

How to apply a protein change to a given amino acid sequence to produce a mutant amino acid sequence #641

nh13 commented Jun 24, 2022

reece commented Jun 24, 2022

kchu02 commented Jun 29, 2022

nh13 commented Jun 29, 2022

mbiokyle29 commented Apr 7, 2023 •

edited

reece commented Apr 11, 2023

mbiokyle29 commented Apr 11, 2023

github-actions bot commented Dec 8, 2023

github-actions bot commented Dec 17, 2023

reece commented Feb 19, 2024

github-actions bot commented May 20, 2024

How to apply a protein change to a given amino acid sequence to produce a mutant amino acid sequence #641

How to apply a protein change to a given amino acid sequence to produce a mutant amino acid sequence #641

Comments

nh13 commented Jun 24, 2022

reece commented Jun 24, 2022

kchu02 commented Jun 29, 2022

nh13 commented Jun 29, 2022

mbiokyle29 commented Apr 7, 2023 • edited

reece commented Apr 11, 2023

mbiokyle29 commented Apr 11, 2023

github-actions bot commented Dec 8, 2023

github-actions bot commented Dec 17, 2023

reece commented Feb 19, 2024

github-actions bot commented May 20, 2024

mbiokyle29 commented Apr 7, 2023 •

edited