Uniprot annotation in OC can be misleading #79

ctokheim · 2021-10-20T18:56:22Z

I noticed a case where the variant annotation table suggests that the primary transcript matches the UniProt protein sequence when in fact it does not. For example, the screenshot below shows a ZBTB3 variant at position 39 on the primary Ensembl transcript, and the table seems to indicate that this position also corresponds to UniProt Q9H5J0.

However, in the actual UniProt entry Q9H5J0, the Arg residue is at position 89, which matches the second Ensembl transcript.

It seems like the correct UniProt ID for the primary transcript is actually A0A6E1W9L1. My thought is that even if OC doesn't always use the canonical UniProt protein sequence as its primary transcript, the table should not suggest that it does.

rkimoakbioinformatics · 2021-10-28T20:51:06Z

@ctokheim Thanks for letting us know, Collin. It turns out that GENCODE V33 itself has wrong UniProt mappings, which were transferred directly to OpenCRAVAT's mapper. In the latest Ensembl release (104), the ENST-to-UniProt mappings for the two transcripts are correct, as you described. One quick patch would be to fix just the UniProt mapping before releasing the mapper with a newer gene model. I'll see if the quick patch is possible.
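
For illustration only, here is a minimal sketch of how a transcript-to-UniProt mapping could be audited by comparing protein sequences directly. It is not OpenCRAVAT's mapper code: the function names (`check_residue`, `audit_mapping`), the IDs (`TX_A`, `TX_B`, `ACC_LONG`, `ACC_SHORT`), and the toy sequences are all hypothetical, chosen only to mimic the situation above, where a residue sits at one position on a shorter isoform and at a shifted position on the longer one.

```python
from typing import Dict


def check_residue(sequence: str, position: int, expected: str) -> bool:
    """Return True if the 1-based position in `sequence` holds `expected`."""
    if position < 1 or position > len(sequence):
        return False
    return sequence[position - 1] == expected


def audit_mapping(
    transcript_proteins: Dict[str, str],
    uniprot_sequences: Dict[str, str],
    mapping: Dict[str, str],
) -> Dict[str, str]:
    """Return the transcript -> accession pairs whose sequences differ.

    A pair is also reported when either sequence is missing, since the
    mapping cannot be confirmed in that case.
    """
    mismatches = {}
    for transcript_id, accession in mapping.items():
        tx_seq = transcript_proteins.get(transcript_id)
        up_seq = uniprot_sequences.get(accession)
        if tx_seq is None or up_seq is None or tx_seq != up_seq:
            mismatches[transcript_id] = accession
    return mismatches


# Toy data (hypothetical): the short isoform lacks the first 5 residues
# of the long one, so the same Arg is at position 4 vs. position 9.
LONG_SEQ = "MKTLLMSERQ"
SHORT_SEQ = "MSERQ"

transcript_proteins = {"TX_A": SHORT_SEQ, "TX_B": LONG_SEQ}
uniprot_sequences = {"ACC_LONG": LONG_SEQ, "ACC_SHORT": SHORT_SEQ}

# A faulty mapping that assigns both transcripts to the long entry.
faulty_mapping = {"TX_A": "ACC_LONG", "TX_B": "ACC_LONG"}
# The corrected mapping, with each transcript on its matching entry.
fixed_mapping = {"TX_A": "ACC_SHORT", "TX_B": "ACC_LONG"}

faulty_report = audit_mapping(transcript_proteins, uniprot_sequences, faulty_mapping)
fixed_report = audit_mapping(transcript_proteins, uniprot_sequences, fixed_mapping)
```

With the faulty mapping, only `TX_A` is reported, because its sequence does not equal the entry it is assigned to; with the corrected mapping the report is empty. A strict equality test like this is a conservative choice: it flags any difference at all, which seems preferable when the annotation table implies an exact position match.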

jasminebro assigned kmoad Mar 11, 2024
