expand Clinical biomarkers #202

Open
user-tq opened this issue Dec 19, 2022 · 3 comments


user-tq commented Dec 19, 2022

This is an amazing project. I learned a lot from it.

I noticed this

Clinical biomarkers included in PCGR are limited to the following:
Evidence items for specific markers in CIViC must be accepted (submitted evidence items are not considered)
Markers reported at the variant level (e.g. BRAF p.V600E)
Markers reported at the codon level (e.g. KRAS p.G12)
Markers reported at the exon level (e.g. KIT exon 11 mutation)
Within the [Cancer bioMarkers database (CGI)], only markers collected from FDA/NCCN guidelines, scientific literature, and clinical trials are included (markers collected from conference abstracts etc. are not included)
Copy number gains/losses

This seems to lose some information, for instance:

MET  Exon 14 Skipping Mutation
BRCA2 Loss-of-function
TP53 Overexpression

These cannot be converted to genomic variants through TransVar, but they should not be discarded.

I tried to build a complete local database of the relationships between mutations and drugs, using the VEP/ANNOVAR annotation results as the query input.
When querying the CIViC database, I found so many kinds of variant entries that it seems impossible to establish a unified set of rules.
Besides, I cannot know in advance what form the VEP/ANNOVAR annotation will take, so I cannot build a dictionary that converts it into the input field of the database.
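
One part of the conversion described above can be sketched for the simplest case, single amino acid substitutions. The sketch assumes the annotation carries a protein-level HGVS string in three-letter form (for example `ENSP00000288602.6:p.Val600Glu`) and that the database side uses short names such as `BRAF V600E` or `KRAS G12`. The function name `biomarker_keys` and the key format are hypothetical, chosen only for illustration; anything that is not a simple substitution is deliberately left unconverted.

```python
import re

# Three-letter to one-letter amino acid codes (standard 20 plus stop).
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Matches simple substitutions such as "p.Val600Glu" at the end of the string.
_HGVSP = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def biomarker_keys(symbol, hgvsp):
    """Return (variant_key, codon_key), or None if not a simple substitution.

    Hypothetical helper: the key format is an assumption, not a CIViC API.
    """
    m = _HGVSP.search(hgvsp)
    if m is None:
        return None
    ref, pos, alt = m.groups()
    if ref not in AA3_TO_1 or alt not in AA3_TO_1:
        return None
    ref1 = AA3_TO_1[ref]
    variant_key = f"{symbol} {ref1}{pos}{AA3_TO_1[alt]}"  # e.g. "BRAF V600E"
    codon_key = f"{symbol} {ref1}{pos}"                   # e.g. "KRAS G12"
    return variant_key, codon_key
```

Frameshifts, splice variants and other consequence types return `None` here and would need their own rules, which is where a single unified dictionary stops being practical.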

Do you have any ideas and suggestions about this?

Owner

sigven commented Jan 5, 2023

Hi @user-tq (cc @lhogstrom),

Thanks for reaching out! I am glad you find the project useful, and appreciate your time looking into the biomarker matching part.

Basically, you are touching upon a very challenging part of variant interpretation, namely matching existing biomarkers against a set of variants coming from a tumor/patient. There are two essential sides to this challenge:

  1. The level of variant annotation detail that can be captured from the sequencing data for a given patient/tumor (as provided by VEP etc)
  2. The level of detail (resolution) provided per biomarker (e.g. in the CIViC database)

As I think you have realized, numerous entries in CIViC come at a fairly coarse-grained resolution (e.g. ATM mutation). In principle, a ton of potential variants from a tumor could fit this criterion (i.e. any coding mutation in the ATM gene), but it is highly likely that only a few of them are actually relevant as potential biomarkers. So the challenge is to specify which gene variants are most likely to act as biomarkers, right? In PCGR, when biomarkers are very loosely defined, we simply do not consider them for reporting, as we have no means of reporting a confident finding. Generally speaking, I think that biomarkers reported at a "coarse-grained" resolution are less trustworthy than biomarkers reported at a finer resolution (amino acids, codons, exons), and I think this is also supported by the evidence levels of these variants in CIViC.
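
The idea of matching only at the finer resolutions can be illustrated with a minimal sketch. It is not PCGR's actual implementation; the `Biomarker` and `Variant` classes, their fields, and the `match_biomarkers` function are hypothetical, and exon-level matching is simplified to "the variant lies in that exon".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Biomarker:
    gene: str
    resolution: str  # "variant", "codon", "exon", or "gene"
    value: str       # e.g. "V600E", "G12", "11"; empty at gene level


@dataclass(frozen=True)
class Variant:
    gene: str
    protein_change: str  # e.g. "G12D"
    codon: str           # e.g. "G12"
    exon: str            # e.g. "2"


# Resolutions considered specific enough to report, finest first.
REPORTABLE = ("variant", "codon", "exon")


def match_biomarkers(variant, biomarkers):
    """Return matching biomarkers, finest resolution first.

    Gene-level ("coarse-grained") biomarkers are skipped on purpose.
    """
    observed = {
        "variant": variant.protein_change,
        "codon": variant.codon,
        "exon": variant.exon,
    }
    hits = [
        bm for bm in biomarkers
        if bm.gene == variant.gene
        and bm.resolution in REPORTABLE
        and observed[bm.resolution] == bm.value
    ]
    hits.sort(key=lambda bm: REPORTABLE.index(bm.resolution))
    return hits
```

With a KRAS p.G12D variant, a variant-level `G12D` entry and a codon-level `G12` entry would both match, in that order, while a gene-level "KRAS mutation" entry would be ignored.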

Regarding your examples:

  1. Exon skipping mutations - I believe we currently do not capture such information from VEP, and I am not sure how easily it can be retrieved, but I will look further into how this can be captured. If you have any input here, that would be highly welcome.

  2. Loss-of-function biomarkers (gene level) are used for CPSR (germline workflow) when a gene in the germline of a cancer patient carries a loss-of-function variant (as determined by LOFTEE). We can add this also for PCGR, good suggestion.

  3. PCGR is currently not considering expression data - so this is currently not that relevant. Then again, if expression data were provided for a tumor, how would you determine that a gene was overexpressed? I.e. what threshold to use? And how would you know that the level of overexpression in your tumor would be sufficient for the level of overexpression indicated by the biomarker? These matters are very poorly defined, and this is potentially also the reason why DNA biomarkers are much more in use currently.
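
For point 2, gene-level loss-of-function matching could look roughly like the sketch below. It assumes each annotation has already been parsed into a dict with a `SYMBOL` key and a `LoF` key, where LOFTEE marks high-confidence calls as `HC` and low-confidence calls as `LC`. The function names and the set of loss-of-function biomarker genes are hypothetical, and this is not how CPSR is actually implemented.

```python
def genes_with_confident_lof(annotations):
    """Return sorted gene symbols carrying at least one high-confidence LoF call.

    `annotations` is an iterable of dicts with "SYMBOL" and "LoF" keys
    (an assumed, already-parsed form of the VEP/LOFTEE output).
    """
    return sorted({
        a["SYMBOL"]
        for a in annotations
        if a.get("LoF") == "HC" and a.get("SYMBOL")
    })


def match_lof_biomarkers(annotations, lof_biomarker_genes):
    """Keep only genes that also have a gene-level loss-of-function biomarker."""
    return [g for g in genes_with_confident_lof(annotations) if g in lof_biomarker_genes]
```

Restricting to `HC` calls keeps the gene-level match tied to a confident predicted consequence, rather than to any coding mutation in the gene.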

Thanks again for your interest in these matters!

kind regards,
Sigve

Author

user-tq commented Jan 10, 2023

Thanks, I learned a lot again.

Owner

sigven commented Jan 10, 2023

Hey @user-tq,

Was looking a bit more into MET exon 14 skipping mutations. It seems that variants (SNVs) located at the canonical splice site on one end (of the adjacent intron) are attributed to exon skipping (NM_000245.2:c.3028+1G and NM_000245.2:c.3028+2T):

https://brb.nci.nih.gov/cgi-bin/splicing/splicing_main.cgi

This knowledge should be curated and shown in PCGR. Thanks for input!
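
A curated lookup of this kind could be sketched as follows. The table holds only the two donor-site positions mentioned above; the table name, the label text and the `exon_skipping_label` function are hypothetical, and the sketch assumes the annotation provides a coding-level HGVS string for an SNV on the same transcript version.

```python
import re

# Curated splice-site positions attributed to exon skipping.
# Only the two positions named in this thread are listed (assumed labels).
EXON_SKIPPING_SITES = {
    ("NM_000245.2", "c.3028+1"): "MET exon 14 skipping",
    ("NM_000245.2", "c.3028+2"): "MET exon 14 skipping",
}

# Intronic SNV in HGVS coding notation, e.g. "NM_000245.2:c.3028+1G>A".
_HGVSC_SNV = re.compile(
    r"^(?P<tx>[^:]+):(?P<pos>c\.\d+[+-]\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def exon_skipping_label(hgvsc):
    """Return the curated label for an intronic SNV, or None if not curated."""
    m = _HGVSC_SNV.match(hgvsc)
    if m is None:
        return None
    return EXON_SKIPPING_SITES.get((m.group("tx"), m.group("pos")))
```

The lookup is exact on transcript version, so annotations on a different version or transcript would need to be mapped first.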

best,
Sigve
