Add associated conditions from phenotypes section of Entrez gene #93

Open
malachig opened this issue Oct 29, 2020 · 2 comments

Comments

@malachig

It would be great to be able to pull associated condition information from Entrez via mygene.info.

For example, for BRAF (https://www.ncbi.nlm.nih.gov/gene/673):

Under phenotypes they list conditions from the genetic testing registry such as:
Cardiofaciocutaneous syndrome 1
Dabrafenib response
...
Vemurafenib response

We would really like to pull such information into CIViC along with other critical gene info we already obtain from myvariant.info (e.g. https://civicdb.org/events/genes/5/summary/variants/2826/summary)

@andrewsu
Member

I also think this data would be super useful, so I looked into it a bit. Just recording what I found...

Most of our NCBI data comes from https://ftp.ncbi.nlm.nih.gov/gene/DATA/. It looks like those phenotypes come from mim2gene_medgen. If I search for @malachig's example gene ID 673, I get the following records:

$ awk '$2==673' mim2gene_medgen
115150  673     phenotype        GeneMap        CN029449        -
163950  673     phenotype        GeneReviews    C4551602        -
164757  673     gene    -       -       -
211980  673     phenotype        GeneMap        C0684249        -
613706  673     phenotype        GeneMap        C3150970        -
613707  673     phenotype        GeneMap        C3150971        -

That accounts for five of the seven phenotypes listed on https://www.ncbi.nlm.nih.gov/gene/673
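The same lookup can be sketched in Python. This is only an illustration: it assumes the file is tab-separated with the columns in the order MIM number, GeneID, type, Source, MedGenCUI, Comment (inferred from the records above), and the function name `phenotypes_for_gene` is hypothetical. The sample rows are copied from the awk output so the snippet runs without downloading anything.

```python
# Rows copied from the awk output above; assumed column order:
# MIM number, GeneID, type, Source, MedGenCUI, Comment.
SAMPLE_ROWS = [
    ("115150", "673", "phenotype", "GeneMap", "CN029449", "-"),
    ("163950", "673", "phenotype", "GeneReviews", "C4551602", "-"),
    ("164757", "673", "gene", "-", "-", "-"),
    ("211980", "673", "phenotype", "GeneMap", "C0684249", "-"),
    ("613706", "673", "phenotype", "GeneMap", "C3150970", "-"),
    ("613707", "673", "phenotype", "GeneMap", "C3150971", "-"),
]
SAMPLE_LINES = ["\t".join(row) for row in SAMPLE_ROWS]


def phenotypes_for_gene(lines, gene_id):
    """Return the phenotype records (not the 'gene' record) for one Entrez gene ID."""
    records = []
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        mim, gene, rec_type, source, cui = (f.strip() for f in fields[:5])
        if gene == str(gene_id) and rec_type == "phenotype":
            records.append({"mim": mim, "source": source, "medgen_cui": cui})
    return records


braf_phenotypes = phenotypes_for_gene(SAMPLE_LINES, 673)
```

To run it against the real download, pass an open file handle instead of `SAMPLE_LINES`, e.g. `phenotypes_for_gene(open("mim2gene_medgen"), 673)`.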

The two missing MedGen IDs (CN239586 and CN239577) do not appear anywhere in the mim2gene_medgen file:

$ grep -c  CN239586 mim2gene_medgen.txt
0
$ grep  -c CN239577 mim2gene_medgen.txt
0

Hmm, not sure what the source is for those two missing ones...
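For checking a whole list of expected MedGen IDs at once, a rough Python equivalent of the grep calls might look like the following. It is a sketch under the same assumption as before (tab-separated rows with the MedGen CUI in the fifth column), and `missing_cuis` is a hypothetical helper name. Unlike `grep -c`, it compares the fifth column exactly rather than matching a substring anywhere on the line.

```python
# MedGen CUI column values for gene 673, copied from the awk output above.
SAMPLE_LINES = [
    "115150\t673\tphenotype\tGeneMap\tCN029449\t-",
    "163950\t673\tphenotype\tGeneReviews\tC4551602\t-",
    "164757\t673\tgene\t-\t-\t-",
    "211980\t673\tphenotype\tGeneMap\tC0684249\t-",
    "613706\t673\tphenotype\tGeneMap\tC3150970\t-",
    "613707\t673\tphenotype\tGeneMap\tC3150971\t-",
]


def missing_cuis(lines, expected):
    """Return the expected MedGen CUIs that never occur in the fifth column."""
    seen = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5:
            seen.add(fields[4].strip())
    return set(expected) - seen


# The two IDs grepped for above, plus one that is known to be present.
not_found = missing_cuis(SAMPLE_LINES, ["CN029449", "CN239586", "CN239577"])
```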

@kevinxin90
Contributor

I also checked the MedGen downloads and didn't see a file that contains them.

Unless someone can provide a link to the full file, we will probably just go with mim2gene_medgen.txt.

FYI, we already have a BioThings API that links genes to diseases/phenotypes: the EBIGene2Phenotype API. For example, you can query by HGNC ID to get associated conditions: https://biothings.ncats.io/ebigene2phenotype/gene/1097
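A minimal sketch of calling that endpoint from Python, using only the standard library. The URL pattern is taken from the example link above; the helper names `gene_url` and `fetch_gene` are hypothetical, and the snippet assumes the endpoint is still live and returns JSON. The shape of the response is not shown here, so the result is returned unparsed beyond JSON decoding.

```python
import json
import urllib.request

# Base URL taken from the example link above.
BASE_URL = "https://biothings.ncats.io/ebigene2phenotype/gene/"


def gene_url(hgnc_id):
    """Build the query URL for one HGNC ID (e.g. 1097)."""
    return BASE_URL + str(hgnc_id)


def fetch_gene(hgnc_id, timeout=10):
    """Fetch and JSON-decode the record for one HGNC ID.

    Assumes the service is reachable and responds with JSON.
    """
    with urllib.request.urlopen(gene_url(hgnc_id), timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires network access):
#   record = fetch_gene(1097)
#   print(json.dumps(record, indent=2))
```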
