Create a new NCBI data source to get complete gene summary from ASN dump #130

Open
newgene opened this issue Sep 4, 2022 · 0 comments
newgene commented Sep 4, 2022

The gene summary data (the summary field) currently served by the MyGene.info API is extracted from RefSeq records (see the current refseq data source).

It appears that RefSeq does not contain all of the gene summary text available from NCBI. For example, as reported in #129, the gene POLA2 has a summary text that is not present in its RefSeq record, so it is missing from the current MyGene.info API.

As suggested by the NCBI support team (Case #: CAS-941135-X3W9H8, for the record), the complete gene summary text is available from NCBI's ASN.1 binary dump files. We can create a new ncbi_gene data source that extracts the gene summary text from these ASN.1 binary dump files.
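
A rough sketch of what the parsing step of such a data source could look like, under several assumptions that are not confirmed in this issue: that the ASN.1 binary dump is first converted to XML (for example with NCBI's gene2xml tool), that each gene is an `Entrezgene` element carrying a `Gene-track_geneid` and an optional `Entrezgene_summary`, and that the output documents use `_id` and `summary` keys. The function name `iter_gene_summaries` and the sample record are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_gene_summaries(xml_stream):
    """Yield {"_id": gene_id, "summary": text} for each gene that has a summary.

    Assumes the stream is Entrezgene XML converted from the ASN.1 dump;
    the element names below are assumptions about that XML layout.
    """
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != "Entrezgene":
            continue
        gene_id = elem.findtext(".//Gene-track_geneid")
        summary = elem.findtext("Entrezgene_summary")
        if gene_id and summary:
            yield {"_id": gene_id, "summary": summary.strip()}
        # Release the finished record so large dumps do not accumulate in memory.
        elem.clear()


# Hypothetical two-record sample: one gene with a summary, one without.
SAMPLE = b"""<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track><Gene-track_geneid>1</Gene-track_geneid></Gene-track>
    </Entrezgene_track-info>
    <Entrezgene_summary>Example summary text.</Entrezgene_summary>
  </Entrezgene>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track><Gene-track_geneid>2</Gene-track_geneid></Gene-track>
    </Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>"""

docs = list(iter_gene_summaries(io.BytesIO(SAMPLE)))
```

Streaming with `iterparse` is chosen here only because the full dump is likely to be large; genes without a summary are skipped so they would not overwrite anything when merged with other sources.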
