uta_20150827 is missing ENSP accessions and sequences #194

Open
reece opened this issue Sep 16, 2015 · 2 comments
Labels
bug Something isn't working

@reece
Member

reece commented Sep 16, 2015

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #194
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


uta_20150827 does not include ENSP sequences or seqinfo. As a result, c_to_p transformations in hgvs return MD5 accessions.

This issue should update uta with ENSP sequences and accessions (from release-79).

FWIW, this occurred because Ensembl sequence accessions, as provided in the fasta files on their web site, turned out to be non-unique: a single accession may be associated with more than one sequence. Roughly 10,000 ambiguous ENSPs exist between e-71 and e-81.

(It's likely that these ambiguities are distinguished by stable_id versions internally, but these distinctions are not exposed in the fasta files.)
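
To make the ambiguity concrete, here is a minimal sketch of how one might detect accessions that map to more than one sequence across releases. It is not the code used for uta: the function names, the choice to compare sequences by MD5 digest, and the assumption that the accession is the first whitespace-delimited token of each fasta header are all illustrative.

```python
import hashlib
from collections import defaultdict


def parse_fasta(text):
    """Yield (accession, sequence) pairs from fasta-formatted text.

    Assumption: the accession is the first whitespace-delimited
    token after '>' on each header line.
    """
    accession, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            fields = line[1:].split()
            accession = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if accession is not None:
        yield accession, "".join(chunks)


def find_ambiguous_accessions(fasta_texts):
    """Return {accession: set of MD5 hex digests} for every accession
    that is associated with more than one distinct sequence."""
    digests = defaultdict(set)
    for text in fasta_texts:
        for accession, seq in parse_fasta(text):
            md5 = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
            digests[accession].add(md5)
    return {ac: ds for ac, ds in digests.items() if len(ds) > 1}


# Hypothetical records standing in for two releases; not real Ensembl data.
release_a = ">ENSP00000000001 pep\nMKT\nAAA\n>ENSP00000000002 pep\nMGG\n"
release_b = ">ENSP00000000001 pep\nMKTAAV\n>ENSP00000000002 pep\nMGG\n"

ambiguous = find_ambiguous_accessions([release_a, release_b])
print(sorted(ambiguous))
```

Here only the first accession is reported, because the same identifier carries two different sequences in the two inputs. Run over the real peptide fasta files, a check like this would list the ENSPs that cannot be loaded unambiguously by accession alone.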

@reece
Member Author

reece commented Sep 16, 2015

Original comment on Bitbucket by Reece Hart (Bitbucket: reece, GitHub: reece):


@PeteCauseyFreeman Consider watching this issue.

@reece
Member Author

reece commented Sep 16, 2015

Original comment on Bitbucket by Reece Hart (Bitbucket: reece, GitHub: reece):


ftp.ensembl.org doesn't provide GRCh37 fasta downloads, so the API is the only source for sequences. Fetching now.

@reece reece added the major and bug labels Sep 9, 2016
@reece reece added this to the 0.2.x milestone Sep 9, 2016
@reece reece added this to To do in biocommons roadmap Aug 15, 2018
@reece reece removed this from To do in biocommons roadmap May 12, 2019