ENSEMBL transcripts not versioned #233

Open
ahwagner opened this issue Jun 9, 2021 · 7 comments
Labels
keep alive (exempt issue from staleness checks)

Comments

@ahwagner
Member

ahwagner commented Jun 9, 2021

When we try to refer to Ensembl transcripts, we cannot look them up by version in the 20210129 data release.

@reece
Member

reece commented Jun 13, 2021

This issue touches on several problems:

  • The Ensembl data are extremely old.
  • At the time of last loading, Ensembl didn't expose transcript versions. They eventually did so.
  • It's extremely hard to get exon structures out of Ensembl.

The only official way to get exon coordinates out of Ensembl is to use the Perl API. Unfortunately, when I last tried in May 2016, I discovered that Ensembl and BioPerl didn't work on a modern distribution.

In ensembl-dev:

Hi Reece,

Yes, you are right. It's the perl version that's not compatible with BioPerl and even the Ensembl API. The latest perl version that's been safely tested with Ensembl API is 5.14.

Thanks,
Harpreet

My recollection is that Perl 5.14 was significantly out of date at the time, and that installing it manually had knock-on effects on dependent modules. I gave up.

So, to solve this issue, we need a reliable way to get versioned transcripts out of Ensembl.

@ahwagner
Member Author

Would we be able to use the .gff files for this purpose, e.g. http://ftp.ensembl.org/pub/release-104/gff3/homo_sapiens/Homo_sapiens.GRCh38.104.chr_patch_hapl_scaff.gff3.gz? It appears that they have gene/transcript/exon IDs with versions, and earlier releases are also maintained, e.g. release 101: http://ftp.ensembl.org/pub/release-101/gff3/homo_sapiens/Homo_sapiens.GRCh38.101.chr_patch_hapl_scaff.gff3.gz

@reece
Member

reece commented Jun 14, 2021

Yes, those should be usable in principle, but no work has actually gone into that yet.
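
To illustrate the GFF3 idea discussed above, here is a minimal sketch (not UTA's or cdot's actual loader) of how versioned transcript accessions and exon coordinates might be pulled from such a file. It assumes that transcript rows carry `ID=transcript:<id>` and a separate `version` attribute, and that exon rows point back with `Parent=transcript:<id>`. The function names and the inline sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows in the attribute layout assumed above.
SAMPLE_GFF3 = """\
##gff-version 3
1\thavana\tlnc_RNA\t11869\t14409\t.\t+\t.\tID=transcript:ENST00000456328;Parent=gene:ENSG00000223972;transcript_id=ENST00000456328;version=2
1\thavana\texon\t12613\t12721\t.\t+\t.\tParent=transcript:ENST00000456328;exon_id=ENSE00003582793;rank=2;version=1
1\thavana\texon\t11869\t12227\t.\t+\t.\tParent=transcript:ENST00000456328;exon_id=ENSE00002234944;rank=1;version=1
"""


def parse_attributes(field):
    """Split a GFF3 attribute column ("k=v;k=v") into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def versioned_transcripts(lines):
    """Map versioned transcript accessions to exon tuples sorted by start.

    Each exon is (seqid, start, end, strand); coordinates are 1-based
    and inclusive, as in GFF3.
    """
    versions = {}             # unversioned id -> versioned accession
    exons = defaultdict(list)  # unversioned id -> list of exon tuples
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature row
        seqid, _source, ftype, start, end, _score, strand, _phase, attr_field = cols
        attrs = parse_attributes(attr_field)
        if attrs.get("ID", "").startswith("transcript:"):
            tid = attrs["ID"].split(":", 1)[1]
            # Assumption: the version lives in its own "version" attribute.
            versions[tid] = f"{tid}.{attrs['version']}" if "version" in attrs else tid
        elif ftype == "exon" and attrs.get("Parent", "").startswith("transcript:"):
            tid = attrs["Parent"].split(":", 1)[1]
            exons[tid].append((seqid, int(start), int(end), strand))
    return {
        versions[tid]: sorted(tx_exons, key=lambda e: e[1])
        for tid, tx_exons in exons.items()
        if tid in versions
    }


result = versioned_transcripts(SAMPLE_GFF3.splitlines())
```

For a downloaded release file, the same function could be fed from `gzip.open(path, "rt")`, which yields text lines. Transcripts are recognized by their `ID` prefix rather than by feature type because a single file can use several transcript-level feature types.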

@davmlaw

davmlaw commented Feb 3, 2022

Hi, I've made cdot, a data provider that includes Ensembl transcripts; see the HGVS issue.

The GTF parsing code etc. is all under the MIT license if you want to re-use it in UTA. An alternative would be to use the JSON and convert that to SQL (or use the data provider).


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Nov 29, 2023
@ahwagner ahwagner added the keep alive label and removed the stale label Nov 29, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Feb 28, 2024

github-actions bot commented Mar 7, 2024

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Mar 7, 2024
@jsstevenson jsstevenson reopened this Mar 7, 2024
@jsstevenson jsstevenson removed the stale and closed-by-stale labels Mar 7, 2024
4 participants