Error for Gencode gtf #180

Open
taigalokhid opened this issue Aug 22, 2023 · 3 comments

Comments

@taigalokhid

Hi Brian,
I am trying to create a transcripts.fasta that includes decoy chromosomes and virus chromosomes.
I ran
util/gtf_genome_to_cdna_fasta.pl transcripts.gtf test.genome.fasta > transcripts.fasta
On a Gencode GTF it failed with an error about duplicated tags.
After that I tried another route: gff3 -> bed12 -> fasta with bedtools.
I ran gff3_file_to_bed.pl on a Gencode gff3
and received
Error, no gene feature found for ENST00000456328.2.... ignoring feature.
Could you clarify what the problem is with the features in the Gencode GTF?
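
For anyone trying to narrow down the "duplicated tags" message: Gencode GTF lines can carry the same attribute key more than once (several `tag` entries on one line, for example). Whether that is what triggers the error here is an assumption, not something confirmed in this thread. A minimal Python sketch for spotting repeated keys in column 9 follows; the helper names `parse_attributes` and `repeated_keys` are hypothetical and not part of any toolkit mentioned above.

```python
import re
from collections import Counter

# One attribute looks like: key "value";   or   key 123;
ATTR_RE = re.compile(r'(\S+)\s+("([^"]*)"|[^;\s]+)\s*;')


def parse_attributes(attr_field):
    """Return (key, value) pairs from a GTF attribute column, keeping repeats."""
    pairs = []
    for match in ATTR_RE.finditer(attr_field):
        key = match.group(1)
        # group(3) is the text inside quotes; fall back to the bare token.
        value = match.group(3) if match.group(3) is not None else match.group(2)
        pairs.append((key, value))
    return pairs


def repeated_keys(attr_field):
    """Return the sorted attribute keys that occur more than once."""
    counts = Counter(key for key, _ in parse_attributes(attr_field))
    return sorted(key for key, n in counts.items() if n > 1)
```

Calling `repeated_keys` on the ninth column of each line and printing the lines where the result is non-empty would show which records a parser that expects unique keys might reject.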

@brianjohnhaas
Contributor

brianjohnhaas commented Aug 22, 2023 via email

@taigalokhid
Author

taigalokhid commented Aug 22, 2023

Here is an extract containing the relevant lines; all the required features (gene, transcript, exon) are present:

chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000290825.1"; gene_type "lncRNA"; gene_name "DDX11L2"; level 2; tag "overlaps_pseudogene";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; level 2; transcript_support_level "1"; tag "basic"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 13221 14409 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA gene 12010 13670 . + . gene_id "ENSG00000223972.6"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2";

I grepped for the transcript: the gene_id exists and is in the same format as in the GTF that StringTie produces (https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual)

sharova@tux:/data7a/bio/human_genomics/shared/maps/GDC.h38.d1.vd1/gencodev43$ grep 'ENST00000456328.2' gencode.v43.chr_patch_hapl_scaff.annotation.100.gtf

chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; level 2; transcript_support_level "1"; tag "basic"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 13221 14409 . + . gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; havana_transcript "OTTHUMT00000362751.1";
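
The grep above matches on the transcript ID, and gene rows carry no transcript_id, so it cannot show whether a matching gene row is present in the file. A rough Python sketch of a check keyed on gene_id instead follows. It assumes the standard nine-column GTF layout with no spaces inside the first eight fields; the function name `transcripts_missing_gene` is hypothetical, and this is not how the toolkit scripts themselves work.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcripts_missing_gene(lines):
    """Return transcript IDs whose gene_id never appears on a 'gene' row."""
    gene_ids = set()
    transcript_to_gene = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        # Split the first eight columns on whitespace; column 9 stays intact.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue
        feature, attrs = fields[2], fields[8]
        gene = GENE_ID_RE.search(attrs)
        if gene is None:
            continue
        if feature == "gene":
            gene_ids.add(gene.group(1))
        elif feature == "transcript":
            transcript = TRANSCRIPT_ID_RE.search(attrs)
            if transcript:
                transcript_to_gene[transcript.group(1)] = gene.group(1)
    return sorted(t for t, g in transcript_to_gene.items() if g not in gene_ids)
```

Passing an open file handle for the GTF, for example `transcripts_missing_gene(open(path))` with `path` pointing at the annotation, returns an empty list when every transcript row has a gene row with the same gene_id.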

@brianjohnhaas
Contributor

brianjohnhaas commented Aug 23, 2023 via email
