Feature issue in a record after cdna_alignment_orf_to_genome_orf.pl #196

SalvadorGJ · 2024-02-21T17:51:02Z

I was trying to annotate a StringTie transcriptome assembly, following this pipeline: gtf_to_alignment_gff3.pl -> TransDecoder.LongOrfs -> TransDecoder.Predict -> cdna_alignment_orf_to_genome_orf.pl

Unfortunately, one transcript has problems in the feature field of the GFF3 output from cdna_alignment_orf_to_genome_orf.pl. The transcript ID is "AMEX231106C057529.2.p1", one of the isoforms of the gene "AMEX231106C057529". At the end of the GFF3 record for that transcript there are two exons without a CDS entry, followed by three consecutive three_prime_UTR entries. No other isoform of this gene, and no transcript of any other gene in my assembly, shows this problem. Here are the steps of the pipeline that I followed:

First I executed:

gtf_to_alignment_gff3.pl stringtie.merge.processed.gtf > AmexT_v231106C_FullOnly.stringtie.merge.raw.gff3

"AMEX231106C057529.2" entries in the input stringtie.merge.processed.gtf are:

ptg000056l	StringTie	transcript	24490808	32418568	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; 
ptg000056l	StringTie	exon	24490808	24491053	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "1"; 
ptg000056l	StringTie	exon	25718856	25718950	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "2"; 
ptg000056l	StringTie	exon	26892333	26892549	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "3"; 
ptg000056l	StringTie	exon	28209504	28209543	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "4"; 
ptg000056l	StringTie	exon	29692485	29692574	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "5"; 
ptg000056l	StringTie	exon	29977070	29977184	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "6"; 
ptg000056l	StringTie	exon	30279073	30279184	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "7"; 
ptg000056l	StringTie	exon	31138727	31138879	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "8"; 
ptg000056l	StringTie	exon	31484095	31484257	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "9"; 
ptg000056l	StringTie	exon	32226903	32227002	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "10"; 
ptg000056l	StringTie	exon	32227351	32227454	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "11"; 
ptg000056l	StringTie	exon	32415663	32415698	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "12"; 
ptg000056l	StringTie	exon	32418037	32418568	1000	+	.	gene_id "AMEX231106C057529"; transcript_id "AMEX231106C057529.2"; exon_number "13";

"AMEX231106C057529.2" entries in the output AmexT_v231106C_FullOnly.stringtie.merge.raw.gff3 are:

ptg000056l	Cufflinks	match	24490808	24491053	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 1 246 +
ptg000056l	Cufflinks	match	25718856	25718950	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 247 341 +
ptg000056l	Cufflinks	match	26892333	26892549	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 342 558 +
ptg000056l	Cufflinks	match	28209504	28209543	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 559 598 +
ptg000056l	Cufflinks	match	29692485	29692574	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 599 688 +
ptg000056l	Cufflinks	match	29977070	29977184	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 689 803 +
ptg000056l	Cufflinks	match	30279073	30279184	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 804 915 +
ptg000056l	Cufflinks	match	31138727	31138879	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 916 1068 +
ptg000056l	Cufflinks	match	31484095	31484257	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 1069 1231 +
ptg000056l	Cufflinks	match	32226903	32227002	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 1232 1331 +
ptg000056l	Cufflinks	match	32227351	32227454	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 1332 1435 +
ptg000056l	Cufflinks	match	32415663	32415698	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 1436 1471 +
ptg000056l	Cufflinks	match	32418037	32418568	100	+	.	ID=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2;Target=GENE^AMEX231106C057529,TRANS^AMEX231106C057529.2 1472 2003 +
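
The Target coordinates in the match lines above appear to be cumulative transcript positions obtained by laying the exons end to end. A minimal sketch that reproduces them from the GTF exon intervals, assuming a plus-strand transcript (`target_ranges` is a hypothetical helper, not part of TransDecoder):

```python
# Genomic (start, end) of the 13 exons of AMEX231106C057529.2, copied from the GTF above.
exons = [
    (24490808, 24491053), (25718856, 25718950), (26892333, 26892549),
    (28209504, 28209543), (29692485, 29692574), (29977070, 29977184),
    (30279073, 30279184), (31138727, 31138879), (31484095, 31484257),
    (32226903, 32227002), (32227351, 32227454), (32415663, 32415698),
    (32418037, 32418568),
]

def target_ranges(exons):
    """Return 1-based (start, end) transcript ranges for plus-strand exons."""
    ranges, offset = [], 0
    for g_start, g_end in sorted(exons):
        length = g_end - g_start + 1          # GFF coordinates are inclusive
        ranges.append((offset + 1, offset + length))
        offset += length
    return ranges

ranges = target_ranges(exons)
print(ranges[0], ranges[10], ranges[-1])  # (1, 246) (1332, 1435) (1472, 2003)
```

The last range ends at 2003, which matches the transcript length reported later by TransDecoder.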

Then I extracted the FASTA sequence of each transcript and stored them in AmexT_v231106C_FullOnly.stringtie.merge.raw.fasta. There is only one sequence with the ID "AMEX231106C057529.2" in the FASTA file. I then resumed the pipeline:

TransDecoder.LongOrfs -t AmexT_v231106C_FullOnly.stringtie.merge.raw.fasta -S -m 30
TransDecoder.Predict -t AmexT_v231106C_FullOnly.stringtie.merge.raw.fasta --single_best_only

"AMEX231106C057529.2" entries in the output AmexT_v231106C_FullOnly.stringtie.merge.raw.fasta.transdecoder.gff3 are:

AMEX231106C057529.2	transdecoder	gene	1	2003	.	+	.	ID=GENE.AMEX231106C057529.2~~AMEX231106C057529.2.p1;Name="ORF type:5prime_partial (+),score=76.51"
AMEX231106C057529.2	transdecoder	mRNA	1	2003	.	+	.	ID=AMEX231106C057529.2.p1;Parent=GENE.AMEX231106C057529.2~~AMEX231106C057529.2.p1;Name="ORF type:5prime_partial (+),score=76.51"
AMEX231106C057529.2	transdecoder	exon	1	2003	.	+	.	ID=AMEX231106C057529.2.p1.exon1;Parent=AMEX231106C057529.2.p1
AMEX231106C057529.2	transdecoder	CDS	1	1401	.	+	0	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
AMEX231106C057529.2	transdecoder	three_prime_UTR	1402	2003	.	+	.	ID=AMEX231106C057529.2.p1.utr3p1;Parent=AMEX231106C057529.2.p1
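
To see where the CDS end (transcript position 1401) and the UTR start (1402) should fall on the genome, the transcript coordinates can be projected through the match segments of the alignment GFF3. This is only a sketch under the assumption of a plus-strand, gap-free alignment; `transcript_to_genome` is a hypothetical helper, not a TransDecoder function:

```python
# (genomic_start, genomic_end, target_start, target_end) for each match line
# of TRANS^AMEX231106C057529.2 in the alignment GFF3 (plus strand).
segments = [
    (24490808, 24491053, 1, 246), (25718856, 25718950, 247, 341),
    (26892333, 26892549, 342, 558), (28209504, 28209543, 559, 598),
    (29692485, 29692574, 599, 688), (29977070, 29977184, 689, 803),
    (30279073, 30279184, 804, 915), (31138727, 31138879, 916, 1068),
    (31484095, 31484257, 1069, 1231), (32226903, 32227002, 1232, 1331),
    (32227351, 32227454, 1332, 1435), (32415663, 32415698, 1436, 1471),
    (32418037, 32418568, 1472, 2003),
]

def transcript_to_genome(pos, segments):
    """Map a 1-based transcript position to its genomic coordinate."""
    for g_start, g_end, t_start, t_end in segments:
        if t_start <= pos <= t_end:
            return g_start + (pos - t_start)
    raise ValueError(f"position {pos} is outside the aligned transcript")

cds_end = transcript_to_genome(1401, segments)
utr_start = transcript_to_genome(1402, segments)
print(cds_end, utr_start)  # 32227420 32227421
```

Under these assumptions the CDS should end inside the eleventh exon, and the twelfth and thirteenth exons would lie entirely within the 3' UTR.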

Finally I tried to merge the alignment information to the prediction results:

cdna_alignment_orf_to_genome_orf.pl AmexT_v231106C_FullOnly.stringtie.merge.raw.fasta.transdecoder.gff3 AmexT_v231106C_FullOnly.stringtie.merge.raw.gff3 AmexT_v231106C_FullOnly.stringtie.merge.raw.fasta > stringtie.transdecoder.annotated.gff

"AMEX231106C057529.2" entries in the output stringtie.transdecoder.annotated.gff:

ptg000056l	transdecoder	mRNA	24490808	32418568	.	+	.	ID=AMEX231106C057529.2.p1;Parent=AMEX231106C057529^ptg000056l^+;Name="ORF type:3prime_partial (+),score=41.53"
ptg000056l	transdecoder	exon	24490808	24491053	.	+	.	ID=AMEX231106C057529.2.p1.exon1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	24490808	24491053	.	+	0	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	25718856	25718950	.	+	.	ID=AMEX231106C057529.2.p1.exon2;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	25718856	25718950	.	+	0	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	26892333	26892549	.	+	.	ID=AMEX231106C057529.2.p1.exon3;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	26892333	26892549	.	+	1	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	28209504	28209543	.	+	.	ID=AMEX231106C057529.2.p1.exon4;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	28209504	28209543	.	+	0	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	29692485	29692574	.	+	.	ID=AMEX231106C057529.2.p1.exon5;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	29692485	29692574	.	+	2	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	29977070	29977184	.	+	.	ID=AMEX231106C057529.2.p1.exon6;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	29977070	29977184	.	+	2	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	30279073	30279184	.	+	.	ID=AMEX231106C057529.2.p1.exon7;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	30279073	30279184	.	+	1	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	31138727	31138879	.	+	.	ID=AMEX231106C057529.2.p1.exon8;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	31138727	31138879	.	+	0	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	31484095	31484257	.	+	.	ID=AMEX231106C057529.2.p1.exon9;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	31484095	31484257	.	+	0	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	32226903	32227002	.	+	.	ID=AMEX231106C057529.2.p1.exon10;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	32226903	32227002	.	+	2	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	32227351	32227454	.	+	.	ID=AMEX231106C057529.2.p1.exon11;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	CDS	32227351	32227420	.	+	1	ID=cds.AMEX231106C057529.2.p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	32415663	32415698	.	+	.	ID=AMEX231106C057529.2.p1.exon12;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	exon	32418037	32418568	.	+	.	ID=AMEX231106C057529.2.p1.exon13;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	three_prime_UTR	32227421	32227454	.	+	.	ID=AMEX231106C057529.2.p1.utr3p1;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	three_prime_UTR	32415663	32415698	.	+	.	ID=AMEX231106C057529.2.p1.utr3p2;Parent=AMEX231106C057529.2.p1
ptg000056l	transdecoder	three_prime_UTR	32418037	32418568	.	+	.	ID=AMEX231106C057529.2.p1.utr3p3;Parent=AMEX231106C057529.2.p1

As you can see in the last lines of the block above, there are two exons (exon12 and exon13) that are not followed by a CDS entry, and after them come three three_prime_UTR entries. I have no idea how to debug this (there was no warning in the log from cdna_alignment_orf_to_genome_orf.pl), or why it happened only with this transcript. I would appreciate your help.
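
As a first sanity check, one could test whether each of the last three exons is fully accounted for by CDS plus three_prime_UTR intervals. The sketch below uses the coordinates from the record above; `covered_length` is a hypothetical helper and assumes the intervals in each list do not overlap one another:

```python
# Coordinates of exon11, exon12 and exon13 and of the CDS / 3' UTR features
# that touch them, copied from the merged GFF3 record above.
exons_tail = [(32227351, 32227454), (32415663, 32415698), (32418037, 32418568)]
cds = [(32227351, 32227420)]
utr3 = [(32227421, 32227454), (32415663, 32415698), (32418037, 32418568)]

def covered_length(interval, parts):
    """Number of bases of `interval` covered by the (non-overlapping) `parts`."""
    start, end = interval
    total = 0
    for p_start, p_end in parts:
        lo, hi = max(start, p_start), min(end, p_end)
        if lo <= hi:
            total += hi - lo + 1
    return total

report = []
for exon in exons_tail:
    length = exon[1] - exon[0] + 1
    cds_len = covered_length(exon, cds)
    utr_len = covered_length(exon, utr3)
    # An exon is fully accounted for when CDS + UTR bases add up to its length.
    report.append((cds_len, utr_len, cds_len + utr_len == length))

print(report)  # [(70, 34, True), (0, 36, True), (0, 532, True)]
```

For these three exons the check passes, with exon12 and exon13 covered only by three_prime_UTR, which would suggest the coordinates themselves are self-consistent.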

Best,
Salvador

SalvadorGJ · 2024-02-22T08:48:11Z

To clarify my issue: I converted the final GFF output of cdna_alignment_orf_to_genome_orf.pl (stringtie.transdecoder.annotated.gff) to a GTF file using gffread: gffread stringtie.transdecoder.annotated.gff -F -T -v -C -o stringtie.transdecoder.annotated.gtf. But I got the following warnings:

Warning: exon feature found before transcript ID AMEX231106C057529.2.p1
Warning: adjusted transcript AMEX231106C057529.2.p1 boundaries according to terminal exons.

That is why I think there is a problem with that specific transcript. I don't know whether the issue comes from the cdna_alignment_orf_to_genome_orf.pl output or from gffread itself. I would like your help to rule out any issue on the TransDecoder side.

Best,
Salvador

brianjohnhaas · 2024-02-22T11:29:57Z

Hi,

The script TransDecoder/util/gff3_gene_to_gtf_format.pl might work for you in converting the gff3 to gtf format at that step.

best,
Brian

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

SalvadorGJ · 2024-02-29T11:08:04Z

Hi @brianjohnhaas , the script worked perfectly!
Thank you so much.

SalvadorGJ · 2024-04-12T14:56:48Z

Dear @brianjohnhaas

I need to reopen this issue because I found an important problem with cdna_alignment_orf_to_genome_orf.pl that is worth checking. I tried to merge the TransDecoder GFF annotation with the raw StringTie GFF, but the ORF name and score of the parent gene are now inherited by every mRNA. So, if one mRNA isoform has a partial ORF, but the gene was annotated as a complete ORF because another isoform does have a complete ORF, then the partial isoform gets the complete tag! Here is an example.

Here is the annotated GFF file after TransDecoder.Predict (using the --single_best_only parameter):

gene AMEX231106C000001.1:1-2523 ID=GENE.AMEX231106C000001.1~~AMEX231106C000001.1.p1;Name="ORF type:complete (+),score=75.05"
mRNA AMEX231106C000001.1:1-2523 ID=AMEX231106C000001.1.p1;Parent=GENE.AMEX231106C000001.1~~AMEX231106C000001.1.p1;Name="ORF type:complete (+),score=75.05"
gene AMEX231106C000001.2:1-1974 ID=GENE.AMEX231106C000001.2~~AMEX231106C000001.2.p1;Name="ORF type:complete (+),score=59.74"
mRNA AMEX231106C000001.2:1-1974 ID=AMEX231106C000001.2.p1;Parent=GENE.AMEX231106C000001.2~~AMEX231106C000001.2.p1;Name="ORF type:complete (+),score=59.74"
gene AMEX231106C000001.3:1-1892 ID=GENE.AMEX231106C000001.3~~AMEX231106C000001.3.p1;Name="ORF type:complete (+),score=49.25"
mRNA AMEX231106C000001.3:1-1892 ID=AMEX231106C000001.3.p1;Parent=GENE.AMEX231106C000001.3~~AMEX231106C000001.3.p1;Name="ORF type:complete (+),score=49.25"
gene AMEX231106C000001.4:1-2359 ID=GENE.AMEX231106C000001.4~~AMEX231106C000001.4.p1;Name="ORF type:complete (+),score=75.05"
mRNA AMEX231106C000001.4:1-2359 ID=AMEX231106C000001.4.p1;Parent=GENE.AMEX231106C000001.4~~AMEX231106C000001.4.p1;Name="ORF type:complete (+),score=75.05"
gene AMEX231106C000001.5:1-642 ID=GENE.AMEX231106C000001.5~~AMEX231106C000001.5.p1;Name="ORF type:complete (+),score=3.84"
mRNA AMEX231106C000001.5:1-642 ID=AMEX231106C000001.5.p1;Parent=GENE.AMEX231106C000001.5~~AMEX231106C000001.5.p1;Name="ORF type:complete (+),score=3.84"
gene AMEX231106C000001.6:1-1689 ID=GENE.AMEX231106C000001.6~~AMEX231106C000001.6.p1;Name="ORF type:complete (+),score=70.68"
mRNA AMEX231106C000001.6:1-1689 ID=AMEX231106C000001.6.p1;Parent=GENE.AMEX231106C000001.6~~AMEX231106C000001.6.p1;Name="ORF type:complete (+),score=70.68"
gene AMEX231106C000001.7:1-1626 ID=GENE.AMEX231106C000001.7~~AMEX231106C000001.7.p1;Name="ORF type:complete (+),score=55.43"
mRNA AMEX231106C000001.7:1-1626 ID=AMEX231106C000001.7.p1;Parent=GENE.AMEX231106C000001.7~~AMEX231106C000001.7.p1;Name="ORF type:complete (+),score=55.43"
gene AMEX231106C000001.8:1-264 ID=GENE.AMEX231106C000001.8~~AMEX231106C000001.8.p1;Name="ORF type:5prime_partial (+),score=4.64"
mRNA AMEX231106C000001.8:1-264 ID=AMEX231106C000001.8.p1;Parent=GENE.AMEX231106C000001.8~~AMEX231106C000001.8.p1;Name="ORF type:5prime_partial (+),score=4.64"

As you can see, those 8 "genes" are actually mRNA isoforms of the same gene. All but the last are complete; the 8th is 5prime_partial. The sequences in the FASTA file confirm this:

>AMEX231106C000001.1.p1 GENE.AMEX231106C000001.1~~AMEX231106C000001.1.p1  ORF type:complete (+),score=75.05 len:425 AMEX231106C000001.1:560-1837(+)
MLRRKLRYCQLKRCKLLLLVVLTLLTLSAVKIHQHAALMNHRQLLIRDYAPGSSVDCVSI
LRGDQEAIALAKLETLKVSFRNRPRLATQDYVNMTKDCESFTKSRKYILQPLSKEEALFP
IAYSIVVHHKIDMFETLLRTIYAPQNFYCIHVDKKAPESFLAAVKGIVSCFGNVFLASQL
ESVIYASWSRVQADINCMKDLHRRSAKWKYLINLCGMDFPTKTNLEMVEKLKALKGENSL
ETEKMPPNKEWRWRKHHEVVDGKVRTTEVDKEPPPFGMTVLSGSAYFVVSRPFVEYVLEN
EKILTFIEWAKDTYSPDEYLWATIQRFPETPGFLPTNEKYDVSDMNSVARFVMWHYFEGD
VSKGAPYPPCSGAHVRSICVFGAGDLRWMLRTHHLFANKFDSDVDPFAIQCLEEYLRDKA
LYQHA*
>AMEX231106C000001.2.p1 GENE.AMEX231106C000001.2~~AMEX231106C000001.2.p1  ORF type:complete (+),score=59.74 len:411 AMEX231106C000001.2:482-1717(+)
MLRRKLRYCQLKRCKLLLLVVLTLLTLSAVKIHQHAALMNHRQLLIRDYAPGSSVDCVSI
LRGDQEAIALAKLETLKVSFRNRPRLATQDYVNMTKDCESFTKSRKYILQPLSKEEALFP
IAYSIVVHHKIDMFETLLRTIYAPQNFYCIHVDKKAPESFLAAVKGIVSCFGNVFLASQL
ESVIYASWSRVQADINCMKDLHRRSAKWKYLINLCGMDFPTKTNLEMVEKLKALKGENSL
ETEKMPPNKEWRWRKHHEVVDGKVRTTEVDKEPPPFGMTVLSGSAYFVVSRPFVEYVLEN
EKILTFIEWAKDTYSPDEYLWATIQRFPETPGFLPTNEKYDVSDMNSVPQLKKARVNLLR
PAHFVILILQYLVLKLLRRSQLRCCPKARNQIFECNVLLRRVVVGCLGGGV*
>AMEX231106C000001.3.p1 GENE.AMEX231106C000001.3~~AMEX231106C000001.3.p1  ORF type:complete (+),score=49.25 len:297 AMEX231106C000001.3:482-1375(+)
MLRRKLRYCQLKRCKLLLLVVLTLLTLSAVKIHQHAALMNHRQLLIRDYAPGSSVDCVSI
LRGDQEAIALAKLETLKVSFRNRPRLATQDYVNMTKDCESFTKSRKYILQPLSKEEALFP
IAYSIVVHHKIDMFETLLRTIYAPQNFYCIHVDKKAPESFLAAVKGIVSCFGNVFLASQL
ESVIYASWSRVQADINCMKDLHRRSAKWKYLINLCGMDFPTKTNLEMVEKLKALKGENSL
ETEKMPPNKEWRWRKHHEVVDGKVRTTEVDKEPPPFGISHSRHGCHGDSDGAQRHMW*
>AMEX231106C000001.4.p1 GENE.AMEX231106C000001.4~~AMEX231106C000001.4.p1  ORF type:complete (+),score=75.05 len:425 AMEX231106C000001.4:683-1960(+)
MLRRKLRYCQLKRCKLLLLVVLTLLTLSAVKIHQHAALMNHRQLLIRDYAPGSSVDCVSI
LRGDQEAIALAKLETLKVSFRNRPRLATQDYVNMTKDCESFTKSRKYILQPLSKEEALFP
IAYSIVVHHKIDMFETLLRTIYAPQNFYCIHVDKKAPESFLAAVKGIVSCFGNVFLASQL
ESVIYASWSRVQADINCMKDLHRRSAKWKYLINLCGMDFPTKTNLEMVEKLKALKGENSL
ETEKMPPNKEWRWRKHHEVVDGKVRTTEVDKEPPPFGMTVLSGSAYFVVSRPFVEYVLEN
EKILTFIEWAKDTYSPDEYLWATIQRFPETPGFLPTNEKYDVSDMNSVARFVMWHYFEGD
VSKGAPYPPCSGAHVRSICVFGAGDLRWMLRTHHLFANKFDSDVDPFAIQCLEEYLRDKA
LYQHA*
>AMEX231106C000001.5.p1 GENE.AMEX231106C000001.5~~AMEX231106C000001.5.p1  ORF type:complete (+),score=3.84 len:62 AMEX231106C000001.5:386-574(+)
MPSRSGQLLTELRDLSQLLLKILEMYIHAHQNLEFEPCVRAQDCCVRARLLCARKTVVCA
RR*
>AMEX231106C000001.6.p1 GENE.AMEX231106C000001.6~~AMEX231106C000001.6.p1  ORF type:complete (+),score=70.68 len:365 AMEX231106C000001.6:335-1432(+)
MLRRKLRYCQLKRCKLLLLVVLTLLTLSAVKIHQHAALMNHRQLLIRDYAPGSSVDCVSI
LRGDQEAIALAKLETLKVSFRNRPRLATQDYVNMTKDCESFTKSRKYILQPLSKEEALFP
IAYSIVVHHKIDMFETLLRTIYAPQNFYCIHVDKKAPESFLAAVKGIVSCFGNVFLASQL
ESVIYASWSRVQADINCMKDLHRRSAKWKYLINLCGMDFPTKTNLEMVEKLKALKGENSL
ETEKMPPNKEWRWRKHHEVVDGKVRTTEVDKEPPPFGMTVLSGSAYFVVSRPFVEYVLEN
EKILTFIEWAKDTYSPDEYLWATIQRFPETPGFLPTNEKYDVSDMNSVARFVMWHYFEGD
VSKGV*
>AMEX231106C000001.7.p1 GENE.AMEX231106C000001.7~~AMEX231106C000001.7.p1  ORF type:complete (+),score=55.43 len:306 AMEX231106C000001.7:140-1060(+)
MLRRKLRYCQLKRCKLLLLVVLTLLTLSAVKIHQHAALMNHRQLLIRDYAPGSSVDCVSI
LRGDQEAIALAKLETLKVSFRNRPRLATQDYVNMTKDCESFTKSRKYILQPLSKEEALFP
IAYSIVVHHKIDMFETLLRTIYAPQNFYCIHVDKKAPESFLAAVKGIVSCFGNVFLASQL
ESVIYASWSRVQADINCMKDLHRRSAKWKYLINLCGMDFPTKTNLEMVEKLKALKGENSL
ETEKMPPNKEWRWRKHHEVVDGKVRTTEVDKEPPPFGMTVLSGSAYFVVSRPFVEYVLEN
EKILTL*
>AMEX231106C000001.8.p1 GENE.AMEX231106C000001.8~~AMEX231106C000001.8.p1  ORF type:5prime_partial (+),score=4.64 len:64 AMEX231106C000001.8:2-196(+)
ETPGFLPTNEKYDVSDMNSLLLKILEMYIHAHQNLEFEPCVRAQDCCVRARLLCARKTVV
CARR*

Then I executed cdna_alignment_orf_to_genome_orf.pl <transdecoder.gff3> <stringtie.gff3> <stringtie_transcripts.fasta> > merged.gff3. If I parse the features, coords and attributes of the same gene in this last GFF3 file I get the following:

gene ptg000001l:930887-1122390 ID=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
mRNA ptg000001l:930887-1122390 ID=AMEX231106C000001.1.p1;Parent=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
mRNA ptg000001l:930965-1122390 ID=AMEX231106C000001.2.p1;Parent=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
mRNA ptg000001l:930965-1122390 ID=AMEX231106C000001.3.p1;Parent=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
mRNA ptg000001l:930992-1122390 ID=AMEX231106C000001.4.p1;Parent=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
mRNA ptg000001l:931023-1122390 ID=AMEX231106C000001.5.p1;Parent=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
mRNA ptg000001l:931112-1122390 ID=AMEX231106C000001.6.p1;Parent=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
mRNA ptg000001l:1120288-1122390 ID=AMEX231106C000001.7.p1;Parent=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
mRNA ptg000001l:1121410-1122390 ID=AMEX231106C000001.8.p1;Parent=AMEX231106C000001^ptg000001l^+;Name="ORF type:complete (+),score=75.05"
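
A mismatch like this can be detected programmatically by comparing the Name attribute of each mRNA before and after the merge. The following is a minimal sketch, not part of TransDecoder: the helper names are hypothetical, and the two sample lines are full nine-column reconstructions of the condensed .8.p1 entries shown above (columns not listed there are filled with "." as in the earlier records).

```python
def parse_attributes(attr_field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in attr_field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)  # values may themselves contain '='
            attrs[key] = value
    return attrs

def mrna_names(gff3_lines):
    """Return {mRNA ID: Name attribute} for every mRNA feature line."""
    names = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            names[attrs.get("ID")] = attrs.get("Name")
    return names

def name_mismatches(before, after):
    """IDs present in both dicts whose Name changed, with (before, after) values."""
    return {k: (before[k], after[k])
            for k in before if k in after and before[k] != after[k]}

# Reconstructed sample lines (assumption: standard 9-column GFF3 layout).
predicted = [
    'AMEX231106C000001.8\ttransdecoder\tmRNA\t1\t264\t.\t+\t.\t'
    'ID=AMEX231106C000001.8.p1;Parent=GENE.AMEX231106C000001.8~~AMEX231106C000001.8.p1;'
    'Name="ORF type:5prime_partial (+),score=4.64"',
]
merged = [
    'ptg000001l\ttransdecoder\tmRNA\t1121410\t1122390\t.\t+\t.\t'
    'ID=AMEX231106C000001.8.p1;Parent=AMEX231106C000001^ptg000001l^+;'
    'Name="ORF type:complete (+),score=75.05"',
]

mismatches = name_mismatches(mrna_names(predicted), mrna_names(merged))
for mrna_id, (old, new) in mismatches.items():
    print(mrna_id, old, "->", new)
```

In practice the two lists would be replaced by the lines of the TransDecoder GFF3 and the merged GFF3, for example via `open(path)`.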

Maybe I'm not using the script as I should. Could you tell me whether I'm passing these files correctly in the command: cdna_orfs.genes.gff3, cdna_genome.alignments.gff3 and cdna.fasta? It is also worth noting that I'm currently working with version 5.7.0. I will test again with 5.7.1, but this issue is not listed in the documentation of that release.

Best,
Salvador

SalvadorGJ closed this as completed Feb 29, 2024

SalvadorGJ reopened this Apr 12, 2024
