Feature issue in a record after cdna_alignment_orf_to_genome_orf.pl #196
Hi,
The script:
TransDecoder/util/gff3_gene_to_gtf_format.pl
might work for you in converting the gff3 to gtf format at that step.
best,
Brian
On Thu, Feb 22, 2024 at 3:48 AM Salvador Gonzalez Juarez wrote:
To clarify my issue: I converted the final GFF3 output of cdna_alignment_orf_to_genome_orf.pl (stringtie.transdecoder.annotated.gff) to a GTF file using gffread:
gffread stringtie.transdecoder.annotated.gff -F -T -v -C -o stringtie.transdecoder.annotated.gtf
But I got the following warnings:
Warning: exon feature found before transcript ID AMEX231106C057529.2.p1
Warning: adjusted transcript AMEX231106C057529.2.p1 boundaries according to terminal exons.
That is why I think there is a problem with that specific transcript. I don't know whether the issue comes from the cdna_alignment_orf_to_genome_orf.pl output or from gffread itself, and I would like your help to rule out any problem on the TransDecoder side.
Best,
Salvador
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas
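For anyone who hits the same gffread warning ("exon feature found before transcript ID ..."), a quick first check is whether the GFF3 itself lists child features ahead of the mRNA line that defines their Parent. The Python sketch below is a hypothetical helper, not part of TransDecoder or gffread; it only mimics the kind of ordering condition the warning names, and gffread's own checks may differ.

```python
def parse_attributes(column9):
    """Split a GFF3 column-9 string such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def children_before_parent(gff3_lines):
    """Hypothetical ordering check: list features whose Parent ID has not
    yet been defined by the ID attribute of an earlier line.

    Returns (line_number, feature_type, parent_id) tuples, in file order."""
    seen_ids = set()
    problems = []
    for line_no, line in enumerate(gff3_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # this sketch silently skips malformed rows
        attrs = parse_attributes(cols[8])
        # GFF3 allows several comma-separated parents per feature.
        for parent in attrs.get("Parent", "").split(","):
            if parent and parent not in seen_ids:
                problems.append((line_no, cols[2], parent))
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
    return problems
```

Usage, with the file name from this thread: `with open("stringtie.transdecoder.annotated.gff") as fh: print(children_before_parent(fh))`. Any hit whose parent is AMEX231106C057529.2.p1 would point at the record order in the GFF3 rather than at gffread.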
Hi @brianjohnhaas, the script worked perfectly!
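To illustrate what a GFF3 -> GTF conversion such as TransDecoder/util/gff3_gene_to_gtf_format.pl has to do, here is a minimal Python sketch; it is not that script's implementation. It assumes a gene -> mRNA -> exon/CDS hierarchy linked by ID/Parent, reads the whole file before writing so that children listed before their mRNA still receive the right gene_id, and emits only exon and CDS rows (the `keep_types` parameter is a hypothetical knob).

```python
def parse_attributes(column9):
    """Split a GFF3 column-9 string such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_to_gtf(gff3_lines, keep_types=("exon", "CDS")):
    """Minimal GFF3 -> GTF sketch (not TransDecoder's converter)."""
    rows = []
    transcript_to_gene = {}
    # Pass 1: keep every row and map transcript IDs to gene IDs,
    # so the order of lines in the file does not matter.
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] in ("mRNA", "transcript") and "ID" in attrs:
            transcript_to_gene[attrs["ID"]] = attrs.get("Parent", attrs["ID"])
        rows.append((cols, attrs))
    # Pass 2: emit GTF rows; columns 1-8 (including the CDS phase) are copied as-is.
    gtf_lines = []
    for cols, attrs in rows:
        if cols[2] not in keep_types:
            continue
        for transcript_id in attrs.get("Parent", "").split(","):
            if not transcript_id:
                continue
            gene_id = transcript_to_gene.get(transcript_id, transcript_id)
            gtf_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
            gtf_lines.append("\t".join(cols[:8] + [gtf_attrs]))
    return gtf_lines
```

The two-pass design is the point of the example: a single-pass converter that looks up the gene only from mRNA lines already seen would mis-assign an exon that appears before its mRNA, which is exactly the situation the gffread warning describes.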
Dear @brianjohnhaas, I need to reopen this issue because I found an important problem. Here is the annotated GFF file after TransDecoder.Predict (run with the --single_best_only parameter):
As you can see, those 8 "genes" are actually mRNA isoforms of the same gene. Seven of them are complete; only the last one (the 8th) is 5prime_partial. This is confirmed by the sequences in the FASTA file:
Then I ran:
Maybe I'm not using the script correctly. Could you tell me whether I'm passing these files properly in the command: cdna_orfs.genes.gff3, cdna_genome.alignments.gff3 and cdna.fasta? It is also important to note that I'm currently working with version 5.7.0. I will test again with 5.7.1, but this issue is not listed in the release notes.
Best,
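To check the complete/partial status of every isoform's ORF (for example, that only the 8th isoform is 5prime_partial) without reading each header by eye, the .pep FASTA from TransDecoder.Predict can be scanned for the ORF type label. This Python sketch assumes the headers carry a token of the form type:<label> (such as type:complete or type:5prime_partial), as in the .pep output of recent TransDecoder versions; verify this against your own file. The helper names are hypothetical.

```python
import re
from collections import Counter

# Assumed header token, e.g. ">t8.p1 GENE.t8~~t8.p1  ORF type:5prime_partial len:90 (+)"
ORF_TYPE = re.compile(r"\btype:(\S+)")


def orf_types(fasta_lines):
    """Map each FASTA record ID (first word of the header) to its ORF type label."""
    types = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines are ignored
        header = line[1:].strip()
        record_id = header.split()[0] if header else ""
        match = ORF_TYPE.search(header)
        types[record_id] = match.group(1) if match else "unknown"
    return types


def summarize(fasta_lines):
    """Count ORFs per type label (complete, 5prime_partial, ...)."""
    return Counter(orf_types(fasta_lines).values())
```

Records whose header lacks the assumed token are reported as "unknown" rather than guessed.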
Hello @brianjohnhaas,
I was trying to annotate a StringTie transcriptome assembly, following this pipeline: gtf_to_alignment_gff3.pl -> TransDecoder.LongOrfs -> TransDecoder.Predict -> cdna_alignment_orf_to_genome_orf.pl
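The pipeline above can be written out command by command so that the argument order of cdna_alignment_orf_to_genome_orf.pl is explicit: the ORF GFF3 in cDNA coordinates, then the cDNA-to-genome alignment GFF3, then the cDNA FASTA. This Python sketch follows the genome-guided (StringTie/Cufflinks GTF) example in the TransDecoder documentation as I understand it; the file names, the util/ path and the gtf_genome_to_cdna_fasta.pl step that produces the cDNA FASTA are assumptions, not taken from this thread, so check them against your installed version before running anything.

```python
import subprocess


def pipeline_steps(gtf="transcripts.gtf", genome="genome.fasta",
                   util="TransDecoder/util", single_best_only=True):
    """Return (argv, stdout_file) pairs; stdout_file is None when the tool
    writes its own output files. All file names are placeholders."""
    cdna = "transcripts.fasta"               # cDNA sequences extracted from the GTF
    aln_gff3 = "transcripts.gff3"            # cDNA-to-genome alignment GFF3
    orf_gff3 = f"{cdna}.transdecoder.gff3"   # ORFs in cDNA coordinates (Predict output)
    predict = ["TransDecoder.Predict", "-t", cdna]
    if single_best_only:
        predict.append("--single_best_only")
    return [
        ([f"{util}/gtf_genome_to_cdna_fasta.pl", gtf, genome], cdna),
        ([f"{util}/gtf_to_alignment_gff3.pl", gtf], aln_gff3),
        (["TransDecoder.LongOrfs", "-t", cdna], None),
        (predict, None),
        # Argument order: ORF GFF3, alignment GFF3, cDNA FASTA.
        ([f"{util}/cdna_alignment_orf_to_genome_orf.pl", orf_gff3, aln_gff3, cdna],
         f"{cdna}.transdecoder.genome.gff3"),
    ]


def run_steps(steps):
    """Run each command in order, redirecting stdout to a file where needed."""
    for argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Printing `pipeline_steps()` before calling `run_steps` is a cheap way to compare the generated commands with the ones actually used.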
Unfortunately, there is a problem with one transcript in the feature (type) column of the GFF3 output of cdna_alignment_orf_to_genome_orf.pl. The transcript ID is "AMEX231106C057529.2.p1", one of the isoforms of the gene "AMEX231106C057529". At the end of the GFF3 record for that transcript there are two exons without a CDS, followed by three consecutive three_prime_UTR features. No other isoform of this gene, and no transcript of any other gene in my assembly, has this problem. Here are the pipeline steps I followed:
As you can see in the last lines of the block above, there are two exons that are not followed by a CDS entry, and after them come the three final three_prime_UTR entries. I have no idea how to debug this (there was no warning in the cdna_alignment_orf_to_genome_orf.pl log), or why it happened only with that transcript. I would appreciate your help.
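One way to narrow down what went wrong in a record like AMEX231106C057529.2.p1 is to check, per transcript, that every exon is exactly tiled by CDS and UTR pieces and that every CDS/UTR piece lies inside an exon. The Python sketch below does this from the GFF3 alone; it is a hypothetical diagnostic, not part of TransDecoder, and it assumes the CDS and UTR features of one transcript never overlap each other (otherwise coverage is double-counted).

```python
from collections import defaultdict

SUB_FEATURES = {"CDS", "five_prime_UTR", "three_prime_UTR"}


def parse_attributes(column9):
    """Split a GFF3 column-9 string such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def check_exon_tiling(gff3_lines):
    """Return {transcript_id: [problem, ...]} for transcripts whose exons
    are not exactly tiled by their CDS/UTR features."""
    exons = defaultdict(list)   # transcript_id -> [(start, end)]
    pieces = defaultdict(list)  # transcript_id -> [(start, end, type)]
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        start, end = int(cols[3]), int(cols[4])
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if not parent:
                continue
            if cols[2] == "exon":
                exons[parent].append((start, end))
            elif cols[2] in SUB_FEATURES:
                pieces[parent].append((start, end, cols[2]))

    report = {}
    for tid in sorted(set(exons) | set(pieces)):
        problems = []
        for s, e in exons.get(tid, []):
            # Bases of this exon covered by overlapping CDS/UTR pieces.
            covered = sum(min(e, pe) - max(s, ps) + 1
                          for ps, pe, _ in pieces.get(tid, []) if ps <= e and pe >= s)
            if covered != e - s + 1:
                problems.append(f"exon {s}-{e}: {covered} of {e - s + 1} bp covered by CDS/UTR")
        for ps, pe, ptype in pieces.get(tid, []):
            if not any(s <= ps and pe <= e for s, e in exons.get(tid, [])):
                problems.append(f"{ptype} {ps}-{pe} not inside any exon")
        if problems:
            report[tid] = problems
    return report
```

Note that an exon with no CDS line is expected when it lies entirely in the UTR, so the useful signal here is coverage, not the mere absence of a CDS entry. Running `check_exon_tiling(open("stringtie.transdecoder.annotated.gff"))` and looking only at the entry for AMEX231106C057529.2.p1 would show whether its UTR pieces actually match its exons.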
Best,
Salvador