Inaccurate orf length in gff3 file #190
Comments
This sounds pretty peculiar. If you can send me the example, I'd be happy
to take a look at it.
bhaas at broadinstitute dot org
best,
B
On Wed, Nov 1, 2023 at 9:53 AM mollylRivers wrote:
Hi Brian,
I am using TransDecoder on my de novo assembled transcriptome, which was
assembled from Illumina short reads and Iso-Seq long reads with rnaSPAdes.
Commands:
TransDecoder.LongOrfs -t assembled.Transcriptome.fasta -m 100
TransDecoder.Predict -t assembled.Transcriptome.fasta --single_best_only
The gff3 output file from TransDecoder appears to annotate the selected
open reading frame (ORF) with the wrong length. The annotated sequence
length in the gff3 file does not match the actual sequence length: the
sequence is labelled as being longer than the actual contig (which I have
checked manually against the longest_orfs.cds file). It looks as though the
selected ORF is being annotated with the length of the longest ORF, even
when that is not the one that was selected.
I am not sure whether this is related to the Iso-Seq long-read sequences in
the transcriptome, but it is causing problems with another program that
takes the TransDecoder gff3 output file as input. Any input on what might
be causing this and how to fix it would be greatly appreciated.
Many thanks,
Molly Rivers
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
Hi Brian,
Thanks so much for your quick response. I have attached the gff3 file
produced by TransDecoder, a sample of the longest_orfs.cds file (the whole
file was far too big to attach), and a txt file with the error message from
cellranger, which shows the sequence length discrepancy I described.
Many thanks,
Molly
transdecoder_produced_gff3_file.gff3.gz
cellranger_error_message.txt.gz
longest_orfs.txt
Hi Molly,
I think the issue here has to do with the expected formatting, as
cellranger wants a gtf rather than a gff3 file.
You can try converting the gff3 file you have to gtf format using the
script included with TransDecoder like so:
TransDecoder/util/gff3_gene_to_gtf_format.pl Trinity.fasta.transdecoder.gff3 Trinity.fasta > transdecoder.gtf
and then try running cellranger with the transdecoder.gtf file
Let's see if that works.
best,
B
Hi Brian,
Sorry, I should have said that I had already converted the gff3 file to a
gtf and used that file for cellranger. The error message I attached was
produced when using the gtf file.
Many thanks,
Molly
I see. When running cellranger, are you giving the original Trinity.fasta
file as input along with that gtf file? It's complaining about the
sequence length, but the sequence length for that Trinity contig should be
~1.5kb, which can be verified from the fasta file, right? The longest_orfs
file will have the shorter one corresponding to just the cds sequence, and
that wouldn't match up with the gtf or gff3.
best,
B
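One way to run the check described above is to compare every feature's end coordinate in the gtf (or gff3) against the length of the matching sequence in the FASTA file that is passed to the downstream tool. The following is only a minimal sketch, not part of TransDecoder or cellranger: the function names `fasta_lengths` and `out_of_bounds_features` are hypothetical, and it assumes a plain-text FASTA plus a tab-separated annotation file with the standard nine columns.

```python
def fasta_lengths(path):
    """Return {sequence_id: length} for a plain-text FASTA file."""
    lengths = {}
    seq_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The ID is the header text up to the first whitespace.
                seq_id = line[1:].split()[0]
                lengths[seq_id] = 0
            elif seq_id is not None:
                lengths[seq_id] += len(line)
    return lengths


def out_of_bounds_features(annotation_path, lengths):
    """Yield (seqid, feature_type, end, contig_length) for suspect features.

    A feature is reported when its end coordinate exceeds the sequence
    length, or when its seqid is missing from the FASTA (length is None).
    """
    with open(annotation_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                # Not a standard 9-column GTF/GFF3 feature line; skip it.
                continue
            seqid, feature_type, end = fields[0], fields[2], int(fields[4])
            contig_length = lengths.get(seqid)
            if contig_length is None or end > contig_length:
                yield seqid, feature_type, end, contig_length
```

For example, `list(out_of_bounds_features("transdecoder.gtf", fasta_lengths("assembled.Transcriptome.fasta")))` should come back empty when the FASTA is the original transcriptome the annotation was built from. If the FASTA holds only the extracted CDS sequences, many features would be expected to end past the (shorter) sequence length or to have no matching ID.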
Hi Brian, thanks so much for the information. We were using the TransDecoded transcriptome for cellranger, and this does in fact seem to be the issue. Thank you for your help :)