Inaccurate orf length in gff3 file #190

Open
mollylRivers opened this issue Nov 1, 2023 · 6 comments
Comments

@mollylRivers

Hi Brian,

I am using TransDecoder on my de novo transcriptome, which was assembled from Illumina short reads and Iso-Seq long reads with rnaSPAdes.

Commands:
TransDecoder.LongOrfs -t assembled.Transcriptome.fasta -m 100
TransDecoder.Predict -t assembled.Transcriptome.fasta --single_best_only

When using the gff3 output file from TransDecoder, I am finding an issue with the annotated length of the selected open reading frame. There is a discrepancy between the annotated sequence length (in the gff3 file) and the actual sequence length: the sequence is labelled as being longer than the actual contig (which I have checked manually using the longest_orfs.cds file). What I am finding is that the selected ORF is annotated with the length of the longest ORF, even when that is not the one that was selected.

I am not sure whether this has to do with the Iso-Seq long-read sequences in the transcriptome, but it is causing issues with another programme I need to run on the TransDecoder gff3 output file. Any input you can provide on what might be causing this issue and how to fix it would be greatly appreciated.

Many thanks,
Molly Rivers
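
A discrepancy like the one described above can be checked programmatically by comparing each gff3 feature's coordinates against the length of the sequence it refers to. The following is a minimal sketch, not part of TransDecoder: the function names `fasta_lengths` and `out_of_bounds_features` are hypothetical, and it assumes a standard 9-column tab-separated gff3 and a FASTA file whose header's first whitespace-delimited token matches the gff3 seqid column.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first token of the header line.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_bounds_features(
    gff_lines: Iterable[str], lengths: Dict[str, int]
) -> List[Tuple[str, str, int, int, Optional[int]]]:
    """List features whose seqid is unknown or whose coordinates fall
    outside the sequence (gff3 coordinates are 1-based, inclusive)."""
    problems = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        seqid, ftype = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        seq_len = lengths.get(seqid)
        if seq_len is None or start < 1 or end > seq_len:
            problems.append((seqid, ftype, start, end, seq_len))
    return problems
```

In practice, both functions could be given open file handles, for example the FASTA that the downstream tool uses as its reference and the gff3 (or the leading columns of a converted gtf, which use the same coordinate layout). Any tuple returned points at a feature that extends past its sequence or names a sequence absent from that FASTA.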

@brianjohnhaas
Contributor

brianjohnhaas commented Nov 1, 2023 via email

@mollylRivers
Author

Hi Brian,

Thanks so much for your quick response. I have attached the gff3 file produced by TransDecoder, a sample of the longest_orfs cds file (the whole file was far too big to attach), and a txt file with the error message from cellranger, which shows the length discrepancy I described.

Many thanks,
Molly

transdecoder_produced_gff3_file.gff3.gz
cellranger_error_message.txt.gz
longest_orfs.txt

@brianjohnhaas
Contributor

brianjohnhaas commented Nov 2, 2023 via email

@mollylRivers
Author

Hi Brian,

Sorry, I should have said that I had already converted the gff3 file to a gtf and used that file for cellranger. The error message I attached is produced when using the gtf file.

Many thanks,
Molly

@brianjohnhaas
Contributor

brianjohnhaas commented Nov 2, 2023 via email

@mollylRivers
Author

Hi Brian, thanks so much for the information. We were using the TransDecoded transcriptome for cellranger, and this does in fact seem to be the issue. Thank you for your help :)
Molly
