TransDecoder.Predict flag --single_best_only produces .pep file with multiple ORFs per transcript id #176

peterszoevenyi · 2023-07-12T15:55:21Z

Dear Brian,

We are trying to create the best ORF prediction for each transcript in a fasta file.

We run TransDecoder.LongOrfs with the --complete_orfs_only flag.
Then run TransDecoder.Predict with the --single_best_only flag.

Despite using the --single_best_only flag, the longest_orfs.pep file (25902 fasta header entries) contains many more peptide predictions than the transcript.fasta file used as input (11355 unique header entries). This is because the longest_orfs.pep file reports multiple peptide predictions per transcript id, named "transcriptid1.p1", "transcriptid1.p2", etc.
Furthermore, if I grep only the headers containing the ".p1" string from the longest_orfs.pep file, I have fewer entries (8029) than in the transcript.fasta file (11355 unique header entries). I guess this means that some proportion of the transcript.fasta transcripts produced partial ORF predictions that were filtered out in the first step of the process, right?
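The per-transcript check described above can be sketched in a few lines of Python. This is only an illustration, not part of TransDecoder: it assumes that each ORF id is the first whitespace-delimited token of the header and has the form "transcriptid.pN" as described in this post. The function names and the example records are made up.

```python
import re
from collections import Counter

# Assumption: ORF ids look like "<transcript_id>.p<N>", as described in the post.
ORF_ID = re.compile(r"^(?P<transcript>.+)\.p(?P<index>\d+)$")


def orfs_per_transcript(lines):
    """Count ORF entries per transcript id from the lines of a peptide FASTA."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        if not tokens:
            continue  # skip an empty header line
        orf_id = tokens[0]
        match = ORF_ID.match(orf_id)
        # Fall back to the full id if it does not carry a ".pN" suffix.
        transcript = match.group("transcript") if match else orf_id
        counts[transcript] += 1
    return counts


def transcripts_with_multiple_orfs(counts):
    """Return the sorted transcript ids that have more than one ORF entry."""
    return sorted(t for t, n in counts.items() if n > 1)


# Made-up records for illustration only.
example = [
    ">tx1.p1\n", "MKV*\n",
    ">tx1.p2\n", "MAA*\n",
    ">tx2.p1\n", "MLL*\n",
]
counts = orfs_per_transcript(example)
print(len(counts), "transcripts with at least one ORF")
print("transcripts with multiple ORFs:", transcripts_with_multiple_orfs(counts))
```

For a real file, an open file handle can be passed in place of the list, e.g. `with open(path) as fh: counts = orfs_per_transcript(fh)`. Running it on both the intermediate longest_orfs.pep and the final .transdecoder.pep file shows which of the two contains several ORFs per transcript.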

According to my understanding (and the help text of the TransDecoder.Predict code), the --single_best_only flag is expected to retain only the single best ORF per transcript id. Nevertheless, the results suggest that this is not the case.

Is this a bug or have we misinterpreted the description of the --single_best_only flag?

We are using /TransDecoder-TransDecoder-v5.7.0/

I would appreciate it if you could let me know your thoughts on this issue, which we have not been able to resolve.
With kind regards.
Peter

brianjohnhaas · 2023-07-13T13:15:43Z

Hi Peter,

Can you tar.gz your working directory, including the transdecoder intermediates and inputs, and privately share it with me? I'll take a look.

bhaas at broadinstitute dot org

best,
~brian


brianjohnhaas · 2023-07-14T18:52:27Z

Thanks, Peter. I'll look through the files you sent and get back to you shortly. best, ~brian


brianjohnhaas · 2023-07-14T20:55:12Z

Hi Peter,

This is what I'm seeing as far as counts of features:

    (base) wm4ca-d15:transdecoder_peter bhaas$ grep '>' concatenated_accepted_scaffolded_without_blast_sed.fasta | wc
    11355

    (base) wm4ca-d15:transdecoder_peter bhaas$ grep '>' test_nonstrandspec/concatenated_accepted_scaffolded_without_blast_sed.fasta.transdecoder.pep | wc
    7522

Can you help me figure out which file is giving you the unexpected result?

many thanks,
~brian

peterszoevenyi · 2023-07-17T06:50:47Z

Dear Brian,

After going through the GitHub page and your email, I realized that the issue I mentioned does not exist. For whatever reason I skipped the sentence on the GitHub page saying "the output files will be in the working directory". We were looking for the output files in the TransDecoder output directory (defined with the --output flag) and not in the directory in which TransDecoder.Predict was executed. I have now found the files in the right directory.

Maybe you could make it even clearer that the output files will be created in the directory in which TransDecoder.Predict is executed and NOT in the output directory defined on the command line? For us this was not obvious and led to some confusion concerning the whereabouts of the files.

Thanks a lot for helping us clarify this, Brian. We really appreciate your help!

Cheers,
Peter
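For anyone who runs into the same confusion, a small Python sketch like the one below can check both candidate locations for the final peptide file. It is not part of TransDecoder. The ".transdecoder.pep" suffix is taken from the file name shown earlier in this thread, while the function name and the "transdecoder_out" directory are hypothetical placeholders.

```python
from pathlib import Path


def find_outputs(directories, suffix=".transdecoder.pep"):
    """Return files ending in `suffix` found directly inside the given directories."""
    found = []
    for directory in directories:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # ignore locations that do not exist
        found.extend(
            sorted(
                p for p in directory.iterdir()
                if p.is_file() and p.name.endswith(suffix)
            )
        )
    return found


if __name__ == "__main__":
    # "transdecoder_out" is a hypothetical output directory name.
    for path in find_outputs([Path.cwd(), Path("transdecoder_out")]):
        print(path)
```

Checking both the current working directory and the output directory covers the behaviour reported here for v5.7.0 as well as setups where the final files end up in the output directory.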


brianjohnhaas · 2023-07-17T19:14:41Z

No problem. At least in the latest version, the outputs will be put into the output directory. best, ~b

