TransDecoder.Predict flag --single_best_only produces .pep file with multiple ORFs per transcript id #176
Comments
Hi Peter,
Can you tar.gz your working directory including the transdecoder
intermediates and inputs and privately share it with me? I'll take a look.
bhaas at broadinstitute dot org
best,
~brian
On Wed, Jul 12, 2023 at 11:55 AM peterszoevenyi ***@***.***> wrote:
Dear Brian,
We are trying to create the best ORF prediction for each transcript in a
fasta file.
We run TransDecoder.LongOrfs with the --complete_orfs_only flag.
Then run TransDecoder.Predict with the --single_best_only flag.
Despite using the --single_best_only flag the longest_orfs.pep (25902
fasta header entries) file contains many more peptide predictions than the
transcript.fasta (11355 unique header entries) file used. This is because
the longest_orfs.pep file reports multiple peptide predictions per
transcript id named as "transcriptid1.p1" and "transcriptid1.p2" etc.
Furthermore, if I grep only the headers containing the ".p1" string from
the longest_orfs.pep file, I have fewer entries (8029) than in the
transcript.fasta file (11355 unique header entries). I guess this means
that some proportion of the transcript.fasta transcripts produced partial
ORF predictions that were filtered out in the first step of the process,
right?
According to my understanding (and the help of the TransDecoder.Predict
code) the --single_best_only flag is expected to retain only the single best
ORF per transcript id. Nevertheless, the results suggest that this is not
the case.
Is this a bug or have we misinterpreted the description of the
--single_best_only flag?
We are using /TransDecoder-TransDecoder-v5.7.0/
I would appreciate if you could let me know your thoughts on this issue
which we have not been able to resolve.
With kind regards.
Peter
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
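For anyone wanting to check the "one ORF per transcript" expectation directly, here is a minimal sketch. It assumes the ORF ID is the first whitespace-delimited token of each FASTA header and ends in ".p<number>", as in the "transcriptid1.p1" naming described above. The function name orfs_per_transcript is hypothetical and not part of TransDecoder.

```shell
#!/bin/sh
# Hypothetical helper: list transcripts that have more than one ORF in a
# peptide FASTA file, together with the number of ORFs found for each.
# Assumption: each header starts with ">" followed by "<transcript_id>.p<N>".
orfs_per_transcript() {
    grep '^>' "$1" |
        awk '{
                 id = substr($1, 2)           # drop the leading ">"
                 sub(/\.p[0-9]+$/, "", id)    # strip the ".pN" ORF suffix
                 n[id]++
             }
             END { for (id in n) if (n[id] > 1) print id "\t" n[id] }' |
        sort
}
```

Calling orfs_per_transcript on a .pep file prints nothing when every transcript has at most one ORF. Running it on both longest_orfs.pep and the final .transdecoder.pep file would show which of the two actually holds several ORFs per transcript.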
|
Thanks, Peter. I'll look through the files you sent and get back to you
shortly.
best,
~brian
|
Hi Peter,
This is what I'm seeing as far as counts of features:
(base) wm4ca-d15:transdecoder_peter bhaas$ grep '>' concatenated_accepted_scaffolded_without_blast_sed.fasta | wc
11355
(base) wm4ca-d15:transdecoder_peter bhaas$ grep '>' test_nonstrandspec/concatenated_accepted_scaffolded_without_blast_sed.fasta.transdecoder.pep | wc
7522
Can you help me figure out which file is giving you the unexpected result?
many thanks,
~brian
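The counts above can be reproduced with a slightly stricter variant of the same command. This is only a sketch: count_fasta_records is a hypothetical helper name, and it simply wraps grep -c so that a single number is printed and only lines that begin with ">" are counted.

```shell
#!/bin/sh
# Hypothetical helper: print the number of FASTA records in a file.
# '^>' matches only header lines; -c prints the count of matching lines.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Comparing the count for the input transcript FASTA with the count for the .transdecoder.pep file gives the same kind of comparison as the 11355 versus 7522 figures reported here.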
|
Dear Brian,
After going through the github page and your email I realized that the issue I mentioned does not exist.
For whatever reason I just skipped the sentence on the GitHub page saying, "the output files will be in the working directory". I realized that we were looking for the output files in the output directory of TransDecoder (defined with the --output flag) and not in the directory in which TransDecoder.Predict was executed. I have now found the files in the right directory.
Maybe you could make it even clearer that the output files will be created in the directory in which the TransDecoder.Predict executable is run and NOT in the output directory defined on the command line? For us this was not that obvious and led to some confusion concerning the whereabouts of the files.
Thanks a lot for helping us to clarify this, Brian.
We really appreciate your help!
Cheers Peter
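If it is unclear where a run wrote its results, one option is to search the directory tree for them. The sketch below assumes the result files contain ".transdecoder." in their names, based on the ".transdecoder.pep" file name seen earlier in this thread. find_transdecoder_outputs is a hypothetical helper, not a TransDecoder command.

```shell
#!/bin/sh
# Hypothetical helper: list candidate TransDecoder result files under a
# directory (defaults to the current directory when no argument is given).
# Assumption: result files are named "<input>.transdecoder.<extension>".
find_transdecoder_outputs() {
    find "${1:-.}" -type f -name '*.transdecoder.*' | sort
}
```

Running it from the directory in which TransDecoder.Predict was launched, and again on the output directory given on the command line, should show which of the two locations holds the final files.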
|
No problem. At least in the latest version, the outputs will be put into
the output directory.
best,
~b