Differences in gene prediction for forward and reverse sequences #90

Open
caballero opened this issue Jun 13, 2022 · 0 comments

I am testing gene prediction on metagenomic contigs. Because the contigs are assembled, I don't know their orientation (a contig could be the forward or the reverse-complement sequence). I tested what happens when I provide the sequence in either direction. In most cases the predictions agree completely, but in 20-25% of my tests the predictions differ depending on whether I provide the forward or the reverse-complement sequence.

For example:

$ echo -e ">test\nGGTCCGCCCGGCCCTCCGGGCCCTCCGGGCCCGCCGGGCCCGCCGACGCCGCCCACGCCGCCGTTGCTGCCCTCGCAGCCGGAGTTGCACCCGGTGCCGCCGGTGCCGCCGGTCTGGCCCTGGCCGCCCGCGCCGCCGGTGCCACCCTTGCCGCCAGCGCCCGAATAGCTGAGCAGACGAAGGGGTTCACGTTCGACGGGTCGAGACGCCGAGATTGACGATGTTACGAAGGTCATCTGGGTCAGGCCGGAATTGCCCGTTCCCGCCCGGGCGCCGGTCGTGGCGGCGGCGCCCGGGCCCCGGTTGGCCGCGATCACCGTGGAAGGTGAAGGTGCCCACCGGGAATCGCCGACCGGCCGTGCTCGTGCGCGCCCGTGACGCTCATGCGGGGCGCACCGGCGTACCGGGTCGTCCCCTGGGCCCCGGGTTGCCGACGATAGCCGAGATGGTAGATGTGCGAACGACGGATCTGGGGTTGGGCGTGATCAACCAGGTTTGTCGCCGTGATCGCAGCTGAGGTGGACATTGATCGTCAGAGACACCGCCGTCGAACACGAAGCGTGCCCGCGTTGATCGCATAGGACTGGGCGTTGGGGCCGATGAACTGATCACGTTGTCTGATGGGTCACGGTGGGCCGGAGGTGACAAGCCCAACCGTCACCAGGGGAACACCTGGTCCGCCAGCTGATGCAACCGCGTCCGAGAGCGCTCCTACGCCGAAAGACGTGATCCATGCAGGAGAAATGCATGTTCGGGATGGACCGTGTCGGCGCGAGGGCCGGTTCTCATGTAGGCGCGGCGCTGGTCGTCCGGTCGCCTTGGCCCGCGAAAGAGCTTAACGATACTGGCTGATCGAACATCGACGTGACTTCGCTGGTGATCGTCTCGCCCCGCCGACCAGCGGCGTCGGTGTCCTGACCCATAGCCTGGGCCGAGCCAGATAGTTGCATAACCGCGTCTCCCCTGTGAAGCTTATTGCCTGATCGTTCAGTGATTCGCGTGAGCATTGCAAGCGAATCTCAATTCGTCGTGGACGGTCAGGGGCGGCGCTGTCCGTAGCGGTTGGAACGCGCTCGAAACGAGTATCGGCCGGGTTCGCAATCGTGGGCGGGGGCCAGCGCTCCGCGCGCAAGGGAGGGCTCGGGGAGGGGGGGCGAAGCGGGGAGAGGGGCTGCGCGATAGCGCCTGTTCGGCAGAAGACAGACCGAAACACTAGCGAACGCGCCGCTCTCGCCGAAAAGATTGCACGCAACGTCGATGGAGGCAATCGCGATAGCAGCAAACATAAGAAGCTCGAAGGCATCGCGGATTTGAAGGCGTCGCGCGCGAAGGCTGCTGGATGCACGTGCGCGCCTGCGCGAGCTGGCTCTGCGCCCATCACTATGGTGCCCGACGTGCGGTCCCCGACTATCTCCGCCGCCGCGGCGCATCGCCAATCGCTGGCCGTCGCTCGCCAGTGCGCAGTCGGACGAGCGATCAGCGCGTTCATTCGGAACGCCGTGTCCGCCAGGGCCGAGGGAAATGGAACTCGCGTGCAGCGCGGCGATATCTGGACCGTTTCGGGCGGCAGGGACTACGCGGGCAAGCCGCGTCCCGTCGTCATCGTCCAGGATGATAGTTTCGACATGACGTACTCCGTCACCATCTGCGCCTTCACCACCGACACGACCGACGCGCCGCTGTTTCGCCT" | ./prodigal -p meta 
-------------------------------------
PRODIGAL v2.6.3 [February, 2016]         
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.     
-------------------------------------
Request:  Metagenomic, Phase:  Training
Initializing training files...done!
-------------------------------------
Request:  Metagenomic, Phase:  Gene Finding
Finding genes in sequence #1 (1690 bp)...done!
DEFINITION  seqnum=1;seqlen=1690;seqhdr="test";version=Prodigal.v2.6.3;run_type=Metagenomic;model="31|Natrialba_magadii_ATCC_43099|A|61.4|11|0";gc_cont=61.40;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             complement(<3..317)
                     /note="ID=1_1;partial=10;start_type=GTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.765;conf=98.18;score=17.34;cscore=21.22;sscore=-3.88;rscore=0.00;uscore=-3.46;tscore=-5.18;"
     CDS             complement(836..1489)
                     /note="ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.639;conf=90.25;score=9.68;cscore=6.45;sscore=3.23;rscore=0.00;uscore=-1.00;tscore=4.88;"
     CDS             1533..>1688
                     /note="ID=1_3;partial=01;start_type=GTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.641;conf=83.11;score=6.93;cscore=16.68;sscore=-9.75;rscore=0.00;uscore=-4.57;tscore=-5.18;"
//

But if I provide the reverse complement of the same sequence, one prediction is missing and another has a different start position and length:

$ echo -e ">test\nGGTCCGCCCGGCCCTCCGGGCCCTCCGGGCCCGCCGGGCCCGCCGACGCCGCCCACGCCGCCGTTGCTGCCCTCGCAGCCGGAGTTGCACCCGGTGCCGCCGGTGCCGCCGGTCTGGCCCTGGCCGCCCGCGCCGCCGGTGCCACCCTTGCCGCCAGCGCCCGAATAGCTGAGCAGACGAAGGGGTTCACGTTCGACGGGTCGAGACGCCGAGATTGACGATGTTACGAAGGTCATCTGGGTCAGGCCGGAATTGCCCGTTCCCGCCCGGGCGCCGGTCGTGGCGGCGGCGCCCGGGCCCCGGTTGGCCGCGATCACCGTGGAAGGTGAAGGTGCCCACCGGGAATCGCCGACCGGCCGTGCTCGTGCGCGCCCGTGACGCTCATGCGGGGCGCACCGGCGTACCGGGTCGTCCCCTGGGCCCCGGGTTGCCGACGATAGCCGAGATGGTAGATGTGCGAACGACGGATCTGGGGTTGGGCGTGATCAACCAGGTTTGTCGCCGTGATCGCAGCTGAGGTGGACATTGATCGTCAGAGACACCGCCGTCGAACACGAAGCGTGCCCGCGTTGATCGCATAGGACTGGGCGTTGGGGCCGATGAACTGATCACGTTGTCTGATGGGTCACGGTGGGCCGGAGGTGACAAGCCCAACCGTCACCAGGGGAACACCTGGTCCGCCAGCTGATGCAACCGCGTCCGAGAGCGCTCCTACGCCGAAAGACGTGATCCATGCAGGAGAAATGCATGTTCGGGATGGACCGTGTCGGCGCGAGGGCCGGTTCTCATGTAGGCGCGGCGCTGGTCGTCCGGTCGCCTTGGCCCGCGAAAGAGCTTAACGATACTGGCTGATCGAACATCGACGTGACTTCGCTGGTGATCGTCTCGCCCCGCCGACCAGCGGCGTCGGTGTCCTGACCCATAGCCTGGGCCGAGCCAGATAGTTGCATAACCGCGTCTCCCCTGTGAAGCTTATTGCCTGATCGTTCAGTGATTCGCGTGAGCATTGCAAGCGAATCTCAATTCGTCGTGGACGGTCAGGGGCGGCGCTGTCCGTAGCGGTTGGAACGCGCTCGAAACGAGTATCGGCCGGGTTCGCAATCGTGGGCGGGGGCCAGCGCTCCGCGCGCAAGGGAGGGCTCGGGGAGGGGGGGCGAAGCGGGGAGAGGGGCTGCGCGATAGCGCCTGTTCGGCAGAAGACAGACCGAAACACTAGCGAACGCGCCGCTCTCGCCGAAAAGATTGCACGCAACGTCGATGGAGGCAATCGCGATAGCAGCAAACATAAGAAGCTCGAAGGCATCGCGGATTTGAAGGCGTCGCGCGCGAAGGCTGCTGGATGCACGTGCGCGCCTGCGCGAGCTGGCTCTGCGCCCATCACTATGGTGCCCGACGTGCGGTCCCCGACTATCTCCGCCGCCGCGGCGCATCGCCAATCGCTGGCCGTCGCTCGCCAGTGCGCAGTCGGACGAGCGATCAGCGCGTTCATTCGGAACGCCGTGTCCGCCAGGGCCGAGGGAAATGGAACTCGCGTGCAGCGCGGCGATATCTGGACCGTTTCGGGCGGCAGGGACTACGCGGGCAAGCCGCGTCCCGTCGTCATCGTCCAGGATGATAGTTTCGACATGACGTACTCCGTCACCATCTGCGCCTTCACCACCGACACGACCGACGCGCCGCTGTTTCGCCT\n" | perl -pe 'unless (/>/) { $_= reverse $_; tr/ACGT/TGCA/}' | ./prodigal -p meta 
-------------------------------------
PRODIGAL v2.6.3 [February, 2016]         
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.     
-------------------------------------
Request:  Metagenomic, Phase:  Training
Initializing training files...done!
-------------------------------------
Request:  Metagenomic, Phase:  Gene Finding
Finding genes in sequence #1 (1690 bp)...done!
DEFINITION  seqnum=1;seqlen=1690;seqhdr="test";version=Prodigal.v2.6.3;run_type=Metagenomic;model="31|Natrialba_magadii_ATCC_43099|A|61.4|11|0";gc_cont=61.40;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             complement(<3..308)
                     /note="ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.670;conf=96.25;score=14.12;cscore=13.17;sscore=0.95;rscore=0.00;uscore=-3.28;tscore=4.88;"
     CDS             1374..>1688
                     /note="ID=1_2;partial=01;start_type=GTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.765;conf=98.18;score=17.34;cscore=21.22;sscore=-3.88;rscore=0.00;uscore=-3.46;tscore=-5.18;"
//
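
To compare the two runs directly, the coordinates reported for the reverse-complement input can be mapped back onto the forward sequence. The following is a minimal Python sketch of that mapping, assuming 1-based inclusive coordinates as printed in the outputs above. The function name `map_to_forward` and the tuple layout are illustrative only and are not part of Prodigal.

```python
# Sketch: map CDS coordinates from a reverse-complement run back onto the
# forward sequence so the two Prodigal outputs can be compared directly.
# The helper name and the (start, end, strand) layout are assumptions made
# for illustration; they are not part of Prodigal.

def map_to_forward(start, end, strand, seqlen):
    """Convert 1-based inclusive coordinates on the reverse complement
    into the equivalent coordinates and strand on the forward sequence."""
    new_start = seqlen - end + 1
    new_end = seqlen - start + 1
    new_strand = "-" if strand == "+" else "+"
    return new_start, new_end, new_strand

SEQLEN = 1690

# (start, end, strand) copied from the two outputs above;
# "complement(...)" features are written with strand "-".
forward_run = [(3, 317, "-"), (836, 1489, "-"), (1533, 1688, "+")]
reverse_run = [(3, 308, "-"), (1374, 1688, "+")]

mapped = sorted(map_to_forward(s, e, strand, SEQLEN) for s, e, strand in reverse_run)

for cds in mapped:
    status = "same as forward run" if cds in forward_run else "differs from forward run"
    print(cds, status)
```

Under this mapping, `1374..>1688` from the second run is the same feature as `complement(<3..317)` from the first run. `complement(<3..308)` maps to `1383..>1688` on the forward sequence, which ends at the same contig edge as the first run's `1533..>1688` but starts 150 bp further upstream (ATG instead of GTG). `complement(836..1489)` has no counterpart in the second run.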