Avoid skipping candidate RBS positions in rbs_score #102

Open
wants to merge 1 commit into
base: GoogleImport
@althonos (Contributor) commented Jan 29, 2023

Hi, one final PR 😃

In the test sequence I used for #100, I noticed the following bug: the RBS spacer of one of the predicted genes changes when the contig is reverse-complemented:

Forward:
CAKWEX010000332.1	Prodigal_v2.6.3	CDS	520	2178	298.9	-	0	ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.636;conf=99.99;score=298.87;cscore=285.76;sscore=13.11;rscore=7.85;uscore=1.61;tscore=3.65;
CAKWEX010000332.1	Prodigal_v2.6.3	CDS	2250	2852	105.5	-	0	ID=1_2;partial=00;start_type=GTG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.580;conf=100.00;score=105.47;cscore=95.43;sscore=10.04;rscore=13.75;uscore=1.16;tscore=-4.87;
CAKWEX010000332.1	Prodigal_v2.6.3	CDS	2936	3754	140.5	-	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.601;conf=100.00;score=140.50;cscore=133.74;sscore=6.75;rscore=2.03;uscore=1.08;tscore=3.65;
Reverse-complemented:
CAKWEX010000332.1_r	Prodigal_v2.6.3	CDS	12	830	136.1	+	0	ID=1_1;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=3-4bp;gc_cont=0.601;conf=100.00;score=136.12;cscore=133.74;sscore=2.37;rscore=-2.35;uscore=1.08;tscore=3.65;
CAKWEX010000332.1_r	Prodigal_v2.6.3	CDS	914	1516	105.5	+	0	ID=1_2;partial=00;start_type=GTG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.580;conf=100.00;score=105.47;cscore=95.43;sscore=10.04;rscore=13.75;uscore=1.16;tscore=-4.87;
CAKWEX010000332.1_r	Prodigal_v2.6.3	CDS	1588	3246	298.9	+	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.636;conf=99.99;score=298.87;cscore=285.76;sscore=13.11;rscore=7.85;uscore=1.61;tscore=3.65;

Indeed, the gene with the GGA/GAG/AGG RBS motif has its spacer detected as 3-4bp when the gene lies on the forward (+) strand, and as 5-10bp when it lies on the reverse (-) strand. The reverse-complemented contig starts with the following sequence:

GGATAGGCCCCATG...

so it has a match in both the 3-4bp range (AGG) and the 5-10bp range (GGA). Since the 5-10bp spacer has the higher score, it is the one that should be selected. This affects the gene score (136.12 instead of 140.50 here), so it could cause some predictions to change.
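To make the two candidate matches concrete, here is a small Python sketch that lists every GGA/GAG/AGG occurrence upstream of the start codon in the sequence above, together with its spacer. It is only a toy illustration, not Prodigal's scoring code: the helper names are hypothetical, the spacer is assumed to be the number of bases between the motif and the start codon, and only the two bins quoted in this PR are labeled.

```python
# Toy illustration only: this is not Prodigal's scoring code.
MOTIFS = ("GGA", "GAG", "AGG")

def spacer_bin(spacer):
    """Return the spacer bin label, limited to the two bins quoted in this PR."""
    if 3 <= spacer <= 4:
        return "3-4bp"
    if 5 <= spacer <= 10:
        return "5-10bp"
    return None

def upstream_hits(seq, start):
    """List (motif, spacer, bin) for every 3-mer motif upstream of `start`.

    `start` is the 0-based index of the first base of the start codon;
    the spacer is assumed to be the number of bases between the motif
    and that codon.
    """
    hits = []
    for pos in range(start - 2):
        kmer = seq[pos:pos + 3]
        if kmer in MOTIFS:
            spacer = start - (pos + 3)
            hits.append((kmer, spacer, spacer_bin(spacer)))
    return hits

contig_edge = "GGATAGGCCCCATG"
hits = upstream_hits(contig_edge, contig_edge.index("ATG"))
print(hits)  # [('GGA', 8, '5-10bp'), ('AGG', 4, '3-4bp')]
```

Both candidates are present, so which one gets reported depends only on whether the scan actually visits the GGA at the very edge of the contig.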

The problem came from the loops in rbs_score, which skip positions before index 0. When a partial match is possible (as is the case here, with a GGA motif right at the contig edge), those positions should not be skipped; the decision to ignore positions should instead be made directly by the shine_dalgarno_exact and shine_dalgarno_mm functions.

After applying the patch, the predictions are consistent regardless of the orientation of the contig: the RBS spacers, and hence the gene scores, match:

Forward:
CAKWEX010000332.1	Prodigal_v2.6.3	CDS	520	2178	298.9	-	0	ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.636;conf=99.99;score=298.87;cscore=285.76;sscore=13.11;rscore=7.85;uscore=1.61;tscore=3.65;
CAKWEX010000332.1	Prodigal_v2.6.3	CDS	2250	2852	105.5	-	0	ID=1_2;partial=00;start_type=GTG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.580;conf=100.00;score=105.47;cscore=95.43;sscore=10.04;rscore=13.75;uscore=1.16;tscore=-4.87;
CAKWEX010000332.1	Prodigal_v2.6.3	CDS	2936	3754	140.5	-	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.601;conf=100.00;score=140.50;cscore=133.74;sscore=6.75;rscore=2.03;uscore=1.08;tscore=3.65;
Reverse-complemented:
CAKWEX010000332.1_r	Prodigal_v2.6.3	CDS	12	830	140.5	+	0	ID=1_1;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.601;conf=100.00;score=140.50;cscore=133.74;sscore=6.75;rscore=2.03;uscore=1.08;tscore=3.65;
CAKWEX010000332.1_r	Prodigal_v2.6.3	CDS	914	1516	105.5	+	0	ID=1_2;partial=00;start_type=GTG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.580;conf=100.00;score=105.47;cscore=95.43;sscore=10.04;rscore=13.75;uscore=1.16;tscore=-4.87;
CAKWEX010000332.1_r	Prodigal_v2.6.3	CDS	1588	3246	298.9	+	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;gc_cont=0.636;conf=99.99;score=298.87;cscore=285.76;sscore=13.11;rscore=7.85;uscore=1.61;tscore=3.65;
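
This kind of strand-consistency check can be automated. The sketch below is a rough Python example, not part of the patch: it mirrors each gene of the forward run onto the reverse-complemented contig and compares a few attributes. The function names are hypothetical, the attribute strings are abbreviated from the outputs above, and the contig length of 3765 is inferred from the paired coordinates (520 + 3246 - 1) rather than stated anywhere.

```python
def gff(seqid, start, end, strand, attrs):
    """Build a tab-separated GFF line (score and phase are placeholders)."""
    fields = [seqid, "Prodigal_v2.6.3", "CDS", str(start), str(end),
              ".", strand, "0", attrs]
    return "\t".join(fields)

def parse_gff_line(line):
    """Parse one GFF line into (start, end, strand, attributes dict)."""
    fields = line.rstrip("\n").split("\t")
    start, end, strand = int(fields[3]), int(fields[4]), fields[6]
    attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if kv)
    return start, end, strand, attrs

def mirror(start, end, strand, contig_len):
    """Map 1-based inclusive coordinates onto the reverse-complemented contig."""
    flipped = "+" if strand == "-" else "-"
    return contig_len - end + 1, contig_len - start + 1, flipped

def inconsistent_genes(forward_lines, reverse_lines, contig_len,
                       keys=("rbs_motif", "rbs_spacer", "score")):
    """Return forward (start, end) pairs whose mirrored gene is missing or differs."""
    reverse = {}
    for line in reverse_lines:
        start, end, strand, attrs = parse_gff_line(line)
        reverse[(start, end, strand)] = attrs
    mismatches = []
    for line in forward_lines:
        start, end, strand, attrs = parse_gff_line(line)
        other = reverse.get(mirror(start, end, strand, contig_len))
        if other is None or any(attrs.get(k) != other.get(k) for k in keys):
            mismatches.append((start, end))
    return mismatches

CONTIG_LEN = 3765  # inferred from the coordinates above, not given in the PR
forward = [gff("CAKWEX010000332.1", 2936, 3754, "-",
               "rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;score=140.50")]
before_patch = [gff("CAKWEX010000332.1_r", 12, 830, "+",
                    "rbs_motif=GGA/GAG/AGG;rbs_spacer=3-4bp;score=136.12")]
after_patch = [gff("CAKWEX010000332.1_r", 12, 830, "+",
                   "rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;score=140.50")]

print(inconsistent_genes(forward, before_patch, CONTIG_LEN))  # [(2936, 3754)]
print(inconsistent_genes(forward, after_patch, CONTIG_LEN))   # []
```

Comparing a fixed set of attribute keys keeps the check independent of the gene IDs, which are numbered in contig order and therefore differ between the two runs.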
