Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ivar consensus: option -q is not filtering low quality bases #97

Open
pmenzel opened this issue Apr 23, 2021 · 1 comment
Open

ivar consensus: option -q is not filtering low quality bases #97

pmenzel opened this issue Apr 23, 2021 · 1 comment
Assignees

Comments

@pmenzel
Copy link

pmenzel commented Apr 23, 2021

Consider an example output from samtools mpileup for the SARS-CoV2 mutation N501Y:

MN908947.3      23063   A       20      tttttttttttttttttttt    EEEEEAAAEEEAEEEEEE/A

i.e. there are 20 reads covering this position with a T, and one of the bases only has a quality score of 14 (encoded by /).

When setting a minimum coverage threshold of 20 (option -m 20) and minimum q-score of 20 (option -q 20) in ivar consensus, the program outputs a T in the consensus sequence, while it should be a N, since only 19 of the bases are at least quality 20.

echo "MN908947.3\t23063\tN\t20\ttttttttttttttttttttt\tEEEEEAAAEEEAEEEEEE/A" | ivar consensus -t 0.9 -m 20 -q 20 -n N -p test.fa && cat test.fa

Minimum Quality: 20
Threshold: 0.9
Minimum depth: 20
Regions with depth less than minimum depth covered by: N
Reference length: 1
Positions with 0 depth: 0
Positions with depth below 20: 0
>Consensus_test_threshold_0.9_quality_20
T

On the other hand, ivar variants properly filters the lower quality base and does not call the variant unless reducing the value for option -q.

ivar variants -q 20:

echo "MN908947.3\t23063\tN\t20\ttttttttttttttttttttt\tEEEEEAAAEEEAEEEEEE/A" | \
ivar variants -r nCoV-2019.reference.fasta -m 20 -p variants -q 20 -t 0.25 && column -t -s$'\t' variants.tsv

REGION  POS  REF  ALT  REF_DP  REF_RV  REF_QUAL  ALT_DP  ALT_RV  ALT_QUAL  ALT_FREQ  TOTAL_DP  PVAL  PASS  GFF_FEATURE  REF_CODON  REF_AA  ALT_CODON  ALT_AA

ivar variants -q 14:

echo "MN908947.3\t23063\tN\t20\ttttttttttttttttttttt\tEEEEEAAAEEEAEEEEEE/A" | \
ivar variants -r nCoV-2019.reference.fasta -m 20 -p variants -q 14 -t 0.25 && column -t -s$'\t' variants.tsv

REGION      POS    REF  ALT  REF_DP  REF_RV  REF_QUAL  ALT_DP  ALT_RV  ALT_QUAL  ALT_FREQ  TOTAL_DP  PVAL         PASS  GFF_FEATURE  REF_CODON  REF_AA  ALT_CODON  ALT_AA
MN908947.3  23063  A    T    0       0       0         20      20      33        1         20        1.45089e-11  TRUE  NA           NA         NA      NA         NA
@gkarthik
Copy link
Member

gkarthik commented Jun 8, 2021

Hello @pmenzel , thank you for pointing this out! The way it's setup right now, the variants take the quality into account while setting the threshold for depth but consensus does not take the quality into count when looking at minimum depth. I agree this is confusing and I will work on changing this behavior.

@gkarthik gkarthik self-assigned this Jun 8, 2021
pmenzel added a commit to LaborBerlin/LB-ncov2019-artic-nf that referenced this issue Nov 5, 2021
disable overlap detection in samtools mpileup | ivar variants

this makes the variant calls more compatible to ivar consensus,
which does not care about the base qualities when reading the
samtools mpileup output

see also andersen-lab/ivar#97
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants