Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variant not detected near end of reference sequence but detected by Bcftools mpileup/call and visually confirmed #257

Open
peterk87 opened this issue Dec 19, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@peterk87
Copy link

Hello,

I noticed that Clair3 is not calling a variant near the end of a short reference sequence (position 2265 of Influenza A virus PB2 sequence OP597571.1; sequence length=2280).

Bcftools calls at a G->A variant at position 2265. It's also clear from looking at the read alignment in IGV that there's a high AF variant at the position.

Clair3 merge_output.vcf.gz:

##fileformat=VCFv4.2
##source=Clair3
##clair3_version=1.0.4
##cmdline=../run_clair3.sh --bam_fn=minimap2.bam --ref_fn=OP597571.1.fasta --model_path=.../models/r941_prom_sup_g5014 --print_ref_calls --gvcf --threads=1 --platform=ont --output=clair3-out --haploid_sensitive --enable_long_indel --keep_iupac_bases --no_phasing_for_fa --include_all_ctgs
##reference=OP597571.1.fasta
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=OP597571.1,length=2280>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      62      .       C       .       17.62   RefCall P       GT:GQ:DP:AD:AF:PL       0:17:828:641:0.7742:990
OP597571.1      185     .       G       A       9.65    PASS    F       GT:GQ:DP:AD:AF:PL       1:9:143:2,107:0.7483:18,13,0
OP597571.1      232     .       T       .       21.36   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:102:75:0.7353:990
OP597571.1      252     .       C       A       25.12   PASS    P       GT:GQ:DP:AD:AF:PL       1:25:90:1,84:0.9333:80,58,0
OP597571.1      495     .       G       .       19.81   RefCall P       GT:GQ:DP:AD:AF:PL       0:19:42:36:0.8571:990
OP597571.1      510     .       C       T       33.42   PASS    F       GT:GQ:DP:AD:AF:PL       1:33:42:0,41:0.9762:79,57,0
OP597571.1      528     .       A       C       35.38   PASS    F       GT:GQ:DP:AD:AF:PL       1:35:44:0,41:0.9318:80,63,0
OP597571.1      540     .       G       C       26.34   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:45:0,44:0.9778:80,54,0
OP597571.1      639     .       C       .       22.44   RefCall P       GT:GQ:DP:AD:AF:PL       0:22:44:28:0.6364:990
OP597571.1      694     .       T       .       26.30   RefCall P       GT:GQ:DP:AD:AF:PL       0:26:43:37:0.8605:990
OP597571.1      708     .       A       .       23.57   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:43:33:0.7674:990
OP597571.1      741     .       A       .       23.27   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:49:30:0.6122:990
OP597571.1      747     .       A       G       24.68   PASS    F       GT:GQ:DP:AD:AF:PL       1:24:47:4,40:0.8511:74,37,0
OP597571.1      888     .       T       C       24.92   PASS    P       GT:GQ:DP:AD:AF:PL       1:24:58:1,56:0.9655:80,42,0
OP597571.1      895     .       A       .       23.47   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:59:51:0.8644:990
OP597571.1      1113    .       A       .       21.30   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:45:40:0.8889:990
OP597571.1      1170    .       C       T       17.59   PASS    F       GT:GQ:DP:AD:AF:PL       1:17:41:25,12:0.2927:22,0,63
OP597571.1      1335    .       G       A       28.63   PASS    F       GT:GQ:DP:AD:AF:PL       1:28:43:0,40:0.9302:77,45,0
OP597571.1      1345    .       T       .       25.67   RefCall P       GT:GQ:DP:AD:AF:PL       0:25:44:30:0.6818:990
OP597571.1      1432    .       G       .       24.87   RefCall P       GT:GQ:DP:AD:AF:PL       0:24:52:43:0.8269:990
OP597571.1      1469    .       A       G       24.83   PASS    P       GT:GQ:DP:AD:AF:PL       1:24:52:2,49:0.9423:80,37,0
OP597571.1      1504    .       T       .       22.74   RefCall P       GT:GQ:DP:AD:AF:PL       0:22:50:41:0.8200:990
OP597571.1      1522    .       A       .       26.06   RefCall P       GT:GQ:DP:AD:AF:PL       0:26:54:39:0.7222:990
OP597571.1      1566    .       G       A       32.47   PASS    F       GT:GQ:DP:AD:AF:PL       1:32:54:2,52:0.9630:80,53,0
OP597571.1      1677    .       T       C       31.74   PASS    F       GT:GQ:DP:AD:AF:PL       1:31:57:4,49:0.8596:80,52,0
OP597571.1      1736    .       C       .       23.60   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:59:41:0.6949:990
OP597571.1      1737    .       T       .       25.37   RefCall P       GT:GQ:DP:AD:AF:PL       0:25:59:43:0.7288:990
OP597571.1      1760    .       C       .       21.45   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:60:52:0.8667:990
OP597571.1      1761    .       T       .       18.29   RefCall P       GT:GQ:DP:AD:AF:PL       0:18:60:44:0.7333:990
OP597571.1      1806    .       G       A       27.10   PASS    P       GT:GQ:DP:AD:AF:PL       1:27:58:0,54:0.9310:80,47,0
OP597571.1      1942    .       C       .       21.32   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:87:73:0.8391:990
OP597571.1      1957    .       T       .       24.29   RefCall P       GT:GQ:DP:AD:AF:PL       0:24:96:72:0.7500:990
OP597571.1      2051    .       T       C       26.04   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:168:1,156:0.9286:80,41,0
OP597571.1      2154    .       A       .       32.13   RefCall F       GT:GQ:DP:AD:AF:PL       0:32:713:468:0.6564:990
OP597571.1      2216    .       G       .       21.01   RefCall P       GT:GQ:DP:AD:AF:PL       0:21:834:715:0.8573:990
OP597571.1      2253    .       C       .       18.05   RefCall P       GT:GQ:DP:AD:AF:PL       0:18:767:569:0.7419:990

Bcftools mpileup and call output:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.17+htslib-1.17
##bcftoolsCommand=bcftools mpileup -d 100000 -f OP597571.1.fasta minimap2.bam
##contig=<ID=OP597571.1,length=2280>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.17+htslib-1.17
##bcftools_callCommand=call --ploidy 1 -mv -Ob -o calls.vcf; Date=Mon Dec 18 20:32:30 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  WIN-AH-2023-FAV-0375-3-1C2d.Segment_1_PB2.OP597571.1.bam
OP597571.1      185     .       G       A       228.391 .       DP=139;VDB=0.979122;SGB=-0.693147;RPBZ=-0.747496;MQBZ=-0.574548;MQSBZ=-0.192448;BQBZ=1.13137;SCBZ=-0.508121;MQ0F=0;AC=1;AN=1;DP4=1,1,29,40;MQ=56       GT:PL   1:255,0
OP597571.1      252     .       C       A       225.421 .       DP=102;VDB=0.747331;SGB=-0.693147;MQSBZ=-0.642482;MQ0F=0;AC=1;AN=1;DP4=0,0,27,46;MQ=55  GT:PL 1:255,0
OP597571.1      510     .       C       T       228.402 .       DP=42;VDB=0.0719096;SGB=-0.693145;RPBZ=-0.412677;MQBZ=0;MQSBZ=0;BQBZ=1.36377;SCBZ=0.323986;MQ0F=0;AC=1;AN=1;DP4=1,0,16,25;MQ=60        GT:PL   1:255,0
OP597571.1      528     .       A       C       211.417 .       DP=41;VDB=0.278679;SGB=-0.693145;MQSBZ=0;MQ0F=0;AC=1;AN=1;DP4=0,0,19,22;MQ=60   GT:PL   1:241,0
OP597571.1      540     .       G       C       225.417 .       DP=44;VDB=0.219325;SGB=-0.693146;MQSBZ=0;MQ0F=0;AC=1;AN=1;DP4=0,0,18,26;MQ=60   GT:PL   1:255,0
OP597571.1      741     .       AGGGG   AGG     17.0828 .       INDEL;IDV=18;IMF=0.367347;DP=49;VDB=0.73002;SGB=-0.683931;RPBZ=-0.778151;MQBZ=0.124761;MQSBZ=-0.650785;BQBZ=1.02751;SCBZ=0.358316;MQ0F=0;AC=1;AN=1;DP4=14,17,8,5;MQ=58 GT:PL   1:87,42
OP597571.1      747     .       A       G       149.305 .       DP=45;VDB=0.841016;SGB=-0.693145;RPBZ=1.48153;MQBZ=1.82034;MQSBZ=0;BQBZ=2.48105;SCBZ=0.918559;MQ0F=0;AC=1;AN=1;DP4=4,1,19,21;MQ=58     GT:PL   1:176,0
OP597571.1      888     .       T       C       228.415 .       DP=57;VDB=0.0352829;SGB=-0.693147;RPBZ=0.304058;MQBZ=-0.190662;MQSBZ=0.201778;BQBZ=1.7129;SCBZ=0.309711;MQ0F=0;AC=1;AN=1;DP4=1,0,24,32;MQ=58   GT:PL   1:255,0
OP597571.1      1335    .       G       A       228.424 .       DP=42;VDB=0.713753;SGB=-0.693145;RPBZ=0.0412661;MQBZ=-0.156174;MQSBZ=-0.866025;BQBZ=1.69535;SCBZ=0.481876;MQ0F=0;AC=1;AN=1;DP4=1,0,17,24;MQ=59 GT:PL   1:255,0
OP597571.1      1469    .       A       G       221.38  .       DP=52;VDB=0.389159;SGB=-0.693147;RPBZ=0.667379;MQBZ=-0.353341;MQSBZ=-1.08356;BQBZ=2.5377;SCBZ=-1.22412;MQ0F=0;AC=1;AN=1;DP4=2,1,17,32;MQ=59    GT:PL   1:248,0
OP597571.1      1566    .       G       A       228.393 .       DP=55;VDB=0.00601305;SGB=-0.693147;RPBZ=0.899577;MQBZ=-0.498471;MQSBZ=-1.09919;BQBZ=1.93931;SCBZ=-1.1883;MQ0F=0;AC=1;AN=1;DP4=1,1,19,34;MQ=56  GT:PL   1:255,0
OP597571.1      1677    .       T       C       203.339 .       DP=55;VDB=0.0325114;SGB=-0.693147;RPBZ=-0.145875;MQBZ=-0.280056;MQSBZ=-0.785905;BQBZ=1.93259;SCBZ=-0.687735;MQ0F=0;AC=1;AN=1;DP4=2,2,19,32;MQ=58       GT:PL   1:230,0
OP597571.1      1806    .       G       A       228.42  .       DP=55;VDB=0.0163287;SGB=-0.693147;RPBZ=0.346711;MQBZ=-0.136083;MQSBZ=-0.847791;BQBZ=1.67811;SCBZ=-1.38535;MQ0F=0;AC=1;AN=1;DP4=1,0,22,32;MQ=59 GT:PL   1:255,0
OP597571.1      2051    .       T       C       228.414 .       DP=185;VDB=0.258521;SGB=-0.693147;RPBZ=-0.659562;MQBZ=-0.811578;MQSBZ=-6.21909;BQBZ=2.58014;SCBZ=-0.128242;MQ0F=0;AC=1;AN=1;DP4=1,2,100,80;MQ=55       GT:PL   1:255,0
OP597571.1      2265    .       G       A       228.4   .       DP=802;VDB=0;SGB=-0.693147;RPBZ=-0.477099;MQBZ=2.06864;MQSBZ=3.44984;BQBZ=7.12028;SCBZ=0.475257;MQ0F=0;AC=1;AN=1;DP4=23,0,710,60;MQ=54 GT:PL   1:255,0

Clair3 command:

CLAIR_BIN_DIR=$(dirname $(which run_clair3.sh))
MODEL_PATH="$CLAIR_BIN_DIR/models/r941_prom_sup_g5014"

samtools faidx ref.fasta

run_clair3.sh \
    --bam_fn=minimap2.bam \
    --ref_fn=OP597571.1.fasta \
    --model_path="$MODEL_PATH"\
    --threads=1 \
    --platform="ont" \
    --output=clair3-out \
    --haploid_sensitive \
    --enable_long_indel \
    --keep_iupac_bases \
    --no_phasing_for_fa \
    --include_all_ctgs

Other details:

  • Clair3 version: 1.0.4 (installed from Bioconda)
  • read mapper: Minimap2 v2.24
  • instrument: GridION
  • instrument software: MinKNOW 23.07.12 (Bream 7.7.6, Core 5.7.5, Dorado 7.1.4+d7df870c0)
  • basecalling model: Super-accurate basecalling, 450 bps
  • flowcell: FLO-MIN106
  • kit: SQK-RBK110-96

Are there any parameters that need to be adjusted for variant calling of RNA virus sequence data?

Any help would be much appreciated!

Thanks!

@aquaskyline
Copy link
Member

Could you please try Clair3 with the --vcf_fn= option with the VCF called by bcftools. That forces Clair3 to do genotyping at site 2265. Please show the result at 2265 if it runs successfully.

@peterk87
Copy link
Author

Sure, here's the VCF output for each vcf.gz in the output dir:

merge_output.vcf.gz:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      185     .       G       A       9.65    PASS    F       GT:GQ:DP:AD:AF:PL       1:9:143:2,107:0.7483:18,13,0
OP597571.1      252     .       C       A       25.12   PASS    P       GT:GQ:DP:AD:AF:PL       1:25:90:1,84:0.9333:80,58,0
OP597571.1      510     .       C       T       33.42   PASS    F       GT:GQ:DP:AD:AF:PL       1:33:42:0,41:0.9762:79,57,0
OP597571.1      528     .       A       C       35.38   PASS    F       GT:GQ:DP:AD:AF:PL       1:35:44:0,41:0.9318:80,63,0
OP597571.1      540     .       G       C       26.34   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:45:0,44:0.9778:80,54,0
OP597571.1      741     .       A       .       23.27   RefCall P       GT:GQ:DP:AD:AF:PL       0:23:49:30:0.6122:990
OP597571.1      747     .       A       G       24.68   PASS    F       GT:GQ:DP:AD:AF:PL       1:24:47:4,40:0.8511:74,37,0
OP597571.1      888     .       T       C       24.92   PASS    P       GT:GQ:DP:AD:AF:PL       1:24:58:1,56:0.9655:80,42,0
OP597571.1      1335    .       G       A       28.63   PASS    F       GT:GQ:DP:AD:AF:PL       1:28:43:0,40:0.9302:77,45,0
OP597571.1      1469    .       A       G       30.17   PASS    F       GT:GQ:DP:AD:AF:PL       1:30:52:2,49:0.9423:80,48,0
OP597571.1      1566    .       G       A       32.47   PASS    F       GT:GQ:DP:AD:AF:PL       1:32:54:2,52:0.9630:80,53,0
OP597571.1      1677    .       T       C       31.74   PASS    F       GT:GQ:DP:AD:AF:PL       1:31:57:4,49:0.8596:80,52,0
OP597571.1      1806    .       G       A       27.10   PASS    P       GT:GQ:DP:AD:AF:PL       1:27:58:0,54:0.9310:80,47,0
OP597571.1      2051    .       T       C       26.04   PASS    P       GT:GQ:DP:AD:AF:PL       1:26:168:1,156:0.9286:80,41,0
OP597571.1      2265    .       G       .       .       .       .       GT      ./.

full_alignment.vcf.gz:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      185     .       G       A       9.65    PASS    F       GT:GQ:DP:AD:AF:PL       1/1:9:143:2,107:0.7483:18,13,0
OP597571.1      510     .       C       T       33.42   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:33:42:0,41:0.9762:79,57,0
OP597571.1      528     .       A       C       35.38   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:35:44:0,41:0.9318:80,63,0
OP597571.1      747     .       A       G       24.68   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:24:47:4,40:0.8511:74,37,0
OP597571.1      1335    .       G       A       28.63   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:28:43:0,40:0.9302:77,45,0
OP597571.1      1469    .       A       G       30.17   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:30:52:2,49:0.9423:80,48,0
OP597571.1      1566    .       G       A       32.47   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:32:54:2,52:0.9630:80,53,0
OP597571.1      1677    .       T       C       31.74   PASS    F       GT:GQ:DP:AD:AF:PL       1/1:31:57:4,49:0.8596:80,52,0

pileup.vcf.gz:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
OP597571.1      185     .       G       A       19.96   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:19:143:2,107:0.7483:60,28,0
OP597571.1      252     .       C       A       25.12   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:25:90:1,84:0.9333:80,58,0
OP597571.1      510     .       C       T       21.82   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:21:42:0,41:0.9762:79,31,0
OP597571.1      528     .       A       C       24.57   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:24:44:0,41:0.9318:80,43,0
OP597571.1      540     .       G       C       26.34   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:26:45:0,44:0.9778:80,54,0
OP597571.1      741     .       A       .       23.27   RefCall P       GT:GQ:DP:AD:AF:PL       0/0:23:49:30:0.6122:990
OP597571.1      747     .       A       G       20.39   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:20:47:4,40:0.8511:80,28,0
OP597571.1      888     .       T       C       24.92   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:24:58:1,56:0.9655:80,42,0
OP597571.1      1335    .       G       A       22.79   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:22:43:0,40:0.9302:80,32,0
OP597571.1      1469    .       A       G       24.83   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:24:52:2,49:0.9423:80,37,0
OP597571.1      1566    .       G       A       22.71   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:22:54:2,52:0.9630:80,32,0
OP597571.1      1677    .       T       C       24.33   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:24:57:4,49:0.8596:80,38,0
OP597571.1      1806    .       G       A       27.10   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:27:58:0,54:0.9310:80,47,0
OP597571.1      2051    .       T       C       26.04   PASS    P       GT:GQ:DP:AD:AF:PL       1/1:26:168:1,156:0.9286:80,41,0

@aquaskyline
Copy link
Member

Clair3 was seeing no read at 2265. Clair3 filters the alignments with the following four flags:
read unmapped (0x4)
mate unmapped (0x8)
not primary alignment (0x100)
supplementary alignment (0x800)
Using samtools, the alignments can be filtered with samtools view -F 2316, and then piped into bcftools mpileup. Please test if 2265 can still be called using mpileup after applying the filters.

@peterk87
Copy link
Author

I filtered the alignments using samtools view -F 2316, however, the Bcftools results are mostly unchanged with only the depths and observed allele counts coming in a little lower.

Filtered BAM bcftools call results:

OP597571.1      2265    .       G       A       228.399 .       DP=759;VDB=0;SGB=-0.693147;RPBZ=-0.492301;MQBZ=1.61067;MQSBZ=3.01461;BQBZ=6.9304;SCBZ=0.520211;MQ0F=0;AC=1;AN=1;DP4=22,0,684,45;MQ=54   GT:PL   1:255,0

Unfiltered BAM bcftools call results:

OP597571.1      2265    .       G       A       228.4   .       DP=802;VDB=0;SGB=-0.693147;RPBZ=-0.477099;MQBZ=2.06864;MQSBZ=3.44984;BQBZ=7.12028;SCBZ=0.475257;MQ0F=0;AC=1;AN=1;DP4=23,0,710,60;MQ=54  GT:PL   1:255,0

Is there anything else that might be going on?

@aquaskyline
Copy link
Member

That's challenging. To make things easier, would you mind sending us a minibam with reads covering 2265.

@peterk87
Copy link
Author

Sure thing! Please find attached the samtools view filtered BAM:

filt.bam.gz

Let me know if you need anything else!

@aquaskyline
Copy link
Member

We found that 2265 is the last 16th base in your reference. Clair3 doesn't call variants in the first 16bp and last 16bp of a sequence because of 1) algorithmic limit, and 2) usually degenerated coverage and alignment performance in the head and tail of a sequence that makes variant calling unreliable. A solution to your case is to add say 10 'N' to the tail of your reference genome so 2265 gets out of the tail 16bp limit.

@peterk87
Copy link
Author

Thanks. This is very good to know.

For Influenza virus sequencing, there's a set of primers targeting the conserved 5' and 3' ends of each of the 8 genome segment generating amplicons that result in good sequencing coverage even at the ends of each segment.

@aquaskyline
Copy link
Member

Understood. Leaving this issue open and will come back later with a better solution.

@lagphase
Copy link

lagphase commented Mar 8, 2024

Clair3 was seeing no read at 2265. Clair3 filters the alignments with the following four flags: read unmapped (0x4) mate unmapped (0x8) not primary alignment (0x100) supplementary alignment (0x800) Using samtools, the alignments can be filtered with samtools view -F 2316, and then piped into bcftools mpileup. Please test if 2265 can still be called using mpileup after applying the filters.

Hello,

What is Clair3 filter criteria for supplementary alignments? I see mismatches in IGV that were not called as SNP by Clair3. I clicked on the reads and almost most of them are supplementary alignments.

clair3-

@zhengzhenxian
Copy link
Collaborator

@lagphase By default, Clair3 would discard all supplementary alignments. So you might disable the supplementary alignments in IGV to see if the variant is evident.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants