The HG003 sample was held out during training of PEPPER-Margin-DeepVariant, so we use it to demonstrate whole-genome performance. This evaluation compares the runtime and accuracy of PEPPER-Margin-DeepVariant r0.7 and r0.8.
We used the following dataset:
- Sample: HG003 (whole genome)
- Coverage: ~85x
- Chemistry: R9.4.1
- Basecaller: Guppy 5.0.7 "Sup"
We downsampled the ~85x data to ~60x and ~30x using the following commands:
```shell
samtools view -s 0.71 -b -@${THREADS} HG003_guppy_507_2_GRCh38_pass.bam > HG003_guppy_507_2_GRCh38_pass.60x.bam
samtools view -s 0.36 -b -@${THREADS} HG003_guppy_507_2_GRCh38_pass.bam > HG003_guppy_507_2_GRCh38_pass.30x.bam
```
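The fractional part of the value passed to `samtools view -s` is the fraction of reads to keep, and the values used here are consistent with the ratio of target to source coverage. The sketch below illustrates that arithmetic; `subsample_fraction` is a hypothetical helper (not part of samtools), and it assumes the source coverage is exactly 85x, which the dataset description only gives approximately.

```shell
# Hypothetical helper: fraction of reads to keep = target coverage / source coverage.
# Assumes uniform coverage and a source depth of exactly 85x (the text says ~85x).
subsample_fraction() {
  awk -v target="$1" -v source="$2" 'BEGIN { printf "%.2f\n", target / source }'
}

subsample_fraction 60 85   # prints 0.71
subsample_fraction 30 85   # prints 0.35
```

The 30x command above uses 0.36, slightly more than the computed 0.35, so the resulting file sits marginally above 30x.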
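We then ran PEPPER-Margin-DeepVariant on each BAM with the following command (shown here for r0.8):

```shell
time docker run -it -v /data:/data \
  -u "$(id -u):$(id -g)" \
  kishwars/pepper_deepvariant:r0.8 \
  run_pepper_margin_deepvariant call_variant \
  -b "$BAM" \
  -f "$REF" \
  -o "$OUTPUT_DIR" \
  -t "$THREADS" \
  -s HG003 \
  --ont_r9_guppy5_sup 2>&1 | tee "$LOG_FILE"
```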
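The command above relies on `$BAM`, `$OUTPUT_DIR`, and `$LOG_FILE` being set for each coverage level. One way to derive them is sketched below. The BAM names come from the downsampling step, but `bam_for_coverage`, the output directory layout, and the log file name are assumptions made for illustration; `$REF` and `$THREADS` still need to be set for your environment.

```shell
# Hypothetical helper: map a coverage label to the BAM produced in the downsampling step.
bam_for_coverage() {
  case "$1" in
    85x)     echo "HG003_guppy_507_2_GRCh38_pass.bam" ;;
    60x|30x) echo "HG003_guppy_507_2_GRCh38_pass.$1.bam" ;;
    *)       echo "unknown coverage: $1" >&2; return 1 ;;
  esac
}

DATA_DIR=/data   # matches the -v /data:/data mount in the docker command
for COV in 30x 60x 85x; do
  BAM="${DATA_DIR}/$(bam_for_coverage "$COV")"
  OUTPUT_DIR="${DATA_DIR}/pmdv_r0.8_${COV}"   # assumed output layout
  LOG_FILE="${OUTPUT_DIR}/run.log"            # assumed log file name
  # Print the settings; the docker command would be run here with these values.
  echo "BAM=${BAM} OUTPUT_DIR=${OUTPUT_DIR} LOG_FILE=${LOG_FILE}"
done
```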
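At every coverage level (30x, 60x, 85x), PEPPER-Margin-DeepVariant r0.8 achieves a higher F1-score than r0.7 for both INDELs and SNPs. INDEL recall rises by about 5.5 to 5.7 percentage points, at the cost of somewhat lower INDEL precision, while SNP false positives drop at every coverage: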
Sample | Version | Type | Truth total | True positives | False negatives | False positives | Recall | Precision | F1-Score |
---|---|---|---|---|---|---|---|---|---|
HG003 30x | r0.7 | INDEL | 504501 | 317621 | 186880 | 35084 | 0.629575 | 0.902714 | 0.7418 |
HG003 30x | r0.7 | SNP | 3327495 | 3310002 | 17493 | 11986 | 0.994743 | 0.996393 | 0.995567 |
HG003 30x | r0.8 | INDEL | 504501 | 345384 | 159117 | 51842 | 0.684605 | 0.872481 | 0.767209 |
HG003 30x | r0.8 | SNP | 3327495 | 3309038 | 18457 | 9173 | 0.994453 | 0.997236 | 0.995843 |

Sample | Version | Type | Truth total | True positives | False negatives | False positives | Recall | Precision | F1-Score |
---|---|---|---|---|---|---|---|---|---|
HG003 60x | r0.7 | INDEL | 504501 | 366144 | 138357 | 33484 | 0.725755 | 0.91827 | 0.810741 |
HG003 60x | r0.7 | SNP | 3327495 | 3317492 | 10003 | 8548 | 0.996994 | 0.99743 | 0.997212 |
HG003 60x | r0.8 | INDEL | 504501 | 394987 | 109514 | 44678 | 0.782926 | 0.90091 | 0.837785 |
HG003 60x | r0.8 | SNP | 3327495 | 3317515 | 9980 | 7120 | 0.997001 | 0.997859 | 0.99743 |

Sample | Version | Type | Truth total | True positives | False negatives | False positives | Recall | Precision | F1-Score |
---|---|---|---|---|---|---|---|---|---|
HG003 85x | r0.7 | INDEL | 504501 | 383384 | 121117 | 30595 | 0.759927 | 0.927982 | 0.835588 |
HG003 85x | r0.7 | SNP | 3327495 | 3318437 | 9058 | 8032 | 0.997278 | 0.997586 | 0.997432 |
HG003 85x | r0.8 | INDEL | 504501 | 412169 | 92332 | 38633 | 0.816984 | 0.91651 | 0.86389 |
HG003 85x | r0.8 | SNP | 3327495 | 3318308 | 9187 | 6733 | 0.997239 | 0.997976 | 0.997607 |
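
For readers who want to check the tables, the Recall and F1-Score columns can be reproduced from the other columns: recall is true positives divided by the truth total, and F1 is the harmonic mean of recall and precision. The sketch below uses hypothetical helper names (`recall`, `f1_score`) and takes precision as reported, since recomputing it from the truth-side true-positive count shown here does not reproduce the reported value exactly.

```shell
# Hypothetical helpers for checking table values.
# recall = true positives / truth total
recall() {
  awk -v tp="$1" -v total="$2" 'BEGIN { printf "%.6f\n", tp / total }'
}

# F1 = 2 * recall * precision / (recall + precision)
f1_score() {
  awk -v r="$1" -v p="$2" 'BEGIN { printf "%.6f\n", 2 * r * p / (r + p) }'
}

# HG003 85x, r0.8, INDEL row
recall 412169 504501        # prints 0.816984
f1_score 0.816984 0.91651   # prints 0.863890
```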