Copied from #1710
Running samtools 1.19.2 on Linux x86_64
When determining duplicate status or selecting the best representative pair among a pool of duplicates, read pairs should always be considered together. Currently that is not always the case.
I am trying to pipe the output from umi-tools group to samtools markdup, combining the error correction of the former with the proper bitflag updating of the latter. However, umi-tools only tags read 1 in a pair with the consensus UMI (in the BX tag). As a result, samtools appears to mark duplicates correctly for read 1 but, finding no consensus UMI barcode tag on read 2, marks read 2 duplicates based purely on position. This produces data like the following:
Command:
$umitools group -I /dev/stdin --paired --output-bam --compresslevel 0 --extract-umi-method tag --umi-tag RX --umi-tag-delimiter=- --unmapped-reads output | $samtools markdup -@ 8 -m s -d 100 --duplicate-count --barcode-tag BX --write-index - $o.bam##idx##$o.bai
Both read 1s are marked unique despite the identical positions, based on different values in the BX tag. But both read 2s lack the BX tag, so samtools marks one as unique and the other as a duplicate. Hence the second read pair is discordant, with read 1 marked as unique and read 2 marked as a duplicate.
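To gauge how many pairs are affected, a small check along these lines could be run over the SAM text of the markdup output. This is only a sketch: the function name `discordant_pairs` is made up for illustration, it assumes plain SAM lines as input (for example from `samtools view -h`), and it skips secondary and supplementary alignments.

```python
from collections import defaultdict

DUP_FLAG = 0x400             # SAM flag bit: PCR or optical duplicate
SKIP_FLAGS = 0x100 | 0x800   # secondary and supplementary alignments


def discordant_pairs(sam_lines):
    """Return sorted read names whose primary mates disagree on the duplicate flag.

    Hypothetical helper, not part of samtools or umi-tools.
    """
    dup_states = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & SKIP_FLAGS:
            continue
        # Record whether this primary alignment is flagged as a duplicate.
        dup_states[fields[0]].add(bool(flag & DUP_FLAG))
    # A name with both True and False recorded has mates that disagree.
    return sorted(name for name, states in dup_states.items() if len(states) > 1)
```

Any name it returns belongs to a pair in which one mate is flagged as a duplicate and the other is not, which is the situation described above.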
It would be best if samtools handled this gracefully: if one read in a pair is missing the tag, use whichever value is present, and if both are present but differ, default to read 1. Failing that, it may be worth stating explicitly in the documentation that when a barcode tag is specified, both reads in a pair must carry it (with the same value), or else inconsistent duplicate marking may result.
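In the meantime, one possible stopgap is to copy the tag onto the untagged mate between the two tools. Below is a minimal sketch of that idea working on SAM text. The function name `propagate_tag` is hypothetical, it holds all records in memory (fine for a test, not for a large file), and it leaves reads that already have the tag untouched, so it does not resolve the case where both mates are tagged but differ. I have not verified that markdup then marks both mates consistently.

```python
def propagate_tag(sam_lines, tag="BX"):
    """Copy a tag from a tagged read to same-named reads that lack it.

    Hypothetical helper: two passes over SAM text lines held in memory.
    """
    prefix = tag + ":"
    lines = [line.rstrip("\n") for line in sam_lines]

    # Pass 1: remember the first tag field (e.g. "BX:Z:ACGT") seen per read name.
    tag_by_name = {}
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        for field in fields[11:]:  # optional fields start at column 12
            if field.startswith(prefix):
                tag_by_name.setdefault(fields[0], field)
                break

    # Pass 2: append the remembered tag to records that do not carry it.
    out = []
    for line in lines:
        if not line.startswith("@"):
            fields = line.split("\t")
            has_tag = any(f.startswith(prefix) for f in fields[11:])
            if not has_tag and fields[0] in tag_by_name:
                line = line + "\t" + tag_by_name[fields[0]]
        out.append(line)
    return out
```

Because the input is coordinate-sorted, a mate can appear before or after its tagged partner, which is why the sketch makes two passes instead of streaming.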