How is the false positive rate calculated in som.py stats? #187

Open
Krannich479 opened this issue Apr 25, 2024 · 0 comments

Dear hap.py developer team,
I have a question regarding the output of som.py.

  • Question:
    I ran som.py (v0.3.15) on a short-variant callset and a ground truth set. The tool ran successfully and the results seem reasonable. However, each line of the <prefix>.sompy.stats.csv file ends with a field named fp.rate, and I am wondering how exactly it is computed.

  • Background:
    The false positive rate (FPR) is commonly defined as FP/(FP+TN), where TN is the number of true negatives. Hence, I presume TN is computed at some point. There is a README page dedicated to som.py, but it does not define TN. The bioRxiv preprint of hap.py+som.py even has a paragraph on this, stating that TN are not included due to the lack of a clear definition (with which I strongly agree!):

Note that we have chosen not to include true negatives (or consequently specificity) in our standardized definitions. This is due to the challenge in defining the number of true negatives, particularly around complex variants. In addition, precision is often a more useful metric than specificity due to the very large proportion of true negative positions in the genome.
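To make the point of the quoted paragraph concrete, here is a minimal Python sketch of the textbook FPR next to precision. The function names and all counts are hypothetical and chosen only for illustration; counting every non-variant reference position as a true negative is exactly the kind of TN definition the preprint declines to adopt.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Textbook FPR = FP / (FP + TN); it cannot be computed without a TN count."""
    if fp + tn == 0:
        raise ValueError("FPR is undefined when FP + TN == 0")
    return fp / (fp + tn)


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); it needs no true-negative count."""
    if tp + fp == 0:
        raise ValueError("precision is undefined when TP + FP == 0")
    return tp / (tp + fp)


# Hypothetical counts for illustration only. If every non-variant position of
# a ~3 Gb genome were counted as a true negative, TN would dwarf FP and the
# FPR would be tiny even with 1,000 false positives, while precision still
# shows that 10% of the calls are wrong.
HYPOTHETICAL_TP = 9_000
HYPOTHETICAL_FP = 1_000
HYPOTHETICAL_TN = 3_000_000_000

print(f"FPR       = {false_positive_rate(HYPOTHETICAL_FP, HYPOTHETICAL_TN):.2e}")
print(f"precision = {precision(HYPOTHETICAL_TP, HYPOTHETICAL_FP):.2f}")
```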

  • Example:
    Here is an example of my output; note the non-zero fp.rate at the end of the line.
idx  type     total.truth  total.query  tp   fp  fn  unk  ambi  recall              recall_lower        recall_upper        recall2             precision           precision_lower     precision_upper     na   ambiguous  fp.region.size  fp.rate             sompyversion  sompycmd
0    indels   180          153          151  2   29  0    0     0.8388888888888889  0.7799816161378756  0.8870047190333543  0.8388888888888889  0.9869281045751634  0.9587317223755603  0.997273934669216   0.0  0.0        29903           66.88292144600877   som.py-       /<path>/bin/som.py --no-fixchr-truth --no-fixchr-query --normalize-all -r <path>/<reference>.fasta -o <prefix>.sompy <truthset>.vcf <callset>.vcf.gz
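The row above can be cross-checked with a few lines of Python. recall and precision are recomputed from their standard definitions, and the sketch tests one hypothesis for fp.rate: that it is the number of false positives per million bases of fp.region.size, not FP/(FP+TN). This hypothesis is an inference from this single row, not something confirmed from the som.py source.

```python
import math

# Values copied from the som.py stats row shown above (indels).
row = {
    "tp": 151,
    "fp": 2,
    "fn": 29,
    "fp.region.size": 29903,
    "recall": 0.8388888888888889,
    "precision": 0.9869281045751634,
    "fp.rate": 66.88292144600877,
}

# Standard definitions: recall = TP / (TP + FN), precision = TP / (TP + FP).
recall_recomputed = row["tp"] / (row["tp"] + row["fn"])
precision_recomputed = row["tp"] / (row["tp"] + row["fp"])

# Hypothesis, not taken from the som.py source: fp.rate is the number of
# false positives per million bases (per Mbp) of fp.region.size.
fp_per_mbp = row["fp"] / row["fp.region.size"] * 1_000_000

for name, recomputed in [
    ("recall", recall_recomputed),
    ("precision", precision_recomputed),
    ("fp.rate", fp_per_mbp),
]:
    match = math.isclose(row[name], recomputed, rel_tol=1e-9)
    print(f"{name:9} reported={row[name]!r:<20} recomputed={recomputed!r:<20} match={match}")
```

All three values agree for this row, which is consistent with fp.rate being a density (false positives per Mbp of the FP region) rather than a rate in the FP/(FP+TN) sense; that would also explain why it can exceed 1. A single row cannot rule out other formulas, so the som.py source remains the authoritative answer.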