ROC and PR curve #179

Open
ESDeutekom opened this issue Aug 30, 2023 · 2 comments

Comments

@ESDeutekom

ESDeutekom commented Aug 30, 2023

Dear @pkrusche and team,

I benchmarked DeepVariant calls for a Genome in a Bottle (GIAB) sample against the GIAB benchmark VCF using hap.py, run from a Singularity image pulled from docker://pkrusche/hap.py.

I used the following command (in snakemake rule):

shell: "export HGREF={input.ref_genome}; /opt/hap.py/bin/hap.py {input.truth_vcf} {input.query_vcf} --false-positives {input.confidence_bed} --target-regions {input.target_bed} -r {input.ref_genome} --roc QUAL --roc-filter RefCall -o {params.prefix} -V --engine=vcfeval --engine-vcfeval-template {input.ref_sdf} --threads {threads} --logfile {log}"

However, I am confused by the results. I added the --roc option because it is the only related option I could find (there does not seem to be a PR curve option). The documentation says that precision and recall are calculated, and those are also the column names I see in the output rather than ROC metrics (header and first two rows shown here):

Type | Subtype | Subset | Filter | Genotype | QQ.Field | QQ | METRIC.Recall | METRIC.Precision | ...
INDEL | * | TS_contained | ALL | * | QUAL | 65.300003 | 0.0 | 1.0 | ...
INDEL | * | TS_contained | SEL | * | QUAL | 65.300003 | 0.0 | 1.0 | ...

How is it possible to have a recall of 0 and a precision of 1? Or are these metrics simply mislabelled, so that they are actually TPR and FPR and the output is meant for a ROC plot, as the flag name suggests? The plot I made also looks like a ROC curve.
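For illustration only, here is a minimal Python sketch of how sweeping a score threshold produces precision/recall pairs. It is a generic toy calculation and not hap.py's implementation; the function name `pr_points`, the toy calls, and the truth count are all made up for the example. At the strictest threshold very few calls pass, so recall is close to 0, while precision is 1 if every passing call is a true positive.

```python
def pr_points(calls, n_truth):
    """Return (threshold, recall, precision) for each distinct score.

    calls   : list of (score, is_true_positive) tuples (toy data)
    n_truth : total number of truth variants (hypothetical)
    """
    points = []
    # Walk thresholds from strictest (highest score) to loosest.
    for threshold in sorted({score for score, _ in calls}, reverse=True):
        passed = [is_tp for score, is_tp in calls if score >= threshold]
        tp = sum(passed)
        fp = len(passed) - tp
        recall = tp / n_truth
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        points.append((threshold, recall, precision))
    return points


# Hypothetical calls: (QUAL-like score, whether the call matches the truth set)
toy_calls = [(65.3, True), (60.0, True), (41.2, False), (30.5, True), (12.0, False)]
toy_points = pr_points(toy_calls, n_truth=1000)
for threshold, recall, precision in toy_points:
    print(f"{threshold:6.1f}  recall={recall:.3f}  precision={precision:.3f}")
```

In this toy data the first point has a recall of 0.001 and a precision of 1.0, which would print as 0.0 and 1.0 if rounded to one decimal place.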

Additionally, if I plot METRIC.Recall against METRIC.Precision from the ROC files, I get a curve with a typical ROC shape. If I instead plot the values computed with the formulas given in happy.md, I get a different curve, one that looks more like a PR curve:
Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN)
Precision = QUERY.TP / (QUERY.TP + QUERY.FP)
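
The two formulas above can be checked against the reported columns with a short Python sketch. It assumes the ROC table is a CSV whose header includes TRUTH.TP, TRUTH.FN, QUERY.TP and QUERY.FP next to METRIC.Recall and METRIC.Precision. The rows below are invented for the example and are not from this run, and `recompute` is a hypothetical helper name.

```python
import csv
import io

# Invented example rows; a real table would be read from the ROC output file.
TOY_CSV = """Type,QQ,METRIC.Recall,METRIC.Precision,TRUTH.TP,TRUTH.FN,QUERY.TP,QUERY.FP
INDEL,60.0,0.25,0.8,50,150,48,12
INDEL,30.0,0.75,0.75,150,50,150,50
"""


def recompute(row):
    """Apply Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN) and
    Precision = QUERY.TP / (QUERY.TP + QUERY.FP) to one table row.

    Rows with a zero denominator are not handled in this sketch.
    """
    truth_tp, truth_fn = float(row["TRUTH.TP"]), float(row["TRUTH.FN"])
    query_tp, query_fp = float(row["QUERY.TP"]), float(row["QUERY.FP"])
    recall = truth_tp / (truth_tp + truth_fn)
    precision = query_tp / (query_tp + query_fp)
    return recall, precision


results = []
for row in csv.DictReader(io.StringIO(TOY_CSV)):
    recall, precision = recompute(row)
    reported = (float(row["METRIC.Recall"]), float(row["METRIC.Precision"]))
    results.append((float(row["QQ"]), recall, precision, reported))
    print(row["QQ"], round(recall, 4), round(precision, 4), reported)
```

Comparing the recomputed pair with the reported pair row by row shows at which thresholds the two curves diverge.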

image

Thank you in advance,
Eva

@ivargr

ivargr commented Oct 24, 2023

+1 I'm also confused about the same.

@ryancey1

ryancey1 commented Dec 5, 2023

+2 Also confused
