feat: add ScanITD candidate calling #283

dawidkrzeciesa · 2024-02-14T14:12:12Z

Added a new variant caller, ScanITD, to improve the detection of internal tandem duplications.

update to v5.0.1

bump to v5.1.0

johanneskoester · 2024-04-10T10:16:37Z

config/config.yaml

@@ -85,6 +85,12 @@ calling:
  freebayes:
    activate: true
  # See https://varlociraptor.github.io/docs/calling/#generic-variant-calling
+  ScanITD: 
+    activate: false
+    tmpDIR: "/tmp/ScanITD"  # set directory for temporary files


This should be hidden from the user. Just use the temp() functionality of Snakemake.

Suggested change

tmpDIR: "/tmp/ScanITD" # set directory for temporary files

johanneskoester · 2024-04-10T10:18:52Z

workflow/rules/candidate_calling.smk

+    input:
+        "results/recal/{sample}.bam",
+    output:
+        bam=temp(config["calling"]["ScanITD"]["tmpDIR"]+"/{sample}_chr{chr}.bam"),


There is no need for the tmpDIR setting. Just rely on Snakemake's temp mechanism.

johanneskoester · 2024-04-10T10:19:58Z

workflow/scripts/ScanITD.py

@@ -0,0 +1,564 @@
+#!/usr/bin/env python3


I would rather have the modifications made upstream and use ScanITD via a Bioconda package.

johanneskoester · 2024-04-10T10:21:28Z

workflow/rules/candidate_calling.smk

+    threads: 2
+    shell:
+        "(bcftools reheader -f {input.ref_idx} {input.vcf} | bcftools sort --max-mem {resources.mem_mb}M | "
+        "bcftools view -e 'SVLEN > {params.itd_max_length_bp}' -Ob > {output}) 2> {log}"


Why do we have this max length parameter? If it is needed, I would rather rely on a vembrane filter for this (i.e. leave it to the user, maybe with an example in the filter section of the config.yaml).

johanneskoester · 2024-04-10T10:23:14Z

workflow/rules/candidate_calling.smk

+
+rule bcftools_gather_ScanITD:
+    input:
+        calls=expand("results/candidate-calls/ScanITD/{{sample}}_chr{chr}.ITD.bcf",chr=[str(i) for i in range(1, 23)] + ["X", "Y", "MT"]),


The hardcoded chromosomes are dangerous, as they will only work for human, whereas the pipeline is species-agnostic. How long does ScanITD take on a WGS sample? Is the splitting really necessary?
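If the per-contig split turns out to be necessary, one possible way to avoid the hardcoded human list is to derive the contig names from the reference's FASTA index. The following is only a sketch under assumptions, not the approach taken in this PR: the helper names `read_contigs` and `per_contig_calls` are hypothetical, and it assumes a samtools-style `.fai` file whose first tab-separated column is the sequence name.

```python
def read_contigs(fai_path):
    """Return contig names from a samtools-style .fai index, in file order."""
    contigs = []
    with open(fai_path) as fai:
        for line in fai:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # The first tab-separated column of a .fai record is the sequence name.
            contigs.append(line.split("\t")[0])
    return contigs


def per_contig_calls(sample, contigs):
    """Build one per-contig ScanITD output path for a sample.

    The path pattern mirrors the one in the diff above; it is used here
    purely for illustration.
    """
    return [
        f"results/candidate-calls/ScanITD/{sample}_chr{contig}.ITD.bcf"
        for contig in contigs
    ]
```

Inside the workflow, such a list would replace the `range(1, 23)` expression, so the gather rule follows whatever reference the user has configured.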

johanneskoester · 2024-04-10T10:26:50Z

workflow/rules/common.smk

+def get_itd_regions(wildcards):
+    group=samples.loc[wildcards.sample]["group"]
+    return  f"results/regions/{group}.expanded_regions.filtered.bed"


This can be replaced by:
collect("results/regions/{group}.expanded_regions.filtered.bed", group=lookup(query="sample == '{sample}'", cols="group"))

This can then be done within the rule itself if it is only needed there.

johanneskoester · 2024-04-10T10:31:48Z

workflow/rules/common.smk

+def get_itd_bcfs(wildcards):
+    sample_names = samples.loc[samples["group"] == wildcards.group, "sample_name"].tolist()
+    return [f"results/candidate-calls/ScanITD/{x}.ITD.bcf" for x in sample_names]
+
+
+def get_itd_bcfs_index(wildcards):
+    sample_names = samples.loc[samples["group"] == wildcards.group, "sample_name"].tolist()
+    return [f"results/candidate-calls/ScanITD/{x}.ITD.bcf.csi" for x in sample_names]


def get_itd_bcfs(idx=False):
    def inner(wildcards):
        return collect(
            "results/candidate-calls/ScanITD/{sample}.ITD.bcf{suffix}",
            sample=lookup(query="group == '{group}'", cols="sample_name"),
            suffix=".csi" if idx else "",
        )

    return inner
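The point of this suggestion is to fold the two near-identical input functions into one factory that returns a closure. The pattern can be tried outside Snakemake with a plain-Python sketch. Note the assumptions: `SAMPLES_BY_GROUP` is a hypothetical stand-in for the workflow's sample sheet, a list comprehension takes the place of `collect`/`lookup`, and `SimpleNamespace` mimics the wildcards object.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the sample sheet: group name -> sample names.
SAMPLES_BY_GROUP = {
    "group_a": ["a_normal", "a_tumor"],
    "group_b": ["b_tumor"],
}


def get_itd_bcfs(idx=False):
    """Return an input function yielding ScanITD BCFs (or their .csi indices)."""
    suffix = ".csi" if idx else ""

    def inner(wildcards):
        # Resolve all samples of the requested group, one path per sample.
        return [
            f"results/candidate-calls/ScanITD/{sample}.ITD.bcf{suffix}"
            for sample in SAMPLES_BY_GROUP[wildcards.group]
        ]

    return inner


if __name__ == "__main__":
    wildcards = SimpleNamespace(group="group_a")
    print(get_itd_bcfs()(wildcards))          # BCF paths
    print(get_itd_bcfs(idx=True)(wildcards))  # matching index paths
```

A rule would then use `get_itd_bcfs()` for the calls and `get_itd_bcfs(idx=True)` for the indices, so the path pattern is defined in a single place.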

johanneskoester · 2024-04-10T10:33:02Z

Thanks a lot! Unfortunately, in addition to my comments there are also by now some conflicts. Please contact @FelixMoelder in case you need help with resolving them.

FelixMoelder · 2024-04-10T11:11:30Z

> Thanks a lot! Unfortunately, in addition to my comments there are also by now some conflicts. Please contact @FelixMoelder in case you need help with resolving them.

Should be resolved :)

dawidkrzeciesa and others added 11 commits August 23, 2023 11:09

add ScanITD

e8f9871

Merge pull request #1 from snakemake-workflows/master

b7494e8

update to v5.0.1

fix ScanITD

0a291c0

ScanITD regions

87c1f55

call ITDs in all samples

727f1c8

get_itd_regions

4c76bc7

fix multiallelic records

273553f

Merge pull request #2 from snakemake-workflows/master

6c3ae11

bump to v5.1.0

ScanITD <- split bam and add tmpDIR

9942416

change contamination in scenario.yaml and configure temp files

8911bce

set default ScanITD candidate calling to false and cleanup

05e139a

johanneskoester requested changes Apr 10, 2024

johanneskoester changed the title ~~ScanITD~~ feat: add ScanITD candidate calling Apr 10, 2024

Merge branch 'master' into itd

f754220
