-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add ScanITD candidate calling #283
base: master
Are you sure you want to change the base?
Conversation
update to v5.0.1
bump to v5.1.0
@@ -85,6 +85,12 @@ calling: | |||
freebayes: | |||
activate: true | |||
# See https://varlociraptor.github.io/docs/calling/#generic-variant-calling | |||
ScanITD: | |||
activate: false | |||
tmpDIR: "/tmp/ScanITD" # set directory for temporary files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be hidden from the user. jUST USE THE temp()
functionality of Snakemake.
tmpDIR: "/tmp/ScanITD" # set directory for temporary files |
input: | ||
"results/recal/{sample}.bam", | ||
output: | ||
bam=temp(config["calling"]["ScanITD"]["tmpDIR"]+"/{sample}_chr{chr}.bam"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the tmpDIR thing. Just rely on Snakemake's temp mechanism
@@ -0,0 +1,564 @@ | |||
#!/usr/bin/env python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather would want the modifications upstream and us use scanITD via a bioconda package.
threads: 2 | ||
shell: | ||
"(bcftools reheader -f {input.ref_idx} {input.vcf} | bcftools sort --max-mem {resources.mem_mb}M | " | ||
"bcftools view -e 'SVLEN > {params.itd_max_length_bp}' -Ob > {output}) 2> {log}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this max length parameter? If it is needed, I would rather rely on a vembrane filter for this (i.e. leave it to the user, maybe with an example in the filter section of the config.yaml.
|
||
rule bcftools_gather_ScanITD: | ||
input: | ||
calls=expand("results/candidate-calls/ScanITD/{{sample}}_chr{chr}.ITD.bcf",chr=[str(i) for i in range(1, 23)] + ["X", "Y", "MT"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded chromosomes are dangerous, as they will only work for human. The pipeline is species agnostic though. How long does scanITD take on a WGS sample? Is the splitting really necessary?
def get_itd_regions(wildcards): | ||
group=samples.loc[wildcards.sample]["group"] | ||
return f"results/regions/{group}.expanded_regions.filtered.bed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be replaced by
collect("results/regions/{group}.expanded_regions.filtered.bed", group=lookup(query="sample == '{sample}'", cols="group"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which can then be done within the rule if it is only needed right there.
def get_itd_bcfs(wildcards): | ||
sample_names = samples.loc[samples["group"] == wildcards.group, "sample_name"].tolist() | ||
return [f"results/candidate-calls/ScanITD/{x}.ITD.bcf" for x in sample_names] | ||
|
||
|
||
def get_itd_bcfs_index(wildcards): | ||
sample_names = samples.loc[samples["group"] == wildcards.group, "sample_name"].tolist() | ||
return [f"results/candidate-calls/ScanITD/{x}.ITD.bcf.csi" for x in sample_names] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_itd_bcfs(idx=False):
def inner(wildcards):
return collect("results/candidate-calls/ScanITD/{sample}.ITD.bcf{suffix}", sample=sample=lookup(query="group == '{group}'", cols="sample_name"), suffix=".csi" if idx else "")
return inner
Thanks a lot! Unfortunately, in addition to my comments there are also by now some conflicts. Please contact @FelixMoelder in case you need help with resolving them. |
Should be resolved :) |
Added a new variant caller, ScanITD, to improve the detection of internal tandem duplications.