Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ScanITD candidate calling #283

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

dawidkrzeciesa
Copy link

Added a new variant caller, ScanITD, to improve the detection of internal tandem duplications.

@@ -85,6 +85,12 @@ calling:
freebayes:
activate: true
# See https://varlociraptor.github.io/docs/calling/#generic-variant-calling
ScanITD:
activate: false
tmpDIR: "/tmp/ScanITD" # set directory for temporary files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be hidden from the user. jUST USE THE temp() functionality of Snakemake.

Suggested change
tmpDIR: "/tmp/ScanITD" # set directory for temporary files

input:
"results/recal/{sample}.bam",
output:
bam=temp(config["calling"]["ScanITD"]["tmpDIR"]+"/{sample}_chr{chr}.bam"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the tmpDIR thing. Just rely on Snakemake's temp mechanism

@@ -0,0 +1,564 @@
#!/usr/bin/env python3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather would want the modifications upstream and us use scanITD via a bioconda package.

threads: 2
shell:
"(bcftools reheader -f {input.ref_idx} {input.vcf} | bcftools sort --max-mem {resources.mem_mb}M | "
"bcftools view -e 'SVLEN > {params.itd_max_length_bp}' -Ob > {output}) 2> {log}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this max length parameter? If it is needed, I would rather rely on a vembrane filter for this (i.e. leave it to the user, maybe with an example in the filter section of the config.yaml.


rule bcftools_gather_ScanITD:
input:
calls=expand("results/candidate-calls/ScanITD/{{sample}}_chr{chr}.ITD.bcf",chr=[str(i) for i in range(1, 23)] + ["X", "Y", "MT"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded chromosomes are dangerous, as they will only work for human. The pipeline is species agnostic though. How long does scanITD take on a WGS sample? Is the splitting really necessary?

Comment on lines +1063 to +1065
def get_itd_regions(wildcards):
group=samples.loc[wildcards.sample]["group"]
return f"results/regions/{group}.expanded_regions.filtered.bed"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced by
collect("results/regions/{group}.expanded_regions.filtered.bed", group=lookup(query="sample == '{sample}'", cols="group"))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which can then be done within the rule if it is only needed right there.

Comment on lines +1068 to +1075
def get_itd_bcfs(wildcards):
sample_names = samples.loc[samples["group"] == wildcards.group, "sample_name"].tolist()
return [f"results/candidate-calls/ScanITD/{x}.ITD.bcf" for x in sample_names]


def get_itd_bcfs_index(wildcards):
sample_names = samples.loc[samples["group"] == wildcards.group, "sample_name"].tolist()
return [f"results/candidate-calls/ScanITD/{x}.ITD.bcf.csi" for x in sample_names]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def get_itd_bcfs(idx=False):
    def inner(wildcards):
        return collect("results/candidate-calls/ScanITD/{sample}.ITD.bcf{suffix}", sample=sample=lookup(query="group == '{group}'", cols="sample_name"), suffix=".csi" if idx else "")

    return inner

@johanneskoester
Copy link
Contributor

Thanks a lot! Unfortunately, in addition to my comments there are also by now some conflicts. Please contact @FelixMoelder in case you need help with resolving them.

@johanneskoester johanneskoester changed the title ScanITD feat: add ScanITD candidate calling Apr 10, 2024
@FelixMoelder
Copy link
Contributor

Thanks a lot! Unfortunately, in addition to my comments there are also by now some conflicts. Please contact @FelixMoelder in case you need help with resolving them.

Should be resolved :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants