Commit
fix: fixed GATK3 conda channel priorities and code reformat (#534)

### Description

Fixed conda channel priorities, plus some code clean-up

### QC

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tool's purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
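
The temporary-file item in the checklist above can be sketched as follows. This is a minimal illustration and not code from this PR: `run_with_scratch_dir` and `demo` are hypothetical names. It relies only on the default behavior of Python's `tempfile` module, which places new directories under `tempfile.gettempdir()`.

```python
import tempfile
from pathlib import Path


def run_with_scratch_dir(work):
    """Call `work` with a fresh scratch directory that is removed afterwards.

    Hypothetical helper: TemporaryDirectory() creates the directory under
    tempfile.gettempdir() by default, so no path needs to be hard-coded.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        return work(Path(tmpdir))


def demo(tmpdir):
    """Write and read back an intermediate file inside the scratch directory."""
    scratch = tmpdir / "intermediate.txt"
    scratch.write_text("scratch data")
    return scratch.read_text(), tmpdir


content, used_dir = run_with_scratch_dir(demo)
print(content)
print(used_dir.exists())  # the directory is cleaned up on exit
```

A wrapper written this way does not need to declare a temporary directory as a rule output.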

Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
fgvieira and johanneskoester committed Oct 31, 2022
1 parent c4ee12f commit 43e5a16
Showing 65 changed files with 366 additions and 121 deletions.
4 changes: 2 additions & 2 deletions bio/gatk/applybqsr/environment.yaml
Expand Up @@ -3,7 +3,7 @@ channels:
- bioconda
- nodefaults
dependencies:
- gatk4 =4.2
- gatk4 =4.3.0.0
- openjdk =8
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
- samtools =1.16
2 changes: 1 addition & 1 deletion bio/gatk/applyvqsr/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
3 changes: 1 addition & 2 deletions bio/gatk/baserecalibrator/environment.yaml
Expand Up @@ -4,5 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- openjdk =8
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
3 changes: 1 addition & 2 deletions bio/gatk/baserecalibratorspark/environment.yaml
Expand Up @@ -4,5 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- openjdk =8
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/cleansam/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/combinegvcfs/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/depthofcoverage/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.4
- snakemake-wrapper-utils =0.5
4 changes: 2 additions & 2 deletions bio/gatk/estimatelibrarycomplexity/environment.yaml
Expand Up @@ -3,5 +3,5 @@ channels:
- bioconda
- nodefaults
dependencies:
- gatk4 =4.3.0.0
- snakemake-wrapper-utils =0.5.0
- gatk4 =4.2
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/filtermutectcalls/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/genomicsdbimport/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/genotypegvcfs/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/getpileupsummaries/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/learnreadorientationmodel/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/markduplicatesspark/environment.yaml
Expand Up @@ -5,4 +5,4 @@ channels:
dependencies:
- gatk4 =4.2
- openjdk =8
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/mutect/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/selectvariants/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/splitncigarreads/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/varianteval/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/variantfiltration/environment.yaml
Expand Up @@ -4,4 +4,4 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
2 changes: 1 addition & 1 deletion bio/gatk/variantrecalibrator/environment.yaml
Expand Up @@ -4,6 +4,6 @@ channels:
- nodefaults
dependencies:
- gatk4 =4.2
- snakemake-wrapper-utils =0.3
- snakemake-wrapper-utils =0.5
- google-cloud-sdk
- google-crc32c
5 changes: 3 additions & 2 deletions bio/gatk3/baserecalibrator/environment.yaml
Expand Up @@ -3,5 +3,6 @@ channels:
- bioconda
- nodefaults
dependencies:
- gatk ==3.8
- snakemake-wrapper-utils ==0.1.3
- gatk =3.8
- python >=3.10
- snakemake-wrapper-utils =0.5
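
The hunk above replaces `==` pins with `=` pins. As far as I understand conda's match specification, a single `=` is a fuzzy (prefix) match while `==` requires an exact version. The sketch below is a deliberately simplified model of that difference, not conda's implementation: the function `matches` and its helpers are hypothetical, and build strings, comparison operators, and wildcards are ignored.

```python
def _components(version):
    """Split a dotted version string into its components."""
    return version.split(".")


def _strip_trailing_zeros(parts):
    """Drop trailing "0" components so that 3.8 and 3.8.0 compare equal."""
    parts = list(parts)
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return parts


def matches(spec, version):
    """Simplified model of a conda version spec (an assumption, not conda code)."""
    if spec.startswith("=="):
        wanted = _strip_trailing_zeros(_components(spec[2:]))
        return wanted == _strip_trailing_zeros(_components(version))
    if spec.startswith("="):
        prefix = _components(spec[1:])
        return _components(version)[: len(prefix)] == prefix
    raise ValueError(f"unsupported spec: {spec}")


for spec, version in [("=0.5", "0.5.2"), ("==0.1.3", "0.1.4"), ("=3.8", "3.8")]:
    print(spec, version, matches(spec, version))
```

Under this model, `snakemake-wrapper-utils =0.5` accepts any `0.5.x` release, whereas the old `==0.1.3` pin accepted only that one version.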
3 changes: 0 additions & 3 deletions bio/gatk3/baserecalibrator/test/README.md

This file was deleted.

18 changes: 11 additions & 7 deletions bio/gatk3/baserecalibrator/test/Snakefile
@@ -1,16 +1,20 @@
rule baserecalibrator:
    input:
        bam="mapped/{sample}.bam",
        bam="{sample}.bam",
        bai="{sample}.bai",
        ref="genome.fasta",
        known="dbsnp.vcf.gz"
        fai="genome.fasta.fai",
        dict="genome.dict",
        known="dbsnp.vcf.gz",
        known_idx="dbsnp.vcf.gz.tbi",
    output:
        "{sample}.recal_data_table"
        recal_table="{sample}.recal_data_table",
    log:
        "logs/gatk3/bqsr/{sample}.log"
        "logs/gatk3/bqsr/{sample}.log",
    params:
        extra="" # optional
        extra="--defaultBaseQualities 20 --filter_reads_with_N_cigar", # optional
    resources:
        mem_mb = 1024
        mem_mb=1024,
    threads: 16
    wrapper:
        "bio/gatk/baserecalibrator"
        "master/bio/gatk3/baserecalibrator"
Binary file added bio/gatk3/baserecalibrator/test/a.bai
Binary file not shown.
Binary file added bio/gatk3/baserecalibrator/test/a.bam
Binary file not shown.
Binary file added bio/gatk3/baserecalibrator/test/dbsnp.vcf.gz
Binary file not shown.
Binary file added bio/gatk3/baserecalibrator/test/dbsnp.vcf.gz.tbi
Binary file not shown.
3 changes: 3 additions & 0 deletions bio/gatk3/baserecalibrator/test/genome.dict
@@ -0,0 +1,3 @@
@HD VN:1.5
@SQ SN:ref LN:45 M5:7a66cae8ab14aef8d635bc80649e730b UR:file:/home/johannes/scms/snakemake-wrappers/bio/picard/createsequencedictionary/test/genome.fasta
@SQ SN:ref2 LN:40 M5:1636753510ec27476fdd109a6684680e UR:file:/home/johannes/scms/snakemake-wrappers/bio/picard/createsequencedictionary/test/genome.fasta
4 changes: 4 additions & 0 deletions bio/gatk3/baserecalibrator/test/genome.fasta
@@ -0,0 +1,4 @@
>ref
AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
>ref2
aggttttataaaacaattaagtctacagagcaactacgcg
2 changes: 2 additions & 0 deletions bio/gatk3/baserecalibrator/test/genome.fasta.fai
@@ -0,0 +1,2 @@
ref 45 5 45 46
ref2 40 57 40 41
38 changes: 19 additions & 19 deletions bio/gatk3/baserecalibrator/wrapper.py
Expand Up @@ -9,31 +9,31 @@
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
java_opts = get_java_opts(snakemake)

input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
bed = snakemake.params.get("bed", None)
if bed is not None:
    bed = "-L " + bed
else:
    bed = ""

input_known_string = ""
for known in input_known:
    input_known_string = input_known_string + " --knownSites {}".format(known)
bed = snakemake.params.get("bed", "")
if bed:
    bed = f"--intervals {bed}"


input_known = snakemake.input.get("known", "")
if input_known:
    if isinstance(input_known, str):
        input_known = [input_known]
    input_known = list(map("--knownSites {}".format, input_known))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T BaseRecalibrator"
    " -nct {snakemake.threads}"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " {input_known_string}"
    "gatk3 {java_opts}"
    " --analysis_type BaseRecalibrator"
    " --num_cpu_threads_per_data_thread {snakemake.threads}"
    " --input_file {snakemake.input.bam}"
    " {input_known}"
    " --reference_sequence {snakemake.input.ref}"
    " {bed}"
    " -o {snakemake.output}"
    " {extra}"
    " --out {snakemake.output}"
    " {log}"
)
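
The new handling of `known` in the wrapper above accepts either a single path or several paths and emits one `--knownSites` flag per file. A standalone sketch of that normalization follows. It runs without Snakemake: `known_sites_flags` is a hypothetical helper name and `mills.vcf.gz` is an invented second file used only for illustration.

```python
def known_sites_flags(known):
    """Return one `--knownSites <path>` argument per known-sites file.

    `known` may be empty, a single path, or a sequence of paths, mirroring
    what `snakemake.input.get("known", "")` may yield in the wrapper.
    """
    if not known:
        return []
    if isinstance(known, str):
        # A single path is wrapped so that it is handled like a list.
        known = [known]
    return [f"--knownSites {path}" for path in known]


# Joining with spaces produces the fragment that ends up on the command line.
print(" ".join(known_sites_flags("dbsnp.vcf.gz")))
print(" ".join(known_sites_flags(["dbsnp.vcf.gz", "mills.vcf.gz"])))
```

The wrapper itself passes the list straight to `shell()`; to my knowledge Snakemake's formatter joins sequence values with spaces, which is why no explicit join appears there.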
5 changes: 3 additions & 2 deletions bio/gatk3/indelrealigner/environment.yaml
Expand Up @@ -3,5 +3,6 @@ channels:
- bioconda
- nodefaults
dependencies:
- gatk ==3.8
- snakemake-wrapper-utils ==0.1.3
- gatk =3.8
- python >=3.10
- snakemake-wrapper-utils =0.5
3 changes: 0 additions & 3 deletions bio/gatk3/indelrealigner/test/README.md

This file was deleted.

21 changes: 11 additions & 10 deletions bio/gatk3/indelrealigner/test/Snakefile
@@ -1,21 +1,22 @@
rule indelrealigner:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bai",
        bam="{sample}.bam",
        bai="{sample}.bai",
        ref="genome.fasta",
        fai="genome.fasta.fai",
        dict="genome.dict",
        known="dbsnp.vcf.gz",
        known_idx="dbsnp.vcf.gz.tbi",
        target_intervals="{sample}.intervals"
        target_intervals="{sample}.intervals",
    output:
        bam="realigned/{sample}.bam",
        bai="realigned/{sample}.bai",
        java_temp=temp(directory("/tmp/gatk3_indelrealigner/{sample}")),
        bam="{sample}.realigned.bam",
        bai="{sample}.realigned.bai",
    log:
        "logs/gatk3/indelrealigner/{sample}.log"
        "logs/gatk3/indelrealigner/{sample}.log",
    params:
        extra="" # optional
        extra="--defaultBaseQualities 20 --filter_reads_with_N_cigar", # optional
    threads: 16
    resources:
        mem_mb = 1024
        mem_mb=1024,
    wrapper:
        "bio/gatk/indelrealigner"
        "master/bio/gatk3/indelrealigner"
Binary file added bio/gatk3/indelrealigner/test/a.bai
Binary file not shown.
Binary file added bio/gatk3/indelrealigner/test/a.bam
Binary file not shown.
2 changes: 2 additions & 0 deletions bio/gatk3/indelrealigner/test/a.intervals
@@ -0,0 +1,2 @@
ref:14-19
ref2:14-15
Binary file added bio/gatk3/indelrealigner/test/dbsnp.vcf.gz
Binary file not shown.
Binary file added bio/gatk3/indelrealigner/test/dbsnp.vcf.gz.tbi
Binary file not shown.
3 changes: 3 additions & 0 deletions bio/gatk3/indelrealigner/test/genome.dict
@@ -0,0 +1,3 @@
@HD VN:1.5
@SQ SN:ref LN:45 M5:7a66cae8ab14aef8d635bc80649e730b UR:file:/home/johannes/scms/snakemake-wrappers/bio/picard/createsequencedictionary/test/genome.fasta
@SQ SN:ref2 LN:40 M5:1636753510ec27476fdd109a6684680e UR:file:/home/johannes/scms/snakemake-wrappers/bio/picard/createsequencedictionary/test/genome.fasta
4 changes: 4 additions & 0 deletions bio/gatk3/indelrealigner/test/genome.fasta
@@ -0,0 +1,4 @@
>ref
AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
>ref2
aggttttataaaacaattaagtctacagagcaactacgcg
2 changes: 2 additions & 0 deletions bio/gatk3/indelrealigner/test/genome.fasta.fai
@@ -0,0 +1,2 @@
ref 45 5 45 46
ref2 40 57 40 41
21 changes: 10 additions & 11 deletions bio/gatk3/indelrealigner/wrapper.py
Expand Up @@ -10,38 +10,37 @@


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
java_opts = get_java_opts(snakemake)


bed = snakemake.input.get("bed", "")
if bed:
    bed = "-L " + bed
    bed = f"--intervals {bed}"


known = snakemake.input.get("known", "")
if known:
    if isinstance(known, str):
        known = "-known {}".format(known)
        known = f"--knownAlleles {known}"
    else:
        known = list(map("-known {}".format, known))
        known = list(map("--knownAlleles {}".format, known))


output_bai = snakemake.output.get("bai", None)
if output_bai is None:
    extra += " --disable_bam_indexing"


log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "gatk3 {java_opts} -T IndelRealigner"
    " {extra}"
    " -I {snakemake.input.bam}"
    " -R {snakemake.input.ref}"
    "gatk3 {java_opts}"
    " --analysis_type IndelRealigner"
    " --input_file {snakemake.input.bam}"
    " --reference_sequence {snakemake.input.ref}"
    " {known}"
    " {bed}"
    " --targetIntervals {snakemake.input.target_intervals}"
    " -o {snakemake.output.bam}"
    " {extra}"
    " --out {snakemake.output.bam}"
    " {log}"
)
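
The optional-argument logic of the wrapper above (known alleles, plus disabling BAM indexing when no `bai` output is requested) can be exercised outside Snakemake with a small sketch. `indelrealigner_flags` is a hypothetical function written for illustration; it mirrors the wrapper's branches under the assumption that `known` is empty, a string, or a sequence of strings.

```python
def indelrealigner_flags(known="", output_bai=None, extra=""):
    """Build the optional IndelRealigner arguments as two strings.

    Returns (known_flags, extra), where `extra` gains
    `--disable_bam_indexing` if no index output was requested.
    """
    if isinstance(known, str):
        # Treat a single path like a one-element list; empty means no flags.
        known = [known] if known else []
    known_flags = " ".join(f"--knownAlleles {path}" for path in known)

    if output_bai is None:
        extra = f"{extra} --disable_bam_indexing".strip()
    return known_flags, extra


print(indelrealigner_flags(known="dbsnp.vcf.gz", output_bai="a.realigned.bai"))
print(indelrealigner_flags(known=["dbsnp.vcf.gz"], extra="--defaultBaseQualities 20"))
```

Returning the flags instead of formatting them inside `shell()` makes each branch easy to check in isolation.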
1 change: 1 addition & 0 deletions bio/gatk3/printreads/environment.yaml
Expand Up @@ -4,4 +4,5 @@ channels:
- nodefaults
dependencies:
- gatk =3.8
- python >=3.10
- snakemake-wrapper-utils =0.5.0
3 changes: 0 additions & 3 deletions bio/gatk3/printreads/test/README.md

This file was deleted.

18 changes: 11 additions & 7 deletions bio/gatk3/printreads/test/Snakefile
@@ -1,16 +1,20 @@
rule printreads:
    input:
        bam="mapped/{sample}.bam",
        bam="{sample}.bam",
        bai="{sample}.bai",
        # recal_data="{sample}.recal_data_table",
        ref="genome.fasta",
        recal_data="{sample}.recal_data_table"
        fai="genome.fasta.fai",
        dict="genome.dict",
    output:
        "alignment/{sample}.bqsr.bam"
        bam="{sample}.bqsr.bam",
        bai="{sample}.bqsr.bai",
    log:
        "logs/gatk/bqsr/{sample}..log"
        "logs/gatk/bqsr/{sample}.log",
    params:
        extra="" # optional
        extra="--defaultBaseQualities 20 --filter_reads_with_N_cigar", # optional
    resources:
        mem_mb = 1024
        mem_mb=1024,
    threads: 16
    wrapper:
        "bio/gatk3/printreads"
        "master/bio/gatk3/printreads"
Binary file added bio/gatk3/printreads/test/a.bai
Binary file not shown.
Binary file added bio/gatk3/printreads/test/a.bam
Binary file not shown.
