feat: add entrez/efetch wrapper (#2411)

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tool's purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running
`snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal number of channels, in the recommended
order. E.g., for bioconda, use conda-forge, bioconda, nodefaults
(conda-forge should have the highest priority, and the defaults channel is
usually not needed because most packages are in conda-forge nowadays).
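
The temporary-file item in the checklist above can be illustrated with a minimal sketch. It assumes a wrapper that needs scratch space for an intermediate file; the `efetch_` prefix and the `scratch.txt` file name are hypothetical and do not appear in this commit.

```python
import os
import tempfile

# tempfile.gettempdir() picks up TMPDIR (or TEMP/TMP) when set, so the
# scratch location can be redirected without changing the wrapper.
print(tempfile.gettempdir())

# Default tempfile behaviour creates the directory under gettempdir() and
# removes it, including its contents, when the block exits.
with tempfile.TemporaryDirectory(prefix="efetch_") as tmpdir:
    scratch = os.path.join(tmpdir, "scratch.txt")  # hypothetical file name
    with open(scratch, "w") as handle:
        handle.write("intermediate data\n")
    created = os.path.isfile(scratch)

# After the block, the temporary directory no longer exists.
removed = not os.path.exists(tmpdir)
```

Relying on the context manager for cleanup means no leftover files remain in the working directory even if the wrapped command fails partway through.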
johanneskoester committed Dec 14, 2023
1 parent c43af99 commit 9f0ad09
Showing 7 changed files with 76 additions and 2 deletions.
3 changes: 1 addition & 2 deletions .github/pull_request_template.md
@@ -1,8 +1,6 @@
<!-- Ensure that the PR title follows conventional commit style (<type>: <description>)-->
<!-- Possible types are here: https://github.com/commitizen/conventional-commit-types/blob/master/index.json -->

### Description

<!-- Add a description of your PR here-->

### QC
@@ -17,6 +15,7 @@ For all wrappers added by this PR,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tool's purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
17 changes: 17 additions & 0 deletions bio/entrez/efetch/environment.linux-64.pin.txt
@@ -0,0 +1,17 @@
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2023.11.17-hbcca054_0.conda#01ffc8d36f9eba0ce0b3c1955fa780ee
https://conda.anaconda.org/conda-forge/linux-64/libgomp-13.2.0-h807b86a_3.conda#7124cbb46b13d395bdde68f2d215c989
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-13.2.0-h807b86a_3.conda#23fdf1fef05baeb7eadc2aed5fb0011f
https://conda.anaconda.org/conda-forge/linux-64/gettext-0.21.1-h27087fc_0.tar.bz2#14947d8770185e5153fdd04d4673ed37
https://conda.anaconda.org/conda-forge/linux-64/libunistring-0.9.10-h7f98852_0.tar.bz2#7245a044b4a1980ed83196176b78b73a
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.2.13-hd590300_5.conda#f36c115f1ee199da648e0597ec2047ad
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.2.0-hd590300_1.conda#603827b39ea2b835268adb8c821b8570
https://conda.anaconda.org/conda-forge/linux-64/libidn2-2.3.4-h166bdaf_0.tar.bz2#7440fbafd870b8bab68f83a064875d34
https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.13-hd590300_5.conda#68c34ec6149623be41a1933ab996a209
https://conda.anaconda.org/conda-forge/linux-64/wget-1.20.3-ha35d2d1_1.tar.bz2#c990e108f39e1b43adf61e984360c9a1
https://conda.anaconda.org/bioconda/linux-64/entrez-direct-16.2-he881be0_1.tar.bz2#ff30142050ba583481215a6e1b3a5de0
6 changes: 6 additions & 0 deletions bio/entrez/efetch/environment.yaml
@@ -0,0 +1,6 @@
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - entrez-direct =16.2
7 changes: 7 additions & 0 deletions bio/entrez/efetch/meta.yaml
@@ -0,0 +1,7 @@
name: efetch
description: Obtain data from NCBI and GenBank using Entrez efetch
url: https://www.ncbi.nlm.nih.gov/books/NBK179288/
authors:
  - Johannes Köster
output:
  - Any format supported by efetch
13 changes: 13 additions & 0 deletions bio/entrez/efetch/test/Snakefile
@@ -0,0 +1,13 @@
rule get_fasta:
    output:
        "test.fasta",
    log:
        "logs/get_fasta.log",
    params:
        id="KY785484",
        db="nuccore",
        format="fasta",
        # optional mode
        mode=None,
    wrapper:
        "master/bio/entrez/efetch"
24 changes: 24 additions & 0 deletions bio/entrez/efetch/wrapper.py
@@ -0,0 +1,24 @@
import subprocess as sp
import sys

if snakemake.log:
    sys.stderr = open(snakemake.log[0], "w")

cmd = ["efetch"]


def add_param(param, required=False):
    if snakemake.params.get(param):
        cmd.extend(["-" + param, snakemake.params[param]])
    elif required:
        raise ValueError("Missing required parameter: " + param)
    else:
        return []


add_param("id", required=True)
for param in ["db", "format", "mode"]:
    add_param(param)

with open(snakemake.output[0], "w") as out:
    sp.run(cmd, stderr=sp.STDOUT, stdout=out)
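
The command assembly in the wrapper above can be sketched as a standalone function, which makes it easy to see which arguments the test rule would produce. This is a minimal sketch and not the wrapper's actual code: `build_efetch_cmd` is a hypothetical helper, and a plain dict stands in for `snakemake.params`. It does not call `efetch` or touch the network.

```python
def build_efetch_cmd(params, required=("id",), optional=("db", "format", "mode")):
    """Assemble an efetch argument list from a params mapping.

    Hypothetical helper for illustration; a plain dict stands in for
    snakemake.params.
    """
    cmd = ["efetch"]
    # Fail early if a required parameter is missing or empty.
    for name in required:
        if not params.get(name):
            raise ValueError("Missing required parameter: " + name)
    # Each set parameter maps to a single-dash flag of the same name;
    # unset or None values (such as the optional mode) are skipped.
    for name in (*required, *optional):
        value = params.get(name)
        if value:
            cmd.extend(["-" + name, str(value)])
    return cmd


# Parameters taken from the test Snakefile in this commit.
test_params = {"id": "KY785484", "db": "nuccore", "format": "fasta", "mode": None}
print(build_efetch_cmd(test_params))
# → ['efetch', '-id', 'KY785484', '-db', 'nuccore', '-format', 'fasta']
```

Returning the list instead of mutating a module-level `cmd` keeps the logic testable without a `snakemake` object. Note that the wrapper as committed passes `stderr=sp.STDOUT`, so messages from `efetch` end up in the output file; a variant could instead pass an open log file handle as `stderr` and set `check=True` on `subprocess.run` so a failed download raises an error.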
8 changes: 8 additions & 0 deletions test.py
@@ -5301,6 +5301,14 @@ def test_ucsc_twobittofa():
    )


@skip_if_not_modified
def test_entrez_efetch():
    run(
        "bio/entrez/efetch",
        ["snakemake", "--cores", "1", "--use-conda", "-F"],
    )


@skip_if_not_modified
def test_ensembl_sequence():
    run(