feat: gatk modelsegments wrapper #1321

Merged

Merged 4 commits into master from the add-wrapper-gatk-modelsegements branch on May 3, 2023


@Smeds (Contributor) commented May 2, 2023

Description

gatk ModelSegments wrapper

QC

  • I confirm that:

For all wrappers added by this PR,

  • there is a test case which covers any introduced changes,
  • input: and output: file paths in the resulting rule can be changed arbitrarily,
  • either the wrapper can only use a single core, or the example rule contains a threads: x statement with x being a reasonable default,
  • rule names in the test case are in snake_case and somehow tell what the rule is about or match the tool's purpose or name (e.g., map_reads for a step that maps reads),
  • all environment.yaml specifications follow the respective best practices (https://stackoverflow.com/a/64594513/2352071),
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
  • all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
  • stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir; this also means that using any Python tempfile default behavior works),
  • the meta.yaml contains a link to the documentation of the respective tool or command,
  • Snakefiles pass the linting (snakemake --lint),
  • Snakefiles are formatted with snakefmt,
  • Python wrapper scripts are formatted with black.
  • Conda environments use a minimal number of channels, in the recommended order. E.g. for bioconda, use conda-forge, bioconda, nodefaults (conda-forge should have the highest priority, and the defaults channels are usually not needed because most packages are in conda-forge nowadays).
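As a minimal sketch of the temporary-file item in the checklist (not code from this PR), Python's `tempfile.TemporaryDirectory()` with no `dir` argument creates its folder under `tempfile.gettempdir()`, so a wrapper that stages tool output there satisfies the requirement without hard-coding a path. The `scratch_dir` helper and the `modelsegments_` prefix are hypothetical names chosen for illustration.

```python
import os
import tempfile


def scratch_dir() -> tempfile.TemporaryDirectory:
    """Create a self-cleaning scratch folder under tempfile.gettempdir()."""
    # Leaving dir=None makes TemporaryDirectory use tempfile.gettempdir(),
    # which honours the TMPDIR, TEMP and TMP environment variables.
    return tempfile.TemporaryDirectory(prefix="modelsegments_")


with scratch_dir() as tmpdir:
    # The folder lives directly inside the system temp location...
    print(os.path.samefile(os.path.dirname(tmpdir), tempfile.gettempdir()))
# ...and is removed again once the with-block exits.
print(os.path.exists(tmpdir))
```

Because the location comes from environment variables, users on clusters can redirect scratch space without touching the wrapper.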

@Smeds force-pushed the add-wrapper-gatk-modelsegements branch from 3210528 to 1a2c1f1 on May 2, 2023 14:09
Comment on lines 66 to 79
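import os

# `snakemake` is the object Snakemake injects into wrapper scripts;
# expected_output_file_endings is defined earlier in the wrapper (not shown).
# Create prefix from listed output files
output_prefix = None
for output in snakemake.output:
    output_file = os.path.basename(output)
    for output_extensions in expected_output_file_endings:
        if output_file.endswith(output_extensions):
            # Strip only the trailing extension, not every occurrence of it
            output_prefix = output_file[: -len(output_extensions)]
            break
    if output_prefix:
        break
if not output_prefix:
    raise Exception(
        "Unable to extract prefix from listed files, expecting file(s) ending "
        f"with at least one of {expected_output_file_endings}"
    )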
@fgvieira (Collaborator):

Not sure if this would do the same:

prefix = os.path.commonprefix(output)
output_prefix = os.path.basename(prefix)

It only works with at least two files, and it does not check the extensions, but a missing or misnamed file would be caught when the rule finishes, since Snakemake checks that all declared outputs exist.
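A quick check of the suggestion above, using illustrative file names that are assumed to follow a `<prefix>.<suffix>` pattern: `os.path.commonprefix` compares strings character by character rather than by path component, so the result can keep a trailing `.`, can absorb part of a shared suffix, and for a single file returns the whole name including its extension. The `prefix_from_outputs` helper is hypothetical.

```python
import os


def prefix_from_outputs(outputs: list[str]) -> str:
    """Derive a shared file-name prefix from two or more output paths."""
    # commonprefix works character by character, not per path component.
    common = os.path.commonprefix(outputs)
    # Keep only the file-name part and drop a dangling separator dot.
    return os.path.basename(common).rstrip(".")


outs = ["results/tumor.modelFinal.seg", "results/tumor.cr.seg"]
print(os.path.commonprefix(outs))  # results/tumor.
print(prefix_from_outputs(outs))  # tumor
# Shared suffix parts leak into the prefix:
print(prefix_from_outputs(["results/tumor.cr.seg", "results/tumor.cr.igv.seg"]))  # tumor.cr
# A single file yields the full path, extension included:
print(os.path.commonprefix(["results/tumor.cr.seg"]))  # results/tumor.cr.seg
```

The third case shows why checking extensions still matters if this approach is used.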

@Smeds (Contributor Author):

Your solution is much nicer. But the user might want only one output. I could handle that by checking for at least two files and, if only one is listed, raising an exception saying that a prefix must be specified when only one file is requested.
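The fallback described above could look roughly like this. It is a sketch, not the merged code: `explicit_prefix` stands in for a hypothetical `prefix` entry in `params`, and `resolve_prefix` is a made-up name.

```python
import os
from typing import Optional, Sequence


def resolve_prefix(outputs: Sequence[str], explicit_prefix: Optional[str] = None) -> str:
    """Pick the output prefix, requiring it when it cannot be inferred."""
    if explicit_prefix:
        # A user-supplied prefix always wins.
        return explicit_prefix
    if len(outputs) < 2:
        # With one file, commonprefix would return the full file name.
        raise ValueError(
            "A prefix must be specified when only one output file is requested"
        )
    # Infer from the shared start of the file names, minus a trailing dot.
    return os.path.basename(os.path.commonprefix(list(outputs))).rstrip(".")
```

Inside a wrapper it might be called as `resolve_prefix(snakemake.output, snakemake.params.get("prefix"))`, assuming such a param existed.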

@fgvieira (Collaborator) commented May 3, 2023:

It was really just a suggestion to make the code more readable, so completely up to you! 😄

But looking into it again, maybe the files don't even need to share a prefix. Since you are copying them from the temp folder at the end, you can use any prefix you want, no? You just need a dict mapping each output name to its extension.
That would also make the wrapper comply with the second item of the PR checklist (input: and output: file paths in the resulting rule can be changed arbitrarily).
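A minimal sketch of the reviewer's idea, assuming each requested output can be identified by its suffix: build a dict from suffix to the user's output path, so the user's file names never need to share a prefix. The suffix list is an assumed, illustrative subset of ModelSegments outputs (not taken from the PR), and `map_suffix_to_output` is a hypothetical name.

```python
from typing import Dict, Iterable, Sequence

# Assumed (illustrative) subset of ModelSegments output suffixes.
KNOWN_SUFFIXES = (
    ".modelBegin.seg",
    ".modelFinal.seg",
    ".cr.seg",
    ".cr.igv.seg",
    ".af.igv.seg",
    ".hets.tsv",
)


def map_suffix_to_output(
    outputs: Iterable[str], suffixes: Sequence[str] = KNOWN_SUFFIXES
) -> Dict[str, str]:
    """Map each recognised suffix to the output path the user asked for."""
    # Longest suffixes first, so the most specific one wins if two overlap.
    ordered = sorted(suffixes, key=len, reverse=True)
    mapping = {}
    for path in outputs:
        match = next((s for s in ordered if path.endswith(s)), None)
        if match is None:
            raise ValueError(f"{path!r} does not end with any of {list(suffixes)}")
        mapping[match] = path
    return mapping


print(map_suffix_to_output(["calls/tumorA.cr.seg", "final/segments.modelFinal.seg"]))
# {'.cr.seg': 'calls/tumorA.cr.seg', '.modelFinal.seg': 'final/segments.modelFinal.seg'}
```

The two outputs in the example deliberately use different directories and prefixes; only their suffixes tie them to the tool's files.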

@Smeds (Contributor Author):

I have updated the code that does the copying:

  • each output file can have a different prefix; the generated files are matched to the listed outputs,
  • the wrapper no longer assumes that specific files are generated, so if GATK changes which files ModelSegments writes, the wrapper should not break,
  • prefix has been removed from params, since the listed outputs now set the file names.
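The copy step described above might be sketched as follows: run the tool with a fixed prefix in a scratch directory, then copy only the generated files whose suffix the user requested, ignoring anything extra that GATK writes and failing loudly if a requested file is missing. The function and argument names are hypothetical and the GATK invocation itself is omitted; this is not the merged wrapper.

```python
import os
import shutil
import tempfile
from typing import Dict


def collect_outputs(workdir: str, tool_prefix: str, suffix_to_output: Dict[str, str]) -> None:
    """Copy <tool_prefix><suffix> files from workdir to the paths the user listed."""
    generated = set(os.listdir(workdir))
    for suffix, destination in suffix_to_output.items():
        name = tool_prefix + suffix
        if name not in generated:
            # A requested output was not produced: fail with a useful message.
            raise FileNotFoundError(
                f"expected {name!r} in {workdir}, found {sorted(generated)}"
            )
        # Defensive: make sure the destination directory exists.
        os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
        shutil.copy(os.path.join(workdir, name), destination)
    # Generated files nobody asked for are ignored and vanish with workdir.


# Demo: fake tool output written under a fixed prefix "tmp".
with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as dest:
    for suffix in (".cr.seg", ".modelFinal.seg", ".hets.tsv"):
        with open(os.path.join(work, "tmp" + suffix), "w") as fh:
            fh.write(suffix)
    target = os.path.join(dest, "results", "tumorA.cr.seg")
    collect_outputs(work, "tmp", {".cr.seg": target})
    with open(target) as fh:
        print(fh.read())  # .cr.seg
```

Keying the copy on requested suffixes, rather than on a fixed list of expected files, is what lets the tool add or drop outputs without breaking the rule.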

@Smeds force-pushed the add-wrapper-gatk-modelsegements branch from ce3e1dc to 6f2e848 on May 3, 2023 10:18
@fgvieira merged commit dfecc26 into master on May 3, 2023
6 checks passed
@fgvieira deleted the add-wrapper-gatk-modelsegements branch on May 3, 2023 10:53
tgroth97 pushed a commit to tgroth97/snakemake-wrappers that referenced this pull request May 3, 2023

Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com>
johanneskoester pushed a commit that referenced this pull request May 4, 2023
🤖 I have created a release \*beep\* \*boop\*
---
## [1.29.0](https://www.github.com/snakemake/snakemake-wrappers/compare/v1.28.0...v1.29.0) (2023-05-04)


### Features

* Add seqkit subseq
([#1318](https://www.github.com/snakemake/snakemake-wrappers/issues/1318))
([262d9bb](https://www.github.com/snakemake/snakemake-wrappers/commit/262d9bb50cf8d03edca00916bc29fe5f1009209a))
* Deeptools Alignement seive
([#1320](https://www.github.com/snakemake/snakemake-wrappers/issues/1320))
([b7cb7ab](https://www.github.com/snakemake/snakemake-wrappers/commit/b7cb7ab5b70c1eec3b89d02f0bc94a2294e6a9aa))
* fix threads and new IO options in Bowtie2
([#1324](https://www.github.com/snakemake/snakemake-wrappers/issues/1324))
([a9c7117](https://www.github.com/snakemake/snakemake-wrappers/commit/a9c711718533f18429db50fb052549c49aa6fcf7))
* gatk CallCopyRatioSegments
([#1323](https://www.github.com/snakemake/snakemake-wrappers/issues/1323))
([528c91a](https://www.github.com/snakemake/snakemake-wrappers/commit/528c91a5b87ffe0d34964b01478a9b5d520ed80f))
* gatk collectalleliccount wrapper
([#1316](https://www.github.com/snakemake/snakemake-wrappers/issues/1316))
([158c328](https://www.github.com/snakemake/snakemake-wrappers/commit/158c328d1b846fb45c90c946edbcf2a7a0da15de))
* gatk collectreadcount wrapper
([#1315](https://www.github.com/snakemake/snakemake-wrappers/issues/1315))
([ee55de4](https://www.github.com/snakemake/snakemake-wrappers/commit/ee55de4cc2a850775ef94d5394f9a220844d8a7b))
* gatk DenoisedReadCounts wrapper
([#1319](https://www.github.com/snakemake/snakemake-wrappers/issues/1319))
([0288ace](https://www.github.com/snakemake/snakemake-wrappers/commit/0288ace7e61fbf2adf79514eefa2ff8f3566041e))
* gatk modelsegments wrapper
([#1321](https://www.github.com/snakemake/snakemake-wrappers/issues/1321))
([dfecc26](https://www.github.com/snakemake/snakemake-wrappers/commit/dfecc26de3c649c4661ac26203d846a54f7dd0b0))
* Pyroe make-spliced+unspliced
([#1290](https://www.github.com/snakemake/snakemake-wrappers/issues/1290))
([96a2cbb](https://www.github.com/snakemake/snakemake-wrappers/commit/96a2cbbf667fb45676087c2002fa875a76982e6b))


### Bug Fixes

* Lofreq indelqual
([#1325](https://www.github.com/snakemake/snakemake-wrappers/issues/1325))
([dabecf0](https://www.github.com/snakemake/snakemake-wrappers/commit/dabecf065975b7e53102221a63ec11b2c812cdee))


### Performance Improvements

* Updated version of datavzrd to 2.19.1
([#1328](https://www.github.com/snakemake/snakemake-wrappers/issues/1328))
([fd89645](https://www.github.com/snakemake/snakemake-wrappers/commit/fd89645aa28773f872a3b2f54192d66129852d83))
---


This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
tdayris pushed a commit to tdayris/snakemake-wrappers that referenced this pull request May 11, 2023
tdayris pushed a commit to tdayris/snakemake-wrappers that referenced this pull request May 11, 2023
tdayris pushed a commit to tdayris/snakemake-wrappers that referenced this pull request Jul 5, 2023
tdayris pushed a commit to tdayris/snakemake-wrappers that referenced this pull request Jul 5, 2023

2 participants