feat: Xsv (complete subcommand suite) (#1430)

### Description

This PR adds [xsv](https://github.com/BurntSushi/xsv) to the list of
available wrappers, covering its full set of subcommands.

### QC

* [X] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tool's purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal number of channels, in the recommended
order. E.g. for bioconda, use conda-forge, bioconda, nodefaults:
conda-forge should have the highest priority, and the defaults channels are
usually not needed because most packages are in conda-forge nowadays.
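
The temporary-file item in the checklist above can be illustrated with a minimal sketch. It assumes a wrapper written in Python; the helper `write_intermediate` and the file name `intermediate.csv` are hypothetical and do not come from this PR.

```python
import os
import tempfile
from pathlib import Path


def write_intermediate(rows):
    """Write CSV rows to a scratch file and return what was read back."""
    # TemporaryDirectory() creates a uniquely named folder under
    # tempfile.gettempdir() and deletes it when the block exits.
    with tempfile.TemporaryDirectory() as tmpdir:
        scratch = Path(tmpdir) / "intermediate.csv"  # hypothetical file name
        scratch.write_text("\n".join(rows) + "\n")
        content = scratch.read_text()
    # The scratch folder no longer exists at this point.
    return tmpdir, content


tmpdir, content = write_intermediate(["gene_id,s1", "ENSG01,14.5"])
print(content, end="")
print(os.path.exists(tmpdir))
```

Relying on the `tempfile` defaults keeps scratch data out of the working directory and cleans it up even if the wrapped command fails inside the `with` block.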

---------

Co-authored-by: tdayris <tdayris@gustaveroussy.fr>
Co-authored-by: tdayris <thibault.dayris@gustaveroussy.fr>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: Felix Mölder <felix.moelder@uni-due.de>
Co-authored-by: Christopher Schröder <christopher.schroeder@tu-dortmund.de>
8 people committed Jun 29, 2023
1 parent b915e37 commit da81a5a
Showing 8 changed files with 475 additions and 0 deletions.
5 changes: 5 additions & 0 deletions bio/xsv/environment.yaml
@@ -0,0 +1,5 @@
channels:
  - conda-forge
  - nodefaults
dependencies:
  - xsv=0.13.0
15 changes: 15 additions & 0 deletions bio/xsv/meta.yaml
@@ -0,0 +1,15 @@
name: xsv
url: https://github.com/BurntSushi/xsv
description: >
  Perform various operations over CSV/TSV tables.
authors:
  - Thibault Dayris
input:
  - table: Path to CSV/TSV table.
output:
  - Path to the result file / directory
params:
  - extra: Optional arguments for `xsv`. For TSV files, `--delimiter` is automatically set to a tab character.
  - subcommand: xsv subcommand among `cat`, `count`, `fixlengths`, `flatten`, `fmt`, `frequency`, `headers`, `index`, `input`, `join`, `sample`, `search`, `select`, `slice`, `sort`, `split`, `stats`, or `table`
notes: |
  Adding table(s) index(es) to the input file list makes many subcommands faster.
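
The `extra` entry above states that `--delimiter` is set automatically for TSV input. The wrapper script itself is not shown in this excerpt, so the following is only a sketch of how such inference could work, not the wrapper's actual code. The function name `build_xsv_command` and the extension check are assumptions; `--delimiter` is an existing xsv option.

```python
import shlex
from pathlib import Path


def build_xsv_command(subcommand, table, extra=""):
    """Assemble an xsv argument list (hypothetical helper, not from the PR)."""
    # "cat rows" becomes ["cat", "rows"]; an empty `extra` adds nothing.
    args = ["xsv", *shlex.split(subcommand), *shlex.split(extra)]
    # Assumption: a ".tsv" suffix means tab-separated input, so the
    # delimiter is passed as the two-character sequence backslash + t.
    if Path(table).suffix.lower() == ".tsv":
        args += ["--delimiter", r"\t"]
    args.append(str(table))
    return args


print(build_xsv_command("count", "table.tsv"))
print(build_xsv_command("cat rows", "table.csv"))
```

Returning a list rather than a shell string avoids quoting problems when the command is later handed to something like `subprocess.run`.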
348 changes: 348 additions & 0 deletions bio/xsv/test/Snakefile
@@ -0,0 +1,348 @@
### Concatenation subcommand ###
rule test_xsv_cat_rows:
    input:
        table=["table.csv", "right.csv"],
    output:
        "xsv_catrows.csv",
    threads: 1
    log:
        "xsv/catrow.log",
    params:
        subcommand="cat rows",
        extra="",
    wrapper:
        "master/bio/xsv"


rule test_xsv_cat_cols:
    input:
        table=["table.csv", "right.csv"],
    output:
        "xsv_catcols.csv",
    threads: 1
    log:
        "xsv/catcol.log",
    params:
        subcommand="cat columns",
        extra="",
    wrapper:
        "master/bio/xsv"


### Count subcommand ###
rule test_xsv_count:
    input:
        table="table.csv",
    output:
        "xsv_count.csv",
    threads: 1
    log:
        "xsv/count.log",
    params:
        subcommand="count",
        extra="",
    wrapper:
        "master/bio/xsv"


rule test_xsv_count_tsv_input:
    input:
        table="table.tsv",
    output:
        "xsv_count.tsv_as_input.csv",
    threads: 1
    log:
        "xsv/count.log",
    params:
        subcommand="count",
        extra="",
    wrapper:
        "master/bio/xsv"


### Fix lengths subcommand ###
rule test_xsv_fixlength:
    input:
        table="table.csv",
    output:
        "xsv_fixlength.csv",
    threads: 1
    log:
        "xsv/fixlength.log",
    params:
        subcommand="fixlengths",
        extra="--length 20",
    wrapper:
        "master/bio/xsv"


### Flatten subcommand ###
rule test_xsv_flatten:
    input:
        table="table.csv",
    output:
        "xsv_flatten.csv",
    threads: 1
    log:
        "xsv/flatten.log",
    params:
        subcommand="flatten",
        extra="",
    wrapper:
        "master/bio/xsv"


### Format subcommand ###
rule test_xsv_fmt:
    input:
        table="table.csv",
    output:
        "xsv_fmt.tsv",
    threads: 1
    log:
        "xsv/fmt.log",
    params:
        subcommand="fmt",
        extra="",
    wrapper:
        "master/bio/xsv"


### Frequency subcommand ###
rule test_xsv_frequency:
    input:
        table="table.csv",
    output:
        "xsv_frequency.csv",
    threads: 1
    log:
        "xsv/frequency.log",
    params:
        subcommand="frequency",
        extra="",
    wrapper:
        "master/bio/xsv"


### Headers subcommand ###
rule test_xsv_headers:
    input:
        table="table.csv",
    output:
        "xsv_headers.csv",
    threads: 1
    log:
        "xsv/headers.log",
    params:
        subcommand="headers",
        extra="",
    wrapper:
        "master/bio/xsv"


rule test_xsv_headers_list:
    input:
        table=["table.csv", "right.csv"],
    output:
        "xsv_headers_all.csv",
    threads: 1
    log:
        "xsv/headers_all.log",
    params:
        subcommand="headers",
        extra="--intersect",
    wrapper:
        "master/bio/xsv"


### Index subcommand ###
rule test_xsv_index:
    input:
        table="table.csv",
    output:
        "table.csv.idx",
    threads: 1
    log:
        "xsv/index.log",
    params:
        subcommand="index",
        extra="",
    wrapper:
        "master/bio/xsv"


### Input subcommand ###
rule test_xsv_input:
    input:
        table="table.csv",
    output:
        "xsv_input.csv",
    threads: 1
    log:
        "xsv/input.log",
    params:
        subcommand="input",
        extra="",
    wrapper:
        "master/bio/xsv"


### Join subcommand ###
rule test_xsv_join:
    input:
        table=["table.csv", "right.csv"],
    output:
        "xsv_join.csv",
    threads: 1
    log:
        "xsv/join.log",
    params:
        subcommand="join",
        col1="gene_id",
        col2="gene_id",
        extra="",
    wrapper:
        "master/bio/xsv"


### Sample subcommand ###
rule test_xsv_sample:
    input:
        table="table.csv",
    output:
        "xsv_sample.csv",
    threads: 1
    log:
        "xsv/sample.log",
    params:
        subcommand="sample",
        extra="1",
    wrapper:
        "master/bio/xsv"


### Search subcommand ###
rule test_xsv_search:
    input:
        table="table.csv",
    output:
        "xsv_search.csv",
    threads: 1
    log:
        "xsv/search.log",
    params:
        subcommand="search",
        extra="--select gene_id ENSG[0-9]+",
    wrapper:
        "master/bio/xsv"


### Select subcommand ###
rule test_xsv_select:
    input:
        table="table.csv",
    output:
        "xsv_select.csv",
    threads: 1
    log:
        "xsv/select.log",
    params:
        subcommand="select",
        extra="3-",
    wrapper:
        "master/bio/xsv"


### Slice subcommand ###
rule test_xsv_slice:
    input:
        table="table.csv",
    output:
        "xsv_slice.csv",
    threads: 1
    log:
        "xsv/slice.log",
    params:
        subcommand="slice",
        extra="-i 2",
    wrapper:
        "master/bio/xsv"


### Sort subcommand ###
rule test_xsv_sort:
    input:
        table="table.csv",
    output:
        "xsv_sort.csv",
    threads: 1
    log:
        "xsv/sort.log",
    params:
        subcommand="sort",
        extra="",
    wrapper:
        "master/bio/xsv"


### Split subcommand ###
rule test_xsv_split:
    input:
        table="table.csv",
    output:
        directory("xsv_split"),
    threads: 1
    log:
        "xsv/split.log",
    params:
        subcommand="split",
        extra="-s 2",
    wrapper:
        "master/bio/xsv"


rule test_xsv_split_list:
    input:
        table="table.csv",
    output:
        expand("xsv_split/{nb}.csv", nb=["0", "1"]),
    threads: 1
    log:
        "xsv/split.log",
    params:
        subcommand="split",
        extra="-s 1",
    wrapper:
        "master/bio/xsv"


### Stat subcommand ###
rule test_xsv_stats:
    input:
        table="table.csv",
    output:
        "xsv_stats.txt",
    threads: 1
    log:
        "xsv/stats.log",
    params:
        subcommand="stats",
        extra="",
    wrapper:
        "master/bio/xsv"


### Table subcommand ###
rule test_xsv_table:
    input:
        table="right.csv",
    output:
        "xsv_table.txt",
    threads: 1
    log:
        "xsv/table.log",
    params:
        subcommand="table",
        extra="",
    wrapper:
        "master/bio/xsv"
3 changes: 3 additions & 0 deletions bio/xsv/test/right.csv
@@ -0,0 +1,3 @@
gene_id,s4,s5,s6
ENSG03,24.5,15,85
ENSG02,12,157,0.2
3 changes: 3 additions & 0 deletions bio/xsv/test/table.csv
@@ -0,0 +1,3 @@
gene_id,s1,s2,s3
ENSG01,14.5,15,75
ENSG02,12,57,0.2
3 changes: 3 additions & 0 deletions bio/xsv/test/table.tsv
@@ -0,0 +1,3 @@
gene_id s1 s2 s3
ENSG01 14.5 15 75
ENSG02 12 57 0.2
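
To make the intent of `test_xsv_join` on the two CSV fixtures concrete, here is a plain-Python inner join on `gene_id`. It is only an illustration built on the standard `csv` module, not a reimplementation of xsv, and it makes no claim about the exact column layout xsv writes. The helper name `inner_join` is hypothetical.

```python
import csv
import io

# Contents of the two CSV fixtures added by this commit.
TABLE_CSV = """gene_id,s1,s2,s3
ENSG01,14.5,15,75
ENSG02,12,57,0.2
"""

RIGHT_CSV = """gene_id,s4,s5,s6
ENSG03,24.5,15,85
ENSG02,12,157,0.2
"""


def inner_join(left_text, right_text, key):
    """Return rows whose `key` value occurs in both tables (illustrative only)."""
    # Index the right-hand table by its key column.
    right_by_key = {}
    for row in csv.DictReader(io.StringIO(right_text)):
        right_by_key.setdefault(row[key], []).append(row)

    joined = []
    for left in csv.DictReader(io.StringIO(left_text)):
        for right in right_by_key.get(left[key], []):
            merged = dict(left)
            # Keep the key column once; append the other right-hand columns.
            merged.update({k: v for k, v in right.items() if k != key})
            joined.append(merged)
    return joined


rows = inner_join(TABLE_CSV, RIGHT_CSV, "gene_id")
print(rows)
```

Only `ENSG02` occurs in both fixtures, so the join yields a single row combining its `s1` to `s3` and `s4` to `s6` values.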
