Commit
feat: Xsv (complete subcommand suite) (#1430)
### Description

This PR adds [xsv](https://github.com/BurntSushi/xsv) select to the list of available wrappers.

### QC

* [X] I confirm that:

For all wrappers added by this PR,

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tool's purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use conda-forge, bioconda, nodefaults (conda-forge should have highest priority, and the defaults channels are usually not needed because most packages are in conda-forge nowadays).

---------

Co-authored-by: tdayris <tdayris@gustaveroussy.fr>
Co-authored-by: tdayris <thibault.dayris@gustaveroussy.fr>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: snakedeploy-bot[bot] <115615832+snakedeploy-bot[bot]@users.noreply.github.com>
Co-authored-by: Felix Mölder <felix.moelder@uni-due.de>
Co-authored-by: Christopher Schröder <christopher.schroeder@tu-dortmund.de>
1 parent b915e37, commit da81a5a
Showing 8 changed files with 475 additions and 0 deletions.
@@ -0,0 +1,5 @@
channels:
  - conda-forge
  - nodefaults
dependencies:
  - xsv=0.13.0
@@ -0,0 +1,15 @@
name: xsv
url: https://github.com/BurntSushi/xsv
description: >
  Perform various operations over CSV/TSV tables.
authors:
  - Thibault Dayris
input:
  - table: Path to CSV/TSV table.
output:
  - Path to the result file / directory
params:
  - extra: Optional arguments for `xsv`. For TSV files, `--delimiter` is automatically set to a tabulation.
  - subcommand: xsv subcommand among `cat`, `count`, `fixlengths`, `flatten`, `fmt`, `frequency`, `headers`, `index`, `input`, `join`, `sample`, `search`, `select`, `slice`, `sort`, `split`, `stats`, or `table`
notes: |
  Adding table(s) index(es) to the input file list makes many subcommands faster.
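The wrapper script itself is not visible in this view, so the following is only a minimal sketch of how the `--delimiter` inference described in the `extra` parameter could work. The helper names `infer_delimiter_flag` and `build_xsv_command`, and the choice to look only at the `.tsv` suffix of the first table, are assumptions made for illustration; only the subcommand names and the `--delimiter` option come from the text above.

```python
import shlex
from pathlib import Path


def infer_delimiter_flag(table_path):
    """Return the xsv delimiter option for a .tsv path, else an empty list.

    Hypothetical helper: the real wrapper may use a different rule.
    """
    if Path(table_path).suffix.lower() == ".tsv":
        # A literal tab character is passed as the delimiter value.
        return ["--delimiter", "\t"]
    return []


def build_xsv_command(subcommand, tables, extra=""):
    """Assemble an argument list: xsv <subcommand> [delimiter] [extra] <tables>.

    This ordering fits subcommands that take their inputs last (count,
    select, search, cat, ...). `join` interleaves columns and inputs and
    would need separate handling, which is left out of this sketch.
    """
    tables = [tables] if isinstance(tables, str) else list(tables)
    cmd = ["xsv", *shlex.split(subcommand)]  # "cat rows" -> ["cat", "rows"]
    cmd += infer_delimiter_flag(tables[0])
    cmd += shlex.split(extra)
    cmd += tables
    return cmd


print(build_xsv_command("count", "table.tsv"))
print(build_xsv_command("search", "table.csv", "--select gene_id ENSG[0-9]+"))
```

Returning an argument list rather than a shell string avoids quoting problems with the tab character and with patterns such as `ENSG[0-9]+`.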
@@ -0,0 +1,348 @@
### Concatenation subcommand ###
rule test_xsv_cat_rows:
    input:
        table=["table.csv", "right.csv"],
    output:
        "xsv_catrows.csv",
    threads: 1
    log:
        "xsv/catrow.log",
    params:
        subcommand="cat rows",
        extra="",
    wrapper:
        "master/bio/xsv"


rule test_xsv_cat_cols:
    input:
        table=["table.csv", "right.csv"],
    output:
        "xsv_catcols.csv",
    threads: 1
    log:
        "xsv/catcol.log",
    params:
        subcommand="cat columns",
        extra="",
    wrapper:
        "master/bio/xsv"


### Count subcommand ###
rule test_xsv_count:
    input:
        table="table.csv",
    output:
        "xsv_count.csv",
    threads: 1
    log:
        "xsv/count.log",
    params:
        subcommand="count",
        extra="",
    wrapper:
        "master/bio/xsv"


rule test_xsv_count_tsv_input:
    input:
        table="table.tsv",
    output:
        "xsv_count.tsv_as_input.csv",
    threads: 1
    log:
        "xsv/count.log",
    params:
        subcommand="count",
        extra="",
    wrapper:
        "master/bio/xsv"


### Fix lengths subcommand ###
rule test_xsv_fixlength:
    input:
        table="table.csv",
    output:
        "xsv_fixlength.csv",
    threads: 1
    log:
        "xsv/fixlength.log",
    params:
        subcommand="fixlengths",
        extra="--length 20",
    wrapper:
        "master/bio/xsv"


### Flatten subcommand ###
rule test_xsv_flatten:
    input:
        table="table.csv",
    output:
        "xsv_flatten.csv",
    threads: 1
    log:
        "xsv/flatten.log",
    params:
        subcommand="flatten",
        extra="",
    wrapper:
        "master/bio/xsv"


### Format subcommand ###
rule test_xsv_fmt:
    input:
        table="table.csv",
    output:
        "xsv_fmt.tsv",
    threads: 1
    log:
        "xsv/fmt.log",
    params:
        subcommand="fmt",
        extra="",
    wrapper:
        "master/bio/xsv"


### Frequency subcommand ###
rule test_xsv_frequency:
    input:
        table="table.csv",
    output:
        "xsv_frequency.csv",
    threads: 1
    log:
        "xsv/frequency.log",
    params:
        subcommand="frequency",
        extra="",
    wrapper:
        "master/bio/xsv"


### Headers subcommand ###
rule test_xsv_headers:
    input:
        table="table.csv",
    output:
        "xsv_headers.csv",
    threads: 1
    log:
        "xsv/headers.log",
    params:
        subcommand="headers",
        extra="",
    wrapper:
        "master/bio/xsv"


rule test_xsv_headers_list:
    input:
        table=["table.csv", "right.csv"],
    output:
        "xsv_headers_all.csv",
    threads: 1
    log:
        "xsv/headers_all.log",
    params:
        subcommand="headers",
        extra="--intersect",
    wrapper:
        "master/bio/xsv"


### Index subcommand ###
rule test_xsv_index:
    input:
        table="table.csv",
    output:
        "table.csv.idx",
    threads: 1
    log:
        "xsv/index.log",
    params:
        subcommand="index",
        extra="",
    wrapper:
        "master/bio/xsv"


### Input subcommand ###
rule test_xsv_input:
    input:
        table="table.csv",
    output:
        "xsv_input.csv",
    threads: 1
    log:
        "xsv/input.log",
    params:
        subcommand="input",
        extra="",
    wrapper:
        "master/bio/xsv"


### Join subcommand ###
rule test_xsv_join:
    input:
        table=["table.csv", "right.csv"],
    output:
        "xsv_join.csv",
    threads: 1
    log:
        "xsv/join.log",
    params:
        subcommand="join",
        col1="gene_id",
        col2="gene_id",
        extra="",
    wrapper:
        "master/bio/xsv"


### Sample subcommand ###
rule test_xsv_sample:
    input:
        table="table.csv",
    output:
        "xsv_sample.csv",
    threads: 1
    log:
        "xsv/sample.log",
    params:
        subcommand="sample",
        extra="1",
    wrapper:
        "master/bio/xsv"


### Search subcommand ###
rule test_xsv_search:
    input:
        table="table.csv",
    output:
        "xsv_search.csv",
    threads: 1
    log:
        "xsv/search.log",
    params:
        subcommand="search",
        extra="--select gene_id ENSG[0-9]+",
    wrapper:
        "master/bio/xsv"


### Select subcommand ###
rule test_xsv_select:
    input:
        table="table.csv",
    output:
        "xsv_select.csv",
    threads: 1
    log:
        "xsv/select.log",
    params:
        subcommand="select",
        extra="3-",
    wrapper:
        "master/bio/xsv"


### Slice subcommand ###
rule test_xsv_slice:
    input:
        table="table.csv",
    output:
        "xsv_slice.csv",
    threads: 1
    log:
        "xsv/slice.log",
    params:
        subcommand="slice",
        extra="-i 2",
    wrapper:
        "master/bio/xsv"


### Sort subcommand ###
rule test_xsv_sort:
    input:
        table="table.csv",
    output:
        "xsv_sort.csv",
    threads: 1
    log:
        "xsv/sort.log",
    params:
        subcommand="sort",
        extra="",
    wrapper:
        "master/bio/xsv"


### Split subcommand ###
rule test_xsv_split:
    input:
        table="table.csv",
    output:
        directory("xsv_split"),
    threads: 1
    log:
        "xsv/split.log",
    params:
        subcommand="split",
        extra="-s 2",
    wrapper:
        "master/bio/xsv"


rule test_xsv_split_list:
    input:
        table="table.csv",
    output:
        expand("xsv_split/{nb}.csv", nb=["0", "1"]),
    threads: 1
    log:
        "xsv/split.log",
    params:
        subcommand="split",
        extra="-s 1",
    wrapper:
        "master/bio/xsv"


### Stat subcommand ###
rule test_xsv_stats:
    input:
        table="table.csv",
    output:
        "xsv_stats.txt",
    threads: 1
    log:
        "xsv/stats.log",
    params:
        subcommand="stats",
        extra="",
    wrapper:
        "master/bio/xsv"


### Table subcommand ###
rule test_xsv_table:
    input:
        table="right.csv",
    output:
        "xsv_table.txt",
    threads: 1
    log:
        "xsv/table.log",
    params:
        subcommand="table",
        extra="",
    wrapper:
        "master/bio/xsv"
@@ -0,0 +1,3 @@
gene_id,s4,s5,s6
ENSG03,24.5,15,85
ENSG02,12,157,0.2
@@ -0,0 +1,3 @@
gene_id,s1,s2,s3
ENSG01,14.5,15,75
ENSG02,12,57,0.2
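To make the intent of `test_xsv_join` (with `col1="gene_id"` and `col2="gene_id"`) concrete, here is a small pure-Python sketch of an inner join over the two CSV fixtures listed above. File names are not shown in this view, so treating the `s1`-`s3` table as `table.csv` and the `s4`-`s6` table as `right.csv` is an assumption. The sketch does not call `xsv` and does not claim to reproduce its exact output layout; the `inner_join` helper is hypothetical.

```python
import csv
import io

# Fixture contents copied from the diff; the file-name mapping is assumed.
TABLE_CSV = "gene_id,s1,s2,s3\nENSG01,14.5,15,75\nENSG02,12,57,0.2\n"
RIGHT_CSV = "gene_id,s4,s5,s6\nENSG03,24.5,15,85\nENSG02,12,157,0.2\n"


def inner_join(left_text, right_text, left_key, right_key):
    """Join two CSV texts on the given key columns, keeping matches only."""
    left_rows = list(csv.DictReader(io.StringIO(left_text)))
    right_rows = list(csv.DictReader(io.StringIO(right_text)))

    # Index the right-hand rows by key so each left row is matched in O(1).
    by_key = {}
    for row in right_rows:
        by_key.setdefault(row[right_key], []).append(row)

    joined = []
    for left_row in left_rows:
        for right_row in by_key.get(left_row[left_key], []):
            merged = dict(left_row)
            # Drop the duplicate key column from the right-hand side.
            merged.update({k: v for k, v in right_row.items() if k != right_key})
            joined.append(merged)
    return joined


rows = inner_join(TABLE_CSV, RIGHT_CSV, "gene_id", "gene_id")
print(rows)
```

Only `ENSG02` appears in both fixtures, so the join keeps a single record; `ENSG01` and `ENSG03` have no partner and are dropped.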
@@ -0,0 +1,3 @@
gene_id	s1	s2	s3
ENSG01	14.5	15	75
ENSG02	12	57	0.2
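The last fixture carries the same values as the `s1`-`s3` CSV fixture, separated by whitespace instead of commas. Assuming those separators are tab characters (as the `test_xsv_count_tsv_input` rule suggests), a quick sketch with Python's `csv` module shows that both fixtures hold the same header and the same two records once the right delimiter is used. This only illustrates why the delimiter matters; it does not invoke `xsv`, and `read_records` is a hypothetical helper.

```python
import csv
import io

# Fixture contents copied from the diff; tabs in TSV_TEXT are an assumption.
CSV_TEXT = "gene_id,s1,s2,s3\nENSG01,14.5,15,75\nENSG02,12,57,0.2\n"
TSV_TEXT = "gene_id\ts1\ts2\ts3\nENSG01\t14.5\t15\t75\nENSG02\t12\t57\t0.2\n"


def read_records(text, delimiter):
    """Split a delimited text into its header row and its data records."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return header, list(reader)


csv_header, csv_records = read_records(CSV_TEXT, ",")
tsv_header, tsv_records = read_records(TSV_TEXT, "\t")

# With the matching delimiter, both fixtures parse to identical content.
same_content = csv_header == tsv_header and csv_records == tsv_records
print(len(tsv_records), same_content)
```

Parsing the TSV text with a comma delimiter instead would yield a single column per row, which is the failure mode the automatic `--delimiter` setting is meant to avoid.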