feat: Add support for rust scripts (enabling directly integrated ad-hoc robust high performance scripting) #1053

Merged
merged 42 commits into from Aug 12, 2021
a803c6e
add support for rust scripts
tedil Jun 14, 2021
7df6194
Merge branch 'main' into rust-script
johanneskoester Jun 16, 2021
9e3c997
add rust environment yaml
tedil Jun 16, 2021
d003cd0
Merge branch 'rust-script' of github.com:snakemake/snakemake into rus…
tedil Jun 16, 2021
c69faf8
add missing files
tedil Jun 17, 2021
a5d7604
Merge branch 'main' into rust-script
tedil Jun 17, 2021
1da1e45
some basic docs
tedil Jun 17, 2021
793613b
Merge branch 'main' into rust-script
tedil Jun 22, 2021
9da571c
clarify default dependencies
mbhall88 Jun 24, 2021
6e31df0
add functionality to handle cargo manifest
mbhall88 Jun 25, 2021
cc3100e
Merge branch 'rust-script' of github.com:snakemake/snakemake into rus…
mbhall88 Jun 25, 2021
8591d80
remove redundant continue
mbhall88 Jun 25, 2021
5f83889
add some more rust script docs and restructure scripts docs
mbhall88 Jun 25, 2021
ecd8fe6
Merge branch 'main' into rust-script
tedil Jul 5, 2021
d9a1fac
use NamedList type instead of HashMap
tedil Jul 5, 2021
7c496ca
merge
tedil Jul 5, 2021
697960e
remove additional '--features', add indexmap/serde dependency+feature
tedil Jul 5, 2021
ceaf55c
update test-manifest.rs to use namedlist API aswell
tedil Jul 5, 2021
bf924f4
Merge branch 'main' into rust-script
tedil Jul 5, 2021
0f89579
add outer line doc testing and pin rust-script version
mbhall88 Jul 10, 2021
91533ed
Merge branch 'main' into rust-script
tedil Jul 11, 2021
31a281f
small fixes for rust outer doc test
mbhall88 Jul 13, 2021
078474c
format shell log string for rust-script
mbhall88 Jul 13, 2021
7a8ebf2
fmt
mbhall88 Jul 13, 2021
0b1a756
add missing test file
mbhall88 Jul 13, 2021
8da91f9
add missing rust script
mbhall88 Jul 13, 2021
4e90eca
replace serde-pickle with serde_json + json_typegen
tedil Jul 20, 2021
119af9d
fmt
tedil Jul 20, 2021
7e66fa5
only iter over positional items
tedil Jul 20, 2021
b7cbda3
fmt
tedil Jul 20, 2021
8a3cb97
add code to modify PATH, add functions for redirecting stdout, stderr…
tedil Jul 21, 2021
2da1fed
use fully qualified names instead of use statements
tedil Jul 21, 2021
4c8e031
update docs
tedil Jul 21, 2021
8706e94
remove print and todo
tedil Jul 23, 2021
c0fa475
use ordered list instead
tedil Jul 23, 2021
e25a7c3
remove example TODO
tedil Jul 23, 2021
dfff120
move comment about R snakemake@source() function to the R section
tedil Jul 23, 2021
57e16fb
update src comments
tedil Jul 23, 2021
513e2d6
Merge branch 'main' into rust-script
tedil Jul 26, 2021
fca4aaf
make log impl_iter and dont redirect rust-script stream
mbhall88 Jul 27, 2021
306ea86
minor additions to the docs
mbhall88 Aug 6, 2021
aa73982
Merge branch 'main' into rust-script
tedil Aug 11, 2021
182 changes: 151 additions & 31 deletions docs/snakefiles/rules.rst
@@ -581,6 +581,9 @@ External scripts

A rule can also point to an external script instead of a shell command or inline Python code, e.g.

Python
~~~~~~

.. code-block:: python

rule NAME:
@@ -601,29 +604,24 @@ The script path is always relative to the Snakefile containing the directive (in
It is recommended to put all scripts into a subfolder ``scripts`` as above.
Inside the script, you have access to an object ``snakemake`` that exposes the same objects that are available in the ``run`` and ``shell`` directives (input, output, params, wildcards, log, threads, resources, config), e.g. you can use ``snakemake.input[0]`` to access the first input file of the rule above.

Apart from Python scripts, this mechanism also allows you to integrate R_ and R Markdown_ scripts with Snakemake, e.g.

.. _R: https://www.r-project.org
.. _Markdown: https://rmarkdown.rstudio.com
An example external Python script could look like this:

.. code-block:: python

rule NAME:
input:
"path/to/inputfile",
"path/to/other/inputfile"
output:
"path/to/outputfile",
"path/to/another/outputfile"
script:
"scripts/script.R"
def do_something(data_path, out_path, threads, myparam):
# python code

In the R script, an S4 object named ``snakemake`` analogous to the Python case above is available and allows access to input and output files and other parameters. Here the syntax follows that of S4 classes with attributes that are R lists, e.g. we can access the first input file with ``snakemake@input[[1]]`` (note that the first file does not have index ``0`` here, because R starts counting from ``1``). Named input and output files can be accessed in the same way, by just providing the name instead of an index, e.g. ``snakemake@input[["myfile"]]``.
do_something(snakemake.input[0], snakemake.output[0], snakemake.threads, snakemake.config["myparam"])

You can use the Python debugger from within the script if you invoke Snakemake with ``--debug``.

Alternatively, it is possible to integrate Julia_ scripts, e.g.
R and R Markdown
~~~~~~~~~~~~~~~~

.. _Julia: https://julialang.org
Apart from Python scripts, this mechanism also allows you to integrate R_ and R Markdown_ scripts with Snakemake, e.g.

.. _R: https://www.r-project.org
.. _Markdown: https://rmarkdown.rstudio.com

.. code-block:: python

@@ -635,23 +633,11 @@ Alternatively, it is possible to integrate Julia_ scripts, e.g.
"path/to/outputfile",
"path/to/another/outputfile"
script:
"path/to/script.jl"

In the Julia_ script, a ``snakemake`` object is available, which can be accessed similar to the Python case (see above), with the only difference that you have to index from 1 instead of 0.

For technical reasons, scripts are executed in ``.snakemake/scripts``. The original script directory is available as ``scriptdir`` in the ``snakemake`` object. A convenience method, ``snakemake@source()``, acts as a wrapper for the normal R ``source()`` function, and can be used to source files relative to the original script directory.

An example external Python script could look like this:

.. code-block:: python

def do_something(data_path, out_path, threads, myparam):
# python code
"scripts/script.R"

do_something(snakemake.input[0], snakemake.output[0], snakemake.threads, snakemake.config["myparam"])
In the R script, an S4 object named ``snakemake`` analogous to the Python case above is available and allows access to input and output files and other parameters. Here the syntax follows that of S4 classes with attributes that are R lists, e.g. we can access the first input file with ``snakemake@input[[1]]`` (note that the first file does not have index ``0`` here, because R starts counting from ``1``). Named input and output files can be accessed in the same way, by just providing the name instead of an index, e.g. ``snakemake@input[["myfile"]]``.

You can use the Python debugger from within the script if you invoke Snakemake with ``--debug``.
An equivalent script written in R would look like this:
An equivalent script (:ref:`to the Python one above <Python>`) written in R would look like this:

.. code-block:: r

@@ -664,6 +650,7 @@ An equivalent script written in R would look like this:

To debug R scripts, you can save the workspace with ``save.image()``, and invoke R after Snakemake has terminated. Then you can use the usual R debugging facilities while having access to the ``snakemake`` variable.
It is best practice to wrap the actual code into a separate function. This makes the code more portable if it is to be invoked outside of Snakemake or from a different rule.
A convenience method, ``snakemake@source()``, acts as a wrapper for the normal R ``source()`` function, and can be used to source files relative to the original script directory.

An R Markdown file can be integrated in the same way as R and Python scripts, but only a single output (html) file can be used:

@@ -713,6 +700,139 @@ In the R Markdown file you can insert output from a R command, and access variab
A link to the R Markdown document with the ``snakemake`` object can be inserted. To do so, add a variable called ``rmd`` to the ``params`` section in the header of the ``report.Rmd`` file. The generated R Markdown file with the ``snakemake`` object will be saved to the file specified in this ``rmd`` variable. This file can be embedded into the HTML document using base64 encoding, and a link can be inserted as shown in the example above.
Other input and output files can also be embedded in this way to make a portable report. Note that the above method with a data URI only works for small files. An experimental technology for embedding larger files is the JavaScript Blob `object <https://developer.mozilla.org/en-US/docs/Web/API/Blob>`_.

Julia_
~~~~~~

.. _Julia: https://julialang.org

.. code-block:: python

    rule NAME:
        input:
            "path/to/inputfile",
            "path/to/other/inputfile"
        output:
            "path/to/outputfile",
            "path/to/another/outputfile"
        script:
            "path/to/script.jl"

In the Julia_ script, a ``snakemake`` object is available, which can be accessed similarly to the :ref:`Python case <Python>`, with the only difference being that you have to index from 1 instead of 0.

Rust_
~~~~~

.. _Rust: https://www.rust-lang.org/

.. code-block:: python

    rule NAME:
        input:
            "path/to/inputfile",
            "path/to/other/inputfile",
            named_input="path/to/named/inputfile",
        output:
            "path/to/outputfile",
            "path/to/another/outputfile"
        params:
            seed=4
        log:
            stdout="path/to/stdout.log",
            stderr="path/to/stderr.log",
        script:
            "path/to/script.rs"

Rust scripts are executed with |rust-script|_. The script must therefore be a
valid ``rust-script`` script, and ``rust-script`` must be available in the
environment in which the rule runs.

Some example scripts can be found in the
`tests directory <https://github.com/snakemake/snakemake/tree/main/tests/test_script/scripts>`_.

In the Rust script, a ``snakemake`` instance is available, which is automatically generated from the Python ``snakemake`` object using |json_typegen|_.
It usually looks like this:

.. code-block:: rust

    pub struct Snakemake {
        input: Input,
        output: Output,
        params: Params,
        wildcards: Wildcards,
        threads: u64,
        log: Log,
        resources: Resources,
        config: Config,
        rulename: String,
        bench_iteration: Option<usize>,
        scriptdir: String,
    }

Any named parameter is translated to a corresponding ``field_name: Type``, such that ``params.seed`` from the example above can be accessed just like in Python, i.e.:

.. code-block:: rust

    let seed = snakemake.params.seed;
    assert_eq!(seed, 4);

Positional arguments for ``input``, ``output``, ``log`` and ``wildcards`` can be accessed by index and iterated over:

.. code-block:: rust

    let input = &snakemake.input;

    // Input implements Index<usize>; borrow the element instead of moving it out
    let inputfile = &input[0];
    assert_eq!(inputfile, "path/to/inputfile");

    // Input implements IntoIterator
    //
    // prints
    // > 'path/to/inputfile'
    // > 'path/to/other/inputfile'
    for f in input {
        println!("> '{}'", &f);
    }
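
The indexing and iteration behaviour described above can be sketched with a stand-in that uses only the standard library. ``PositionalList`` and ``collect_paths`` are hypothetical names used for illustration; the code Snakemake actually generates for ``Input`` may differ.

.. code-block:: rust

    use std::ops::Index;

    /// Hypothetical stand-in for the generated `Input` type (an assumption,
    /// not the code Snakemake generates).
    pub struct PositionalList {
        items: Vec<String>,
    }

    impl PositionalList {
        pub fn new(items: Vec<String>) -> Self {
            Self { items }
        }
    }

    // Indexing by position returns a reference to the stored path.
    impl Index<usize> for PositionalList {
        type Output = String;

        fn index(&self, idx: usize) -> &Self::Output {
            &self.items[idx]
        }
    }

    // Iterating over a shared reference yields `&String` items in order.
    impl<'a> IntoIterator for &'a PositionalList {
        type Item = &'a String;
        type IntoIter = std::slice::Iter<'a, String>;

        fn into_iter(self) -> Self::IntoIter {
            self.items.iter()
        }
    }

    /// Collects every path, in positional order, as owned strings.
    pub fn collect_paths(list: &PositionalList) -> Vec<String> {
        list.into_iter().cloned().collect()
    }

    fn main() {
        let input = PositionalList::new(vec![
            "path/to/inputfile".to_string(),
            "path/to/other/inputfile".to_string(),
        ]);
        // Borrow the first element; moving a `String` out of an index is not allowed.
        let first = &input[0];
        println!("first input: {}", first);
        for f in &input {
            println!("> '{}'", f);
        }
    }

Implementing ``IntoIterator`` for a reference keeps the list usable after the loop, which matters when the same inputs are read more than once.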


It is also possible to redirect ``stdout`` and ``stderr``:

.. code-block:: rust

    println!("This will NOT be written to path/to/stdout.log");
    // redirect stdout to "path/to/stdout.log"
    let _stdout_redirect = snakemake.redirect_stdout(snakemake.log.stdout)?;
    println!("This will be written to path/to/stdout.log");

    // redirect stderr to "path/to/stderr.log"
    let _stderr_redirect = snakemake.redirect_stderr(snakemake.log.stderr)?;
    eprintln!("This will be written to path/to/stderr.log");
    drop(_stderr_redirect);
    eprintln!("This will NOT be written to path/to/stderr.log");

Redirection of stdout/stderr is only "active" as long as the returned ``Redirect`` instance is alive; in order to stop redirecting, drop the respective instance.
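
The lifetime rule above is ordinary Rust drop semantics. A minimal sketch, assuming a hypothetical ``RedirectGuard`` that only records events instead of redirecting a real stream (``RedirectGuard`` and ``run_with_guard`` are illustrative names, not part of Snakemake):

.. code-block:: rust

    use std::cell::RefCell;
    use std::rc::Rc;

    /// Hypothetical guard standing in for the returned `Redirect` instance.
    /// It only records events; no real stream is redirected here.
    struct RedirectGuard {
        log: Rc<RefCell<Vec<String>>>,
    }

    impl RedirectGuard {
        fn start(log: Rc<RefCell<Vec<String>>>) -> Self {
            log.borrow_mut().push("redirect started".to_string());
            Self { log }
        }
    }

    impl Drop for RedirectGuard {
        // Runs when the guard goes out of scope or is passed to `drop()`.
        fn drop(&mut self) {
            self.log.borrow_mut().push("redirect stopped".to_string());
        }
    }

    /// Returns the recorded events in the order they happened.
    fn run_with_guard() -> Vec<String> {
        let log = Rc::new(RefCell::new(Vec::new()));
        {
            let _guard = RedirectGuard::start(Rc::clone(&log));
            log.borrow_mut().push("work inside scope".to_string());
        } // `_guard` is dropped here, which ends the "redirect"
        log.borrow_mut().push("work after scope".to_string());
        let events = log.borrow().clone();
        events
    }

    fn main() {
        for event in run_with_guard() {
            println!("{}", event);
        }
    }

Binding such a guard to ``_`` instead of a named variable like ``_stdout_redirect`` would drop it at the end of that statement, so the redirect would end immediately.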

To work, the ``rust-script`` support in Snakemake enables the following dependencies by default:

#. ``anyhow=1``, for its ``Result`` type
#. ``gag=1``, to enable stdout/stderr redirects
#. ``json_typegen=0.6``, for generating Rust structs from a JSON representation of the ``snakemake`` object
#. ``lazy_static=1.4``, to make a ``snakemake`` instance easily accessible
#. ``serde=1``, explicit dependency of ``json_typegen``
#. ``serde_derive=1``, explicit dependency of ``json_typegen``
#. ``serde_json=1``, explicit dependency of ``json_typegen``

If your script uses any of these packages, you do not need to ``use`` them in your script. Trying to ``use`` them will cause a compilation error.

.. |rust-script| replace:: ``rust-script``
.. _rust-script: https://rust-script.org/
.. |json_typegen| replace:: ``json_typegen``
.. _json_typegen: https://github.com/evestera/json_typegen

----

For technical reasons, scripts are executed in ``.snakemake/scripts``. The original script directory is available as ``scriptdir`` in the ``snakemake`` object.
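
Because the executed copy of a script does not live in the original script directory, files that sit next to the original script can be located through ``scriptdir``. A minimal Rust sketch using only the standard library; ``resolve_in_scriptdir`` and the example paths are hypothetical, and the plain string argument stands in for the ``scriptdir`` value of the ``snakemake`` object:

.. code-block:: rust

    use std::path::{Path, PathBuf};

    /// Resolves `relative` against the original script directory.
    /// `scriptdir` stands in for the `scriptdir` value of the `snakemake` object.
    fn resolve_in_scriptdir(scriptdir: &str, relative: &str) -> PathBuf {
        Path::new(scriptdir).join(relative)
    }

    fn main() {
        // Hypothetical values, for illustration only.
        let helper = resolve_in_scriptdir("workflow/scripts", "helpers/lookup.tsv");
        println!("{}", helper.display());
    }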

.. _snakefiles_notebook-integration:

Jupyter notebook integration