snakemake · johanneskoester · Feb 24, 2022 · Feb 23, 2022
@@ -36,7 +36,7 @@ following structure:
     └── resources
 
 In other words, the workflow code goes into a subfolder ``workflow``, while the configuration is stored in a subfolder ``config``.
-Inside of the ``workflow`` subfolder, the central ``Snakefile`` marks the entrypoint of the workflow (it will be automatically discovered when running snakemake from the root of above structure. 
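+Inside of the ``workflow`` subfolder, the central ``Snakefile`` marks the entrypoint of the workflow (it will be automatically discovered when running Snakemake from the root of the above structure).
-This main structure and the recommendations below are implemented in `this Snakemake workflow template <https://github.com/snakemake-workflows/snakemake-workflow-template>`_ that you can use to `create your own workflow repository with a single click on "Use this template" <https://github.com/snakemake-workflows/snakemake-workflow-template/generate>_.
+This main structure and the recommendations below are implemented in `this Snakemake workflow template <https://github.com/snakemake-workflows/snakemake-workflow-template>`_ that you can use to `create your own workflow repository with a single click on "Use this template" <https://github.com/snakemake-workflows/snakemake-workflow-template/generate>`_.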
 In addition to the central ``Snakefile``, rules can be stored in a modular way, using the optional subfolder ``workflow/rules``.
 Such modules should end with ``.smk``, the recommended file extension of Snakemake.
@@ -172,7 +172,7 @@ We can extend above example in the following way:
             rules.rna_seq_all.input,
         default_target: True
 
-Above, several things have changed. 
+Above, several things have changed.
 
 * First, we have added another module ``rna_seq``.
 * Second, we have added a prefix to all non-absolute input and output file names of both modules (``prefix: "dna-seq"`` and ``prefix: "rna-seq"``) in order to avoid file name clashes.
@@ -277,13 +277,13 @@ Instead of using a concrete path, it is also possible to provide a path containi
 
    Note that conda environments are only used with ``shell``, ``script`` and the ``wrapper`` directive, not the ``run`` directive.
    The reason is that the ``run`` directive has access to the rest of the Snakefile (e.g. globally defined variables) and therefore must be executed in the same process as Snakemake itself.
-   
-   Further, note that search path modifying environment variables like ``R_LIBS`` and ``PYTHONPATH`` can interfere with your conda environments. 
+
+   Further, note that search path modifying environment variables like ``R_LIBS`` and ``PYTHONPATH`` can interfere with your conda environments.
    Therefore, Snakemake automatically deactivates them for a job when a conda environment definition is used.
    If you know what you are doing, in order to deactivate this behavior, you can use the flag ``--conda-not-block-search-path-envvars``.
 
 Snakemake will store the environment persistently in ``.snakemake/conda/$hash`` with ``$hash`` being the MD5 hash of the environment definition file content. This way, updates to the environment definition are automatically detected.
-Note that you need to clean up environments manually for now. However, in many cases they are lightweight and consist of symlinks to your central conda installation. 
+Note that you need to clean up environments manually for now. However, in many cases they are lightweight and consist of symlinks to your central conda installation.
 
 Conda deployment also works well for offline or air-gapped environments. Running ``snakemake --use-conda --conda-create-envs-only`` will only install the required conda environments without running the full workflow. Subsequent runs with ``--use-conda`` will make use of the local environments without requiring internet access.
 
@@ -301,6 +301,7 @@ Therefore, the approach using environment definition files described above is hi
 Nevertheless, in case you are still sure that you want to use an existing named environment, it can simply be put into the conda directive, e.g.
 
 .. code-block:: python
+
     rule NAME:
         input:
             "table.txt"
@@ -314,7 +315,7 @@ Nevertheless, in case you are still sure that you want to use an existing named
 For such a rule, Snakemake will just activate the given environment, instead of automatically deploying anything.
 Instead of using a concrete name, it is also possible to provide a name containing wildcards (which must also occur in the output files of the rule), analogous to the specification of input files.
 
-Note that Snakemake distinguishes file based environments from named ones as follows: 
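+Note that Snakemake distinguishes file-based environments from named ones as follows:
-if the given specification ends on ``.yaml`` or ``.yml``, Snakemake assumes it to be a path to an environment definition file; otherwise, it assumes the given specification
+if the given specification ends with ``.yaml`` or ``.yml``, Snakemake assumes it to be a path to an environment definition file; otherwise, it assumes the given specification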
 to be the name of an existing environment.
 

@@ -140,7 +140,7 @@ Nevertheless, we can **execute our workflow** with
     $ snakemake --cores 1 mapped_reads/A.bam
 
 Whenever executing a workflow, you need to specify the number of cores to use.
-For this tutorial, we will use a single core for now. 
+For this tutorial, we will use a single core for now.
 Later you will see how parallelization works.
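-Note that, after completion of above command, Snakemake will not try to create ``mapped_reads/A.bam`` again, because it is already present in the file system.
+Note that, after completion of the above command, Snakemake will not try to create ``mapped_reads/A.bam`` again, because it is already present in the file system.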
 Snakemake **only re-runs jobs if one of the input files is newer than one of the output files or one of the input files will be updated by another job**.
@@ -232,7 +232,7 @@ We add the following rule beneath the ``bwa_map`` rule:
 .. sidebar:: Note
 
   In the shell command above we split the string into two lines, which are however automatically concatenated into one by Python.
-  This is a handy pattern to avoid too long shell command lines. When using this, make sure to have a trailing whitespace in each line but the last, 
+  This is a handy pattern to avoid overly long shell command lines. When using it, make sure to have a trailing whitespace in each line but the last,
-  in order to avoid arguments to become not properly separated.
+  so that arguments on adjacent lines stay properly separated.
 
 This rule will take the input file from the ``mapped_reads`` directory and store a sorted version in the ``sorted_reads`` directory.
@@ -283,7 +283,7 @@ By executing
 
 .. sidebar:: Note
 
-  If you went with: `Run tutorial for free in the cloud via Gitpod`_, you can easily view the resulting ``dag.svg`` by right-clicking on the file in the explorer panel on the left and selecting ``Open With -> Preview``.
+  If you are :ref:`running the tutorial on Gitpod <tutorial-free-on-gitpod>`, you can easily view the resulting ``dag.svg`` by right-clicking the file in the explorer panel on the left and selecting ``Open With -> Preview``.
 
 
 we create a **visualization of the DAG** using the ``dot`` command provided by Graphviz_.

@@ -50,6 +50,8 @@ To go through this tutorial, you need the following software installed:
 
-However, don't install any of these this manually now, we guide you through better ways below.
+However, don't install any of these manually now; we guide you through better ways below.
 
+.. _tutorial-free-on-gitpod:
+
 Run tutorial for free in the cloud via Gitpod
 :::::::::::::::::::::::::::::::::::::::::::::
 
@@ -69,7 +71,7 @@ Running the tutorial on your local machine
 
 If you prefer to run the tutorial on your local machine, please follow the steps below.
 
-The easiest way to set these prerequisites up, is to use the Mambaforge_ Python 3 distribution 
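+The easiest way to set up these prerequisites is to use the Mambaforge_ Python 3 distribution
-(Mambaforge_ is a Conda based distribution like Miniconda_, which however uses Mamba_ a fast and more robust replacement for the Conda_ package manager).
+(Mambaforge_ is a Conda-based distribution like Miniconda_, which, however, uses Mamba_, a faster and more robust replacement for the Conda_ package manager).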
 The tutorial assumes that you are using either Linux or MacOS X.
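-Both Snakemake and Mambaforge_ work also under Windows, but the Windows shell is too different to be able to provide generic examples.
+Both Snakemake and Mambaforge_ also work under Windows, but the Windows shell is too different for us to provide generic examples.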
@@ -170,11 +172,13 @@ First, we download some example data on which the workflow shall be executed:
 Next we extract the data. On Linux, run
 
 .. code:: console
+
     $ tar --wildcards -xf snakemake-tutorial-data.tar.gz --strip 1 "*/data" "*/environment.yaml"
 
 On MacOS, run
 
 .. code:: console
+
     $ tar -xf snakemake-tutorial-data.tar.gz --strip 1 "*/data" "*/environment.yaml"
 
 This will create a folder ``data`` and a file ``environment.yaml`` in the working directory.
@@ -194,7 +198,7 @@ The ``environment.yaml`` file that you have obtained with the previous step (Ste
 
     $ mamba env create --name snakemake-tutorial --file environment.yaml
 
-If you don't have the Mamba_ command because you used a different conda distribution than Mambaforge_, you can also first install Mamba_ 
+If you don't have the Mamba_ command because you used a different conda distribution than Mambaforge_, you can also first install Mamba_
 (which is a faster and more robust replacement for Conda_) in your base environment with
 
 .. code:: console