Best practice for cross-workflow specification of files #20

SilasK · 2023-05-30T06:35:22Z

I would like to discuss the best way to specify files so that they can be used across workflows.

Take the example of two workflows:

Workflow 1: reads --> assembly

Workflow 2: assembly + reads --> assembly statistics ...

What is the best way to specify the reads and the assembly so that they can be used by different workflows?
Take into account the following requirements:
Requirement A: The reads might be used in multiple places in Workflow 2.
Requirement B: The reads will probably be used to infer the total number of samples in the target rule.

With sub-workflows, one could simply refer to another workflow's output as otherworkflow(file).

But I think the recommended way now is to use modules and import the rules of Workflows 1 and 2 into a new workflow.
But then I need to know which rules to modify to adapt the file specification, and this would necessarily have to be documented in the README of each workflow.

I don't see how this can be done without heavily modifying many rules of an imported workflow.

Any thoughts?


ning-y · 2023-05-30T20:34:33Z

Here's a first attempt:

Workflow 1's input reads are determined by a YAML configuration file, and the final assembly file is tagged, either in its contents (e.g. header lines) or in its filename, with a hash representing the input reads used to generate it (e.g. a hash of the read files' hashes).
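A minimal sketch of the "hash of read hashes" idea, assuming SHA-256 per read file and a `reads_sha256=` token appended to the FASTA header; the function names and the tag format are hypothetical, not part of Snakemake or either workflow:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 hex digest of one file, read in chunks so large FASTQs fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def reads_fingerprint(read_files):
    """Hash of read hashes: one fingerprint for the whole set of input read files."""
    # Sorting the per-file digests makes the result independent of input order.
    digests = sorted(file_sha256(Path(p)) for p in read_files)
    return hashlib.sha256("\n".join(digests).encode()).hexdigest()


def tag_assembly_header(header, fingerprint):
    """Append the fingerprint to a FASTA header line (hypothetical tag format)."""
    return f"{header} reads_sha256={fingerprint}"
```

Sorting the per-file digests means that listing the same reads in a different order in the config does not change the tag, while any change to a read file's content does.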

Workflow 2 also takes its input reads and input assembly from a YAML configuration file. It checks, either on each run or through a dummy output, that the input assembly's record of which reads were used to generate it matches the set of input reads it was given.
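The check on the Workflow 2 side could look roughly like this. It is a sketch that assumes Workflow 1 wrote a hypothetical `reads_sha256=` token into the first FASTA header and recomputes the same order-independent fingerprint; none of these names come from an existing workflow:

```python
import gzip
import hashlib
from pathlib import Path

# Hypothetical tag that Workflow 1 would write into the first FASTA header line.
TAG = "reads_sha256="


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def reads_fingerprint(read_files):
    """Order-independent 'hash of read hashes' for a set of read files."""
    digests = sorted(file_sha256(Path(p)) for p in read_files)
    return hashlib.sha256("\n".join(digests).encode()).hexdigest()


def recorded_fingerprint(assembly_path):
    """Return the fingerprint stored in the assembly's first header, or None."""
    # Assemblies may be gzipped (e.g. *.fasta.gz), so pick the opener by suffix.
    opener = gzip.open if str(assembly_path).endswith(".gz") else open
    with opener(assembly_path, "rt") as fh:
        header = fh.readline().strip()
    for token in header.split():
        if token.startswith(TAG):
            return token[len(TAG):]
    return None


def check_assembly_matches_reads(assembly_path, read_files):
    """Raise ValueError unless the assembly was tagged with exactly these reads."""
    recorded = recorded_fingerprint(assembly_path)
    if recorded is None:
        raise ValueError(f"{assembly_path}: no '{TAG}' tag in the first header line")
    expected = reads_fingerprint(read_files)
    if recorded != expected:
        raise ValueError(f"{assembly_path}: built from different reads than configured")
```

Calling `check_assembly_matches_reads` at the top of the Snakefile corresponds to the "check on each run" variant; calling it inside a rule that then writes a small marker file corresponds to the "dummy output" variant, so hashing large read files only happens when the inputs change.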

SilasK · 2023-05-31T09:14:43Z

So your idea would be to define the paths to the files in the configuration file.

Something like:

config.yaml

read_file_format: "QC/qc_reads/{sample}_{fraction}.fastq.gz"
assembly_file_format: "Assembly/assemblies/{sample}.fasta.gz"
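With the patterns defined once in the config, every rule can build paths from the same string (Requirement A), and the sample list can be recovered from existing read files (Requirement B). The sketch below imitates that in plain Python, assuming wildcards never span a `/`; the helper names are hypothetical. Inside a Snakefile, Snakemake's own `glob_wildcards` and `expand` cover the same ground.

```python
import re
from pathlib import Path

# Values as they might be loaded from config.yaml (keys from the example above).
config = {
    "read_file_format": "QC/qc_reads/{sample}_{fraction}.fastq.gz",
    "assembly_file_format": "Assembly/assemblies/{sample}.fasta.gz",
}


def pattern_to_regex(pattern):
    """Turn a '{wildcard}' file pattern into a regex with named groups."""
    parts, seen, pos = [], set(), 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        # A repeated wildcard must match the same text again (backreference).
        parts.append(f"(?P={name})" if name in seen else f"(?P<{name}>[^/]+)")
        seen.add(name)
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def infer_wildcards(pattern, paths):
    """Wildcard values for every path that fully matches the pattern."""
    regex = pattern_to_regex(pattern)
    matches = (regex.fullmatch(Path(p).as_posix()) for p in paths)
    return [m.groupdict() for m in matches if m]


def samples_from_reads(pattern, paths):
    """Sorted, unique sample names found among the read files."""
    return sorted({wc["sample"] for wc in infer_wildcards(pattern, paths)})


# Example file listing (in practice this would come from scanning the disk).
read_files = [
    "QC/qc_reads/S1_R1.fastq.gz",
    "QC/qc_reads/S1_R2.fastq.gz",
    "QC/qc_reads/S2_R1.fastq.gz",
    "QC/qc_reads/S2_R2.fastq.gz",
    "QC/qc_reads/notes.txt",  # does not match the pattern, so it is ignored
]
samples = samples_from_reads(config["read_file_format"], read_files)
assemblies = [config["assembly_file_format"].format(sample=s) for s in samples]
```

Because `[^/]+` is greedy, a name such as `S_1_R2` is split at the last underscore (sample `S_1`, fraction `R2`); sample names that themselves end in something that looks like a fraction would still be ambiguous.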

SilasK · 2023-05-31T09:18:10Z

One could also use a TSV sample table whose column headers are specified in the config file.

Ideally, this would use the PEP-based sample configuration described in the Snakemake documentation: https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#configuring-scientific-experiments-via-peps
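A rough sketch of the sample-table variant, using only the standard `csv` module: the config maps roles (sample, reads, assembly) to TSV column names, so each workflow can keep its own column naming. The config key `sample_table_columns` and all column names are hypothetical. With a PEP, Snakemake's `pepfile` directive would provide the table instead (as `pep.sample_table`, per the linked documentation).

```python
import csv
import io

# Hypothetical config entry: which TSV column holds which kind of information.
config = {
    "sample_table_columns": {
        "sample": "sample_name",
        "reads_r1": "Reads_QC_R1",
        "reads_r2": "Reads_QC_R2",
        "assembly": "Assembly",
    }
}


def load_sample_table(handle, columns):
    """Return {sample: {role: path}} using the column names given in the config."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = set(columns.values()) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"sample table lacks columns: {sorted(missing)}")
    table = {}
    for row in reader:
        sample = row[columns["sample"]]
        if sample in table:
            raise ValueError(f"duplicate sample name: {sample}")
        table[sample] = {role: row[col] for role, col in columns.items() if role != "sample"}
    return table


# Example table; in practice this would be open("samples.tsv").
tsv = io.StringIO(
    "sample_name\tReads_QC_R1\tReads_QC_R2\tAssembly\n"
    "S1\tQC/S1_R1.fastq.gz\tQC/S1_R2.fastq.gz\tAssembly/S1.fasta.gz\n"
    "S2\tQC/S2_R1.fastq.gz\tQC/S2_R2.fastq.gz\tAssembly/S2.fasta.gz\n"
)
sample_table = load_sample_table(tsv, config["sample_table_columns"])
```

The number of samples for the target rule is then simply `len(sample_table)`, and rules look up paths by role rather than by hard-coded column names.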
