Umi-pipeline-nf Usage

This document describes the parameter options used by the pipeline.

Running the pipeline
Inputs and outputs
Workflow modifying parameters
- --subsampling
- --call_variants
Read filtering parameters
- --min_read_length
- --min_qscore
Subsampling parameters
- --subsampling_seed
- --subsampling_readnumber
Variant calling parameters
- --variant_caller
Advanced parameters
Additional parameters
Software dependencies
- -profile
- -with-conda
Other command line parameters

Workflow

Running the pipeline

The main command for running the pipeline is as follows:

nextflow run genepi/umi-pipeline-nf [OPTIONS]

Note that the pipeline will create files in your working directory:

work/           # Directory containing the nextflow working files
.nextflow.log   # Log file from Nextflow
.nextflow/      # Nextflow cache and history information

Inputs and Outputs

`--input <ARG>` [REQUIRED]

Specify the path to the directory containing the demultiplexed reads. [default: "null"]

`--output <ARG>` [REQUIRED]

Specify the name of the output directory to save the results. [default: "null"]

`--reference <ARG>` [REQUIRED]

Specify the path to the reference genome in fasta format. [default: "null"]

`--reference_fai <ARG>` [REQUIRED]

Specify the path to the reference index file. [default: "null"]

`--bed <ARG>` [REQUIRED]

Specify the path to the bed file containing the target name, start and end position. [default: "null"]

Workflow Modifying Parameters

`--subsampling`

Specify if the raw reads per barcode should be subsampled. [default: false]

`--call_variants`

Specify if the variants in the final consensus sequences should be called. [default: false] Note: If set to true, the (variant caller)[ must be specified

Read Filtering Parameters

`--min_read_length <ARG>`

Specify the minimal read length of the raw reads. [default: 0]

`--min_qcore <ARG>`

Specify the minimal Q-score of the raw reads. [default: 0]

Subsampling Parameters

`--subsampling_seed <ARG>`

Specify the seed for pseudo-random subsampling. [default: 11]

`--subsampling_readnumber <ARG>`

Specify the number of reads that should be preserved after subsampling. [default: 100,000]

Variant Calling Parameters

`--variant_caller <ARG>` [REQUIRED] [lofreq | freebayes | mutserve]

Specify the variant caller that should be used for variant calling. [default: null]

Advanced Parameters

`--min_reads_per_barcode <ARG>`

Specify the minimal number of raw reads per barcode. [default: 1000]
Note: Barcodes with fewer reads will be filtered before the analysis.

`--umi_errors <ARG>`

Specify the number of deviating positions per read of each UMI cluster. [default: 3]

`--min_reads_per_cluster <ARG>`

Specify the minimal number of reads per cluster. [default: 20] Note: Clusters with fewer reads will be excluded from the analysis.

`--max_reads_per_cluster <ARG>`

Specify the maximal number of reads per cluster. [default: 60] Note: Clusters with more reads will be downsampled
according to the cluster filtering strategy.

`--output_format <ARG>` [fasta | fastq]

Specify the output format until the cluster filtering step. [default: "fastq"]
Note: Only the "fastq" option can be used to filter reads by the quality and requires FASTQ input reads.

`--filter_strategy_clusters <ARG>` [random | quality]

Specify filter strategy to downsample cluster above the maximal reads per cluster parameter. [default: "quality"]
Note: "quality" filter strategy only goes in hand with "fastq" output format.

`--write-reports <ARG>`

Specify if reports of the pipeline should be saved. [default: true]

`--threads <ARG>`

Specify the number of threads that are allocated for the pipeline. [default: all available processors - 1]

`--min_overlap <ARG>` [0-1]

Specify the minimal overlap of the mapped raw reads to the reference sequence. [default: 0.90]

`--include_secondary_reads <ARG>`

Specify if secondary mappings should be included in the analysis. [default: false]

`--balance_strands <ARG>`

Specify if the number of forward and reverse reads per cluster should be equalized. [default: true]

`--medaka_model <ARG>`

Specify the medaka model that is used for cluster polishing. [default: "r1041_e82_400bps_hac_g615"]
Note: The models are specific for Chemistry, basecalling algorithm and sequencing speed.

`--fwd_umi <ARG>`

Specify the pattern of the forward UMI primer. [default: "TTTVVVVTTVVVVTTVVVVTTVVVVTTT"]

`--rev_umi <ARG>`

Specify the pattern of the reverse UMI primer. [default: "AAABBBBAABBBBAABBBBAABBBBAAA"]

`--min_length <ARG>`

Specify the minimal length of the extracted UMI sequences. [default: 40]

`--min_length <ARG>`

Specify the maximal length of the extracted UMI sequences. [default: 60]

`--minimap2_param <ARG>`

Specify the minimap2 parameters. [default: "-ax map-ont -k 13]

Additional Parameters

`--debug`

Specify in order to prevent Nextflow from clearing the work dir cache following a successful pipeline completion. [default: off]

`--version`

When called with nextflow run genepi/umi-pipeline-nf --version this will display the pipeline version and quit.

`--help`

When called with nextflow run genepi/umi-pipeline-nf --help this will display the parameter options and quit.

Software Dependencies

`-profile <ARG>`

Use this parameter to choose a preset configuration profile. Profiles available with the pipeline are:

standard
- The default profile, is used if -profile is not specified.
- Uses sensible resource allocation for, runs using the local executor (native system calls) and expects all software to be installed and available on the $PATH.
- This profile is mainly designed to be used as a starting point for other configurations and is inherited by most of the other profiles below.
conda
- Builds a conda environment from the environment.yml file provided by the pipeline
- Requires conda to be installed on your system.
docker
- Launches a docker image pulled from genepi/umi-pipeline-nf (ADAPT)
- Requires docker to be installed on your system.
singularity
- Launches a singularity image pulled from genepi/umi-pipeline-nf (ADAPT)
- Requires singularity to be installed on your system.
custom
- No configuration at all. Useful if you want to build your own config from scratch and want to avoid loading in the default base config for process resource allocation.

If you wish to provide your own package containers it is possible to do so by setting the standard or custom profile and then providing your custom package with the command line flags below. These are not required with the other profiles.

`-with-conda <ARG>`

Flag to enable conda. You can provide either a pre-built environment or a *.yml file.

Other command line parameters

`-work-dir <ARG>`

Specify the path to a custom work directory for the pipeline to run with (eg. on a scratch directory)

`-params-file <ARG>`

Provide a file with specified parameters to avoid typing them out on the command line. This is useful for carrying out repeated analyses. A template params file assets/params.config has been made available in the pipeline repository.

`-config <ARG>`

Provide a custom config file for adapting the pipeline to run on your own computing infrastructure. A template config file assets/custom.config has been made available in the pipeline repository. This file can be used as a boilerplate for building your own custom config.

`-resume [<ARG>]`

Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where they got previously. Give a specific pipeline name as an argument to resume it, otherwise, Nextflow will resume the most recent. NOTE: This will not work if the specified run finished successfully and the cache was automatically cleared. (see: --debug)

`-name <ARG>`

Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic.

Files

usage.md

Latest commit

History

usage.md

File metadata and controls

Umi-pipeline-nf Usage

Workflow

Running the pipeline

Inputs and Outputs

--input <ARG> [REQUIRED]

--output <ARG> [REQUIRED]

--reference <ARG> [REQUIRED]

--reference_fai <ARG> [REQUIRED]

--bed <ARG> [REQUIRED]

Workflow Modifying Parameters

--subsampling

--call_variants

Read Filtering Parameters

--min_read_length <ARG>

--min_qcore <ARG>

Subsampling Parameters

--subsampling_seed <ARG>

--subsampling_readnumber <ARG>

Variant Calling Parameters

--variant_caller <ARG> [REQUIRED] [lofreq | freebayes | mutserve]

Advanced Parameters

--min_reads_per_barcode <ARG>

--umi_errors <ARG>

--min_reads_per_cluster <ARG>

--max_reads_per_cluster <ARG>

--output_format <ARG> [fasta | fastq]

--filter_strategy_clusters <ARG> [random | quality]

--write-reports <ARG>

--threads <ARG>

--min_overlap <ARG> [0-1]

--include_secondary_reads <ARG>

--balance_strands <ARG>

--medaka_model <ARG>

--fwd_umi <ARG>

--rev_umi <ARG>

--min_length <ARG>

--min_length <ARG>

--minimap2_param <ARG>

Additional Parameters

--debug

--version

--help

Software Dependencies

-profile <ARG>

-with-conda <ARG>

Other command line parameters

-work-dir <ARG>

-params-file <ARG>

-config <ARG>

-resume [<ARG>]

-name <ARG>

`--input <ARG>` [REQUIRED]

`--output <ARG>` [REQUIRED]

`--reference <ARG>` [REQUIRED]

`--reference_fai <ARG>` [REQUIRED]

`--bed <ARG>` [REQUIRED]

`--subsampling`

`--call_variants`

`--min_read_length <ARG>`

`--min_qcore <ARG>`

`--subsampling_seed <ARG>`

`--subsampling_readnumber <ARG>`

`--variant_caller <ARG>` [REQUIRED] [lofreq | freebayes | mutserve]

`--min_reads_per_barcode <ARG>`

`--umi_errors <ARG>`

`--min_reads_per_cluster <ARG>`

`--max_reads_per_cluster <ARG>`

`--output_format <ARG>` [fasta | fastq]

`--filter_strategy_clusters <ARG>` [random | quality]

`--write-reports <ARG>`

`--threads <ARG>`

`--min_overlap <ARG>` [0-1]

`--include_secondary_reads <ARG>`

`--balance_strands <ARG>`

`--medaka_model <ARG>`

`--fwd_umi <ARG>`

`--rev_umi <ARG>`

`--min_length <ARG>`

`--min_length <ARG>`

`--minimap2_param <ARG>`

`--debug`

`--version`

`--help`

`-profile <ARG>`

`-with-conda <ARG>`

`-work-dir <ARG>`

`-params-file <ARG>`

`-config <ARG>`

`-resume [<ARG>]`

`-name <ARG>`