PanVC Sample Workflow

This is a workflow for variant calling using short reads that utilizes PanVC. The workflow uses an index generated from a multiple sequence alignment. The sequences may be e.g. founder sequences generated from the variants of a group of donors. Using the predicted DNA sequences of the donors is also possible.

Requirements

Linux on x86-64 for using pre-built binaries
Python 3.8
Snakemake
Conda

Installing

Snakemake is used to run the workflow. To simplify the installation of the tools that are used as part of the workflow, Conda environments (in workflow/envs) are used. To set up the environment, Snakemake can run Conda automatically.

Installing the required tools with Snakemake

The required tools will be installed automatically when running Snakemake for either indexing or variant calling. However, to e.g. install the tools only once or run multiple instances of Snakemake in parallel, the tools may be installed to e.g. /path/to/conda/environment as follows:

Run snakemake --snakefile Snakefile.install --cores 1 --printshellcmds --use-conda --conda-prefix /path/to/conda/environment conda_environment
Run snakemake --snakefile Snakefile.install --cores 1 --printshellcmds --use-conda --conda-prefix /path/to/conda/environment conda_environment_gatk

Installing on platforms other than Linux on x86-64

Currently, pre-built binaries are only available for Linux on x86-64. To run the workflow on other architectures, tools from the following repositories need to be installed:

Running

Running the workflow involves the following steps:

Generate indexing input for PanVC
Prepare an index
Run the short read alignment and variant calling workflow

Generating indexing input

Indexing input for PanVC consists of multiple alignments of sequences for one or more chromosomes. For each chromosome, the same number of input sequences must be used. Either predicted sequences of a cohort of donors or founder sequences may be used.

PanVC accepts (a subset of) A2M formatted multiple alignments as its inputs. In particular, the first sequence of each A2M input needs to be the reference sequence for the corresponding chromosome. Only uppercase characters and dashes may be used in the sequences.

Suitable multiple alignments of either predicted or founder sequences may be generated from VCF files with vcf2multialign.

Preparing an index

An index may be generated from a set of A2M files as follows. The amount of memory required depends on the length and complexity of the inputs.

Use generate_snakemake_config_for_index.py to generate a configuration file for Snakemake. Here, the maximum read length and maximum edit distance for each read may be specified. Please use python3 generate_snakemake_config_for_index.py --help for listing all available options. For instance, to prepare an index for two chromosomes, chr1 and chr2, with maximum read length of 105 and maximum edit distance of 10, the following command could be used: python3 generate_snakemake_config_for_index.py --input-a2m chr1.a2m chr2.a2m --pgindex-dir index-output --output-config config-index.yaml --max-edit-distance 10 --max-read-length 105 --max-memory-MB 20000. (The script may also be loaded as a module to be used from Python.)
Run Snakemake with the configuration file with e.g. snakemake --configfile config-index.yaml --snakefile Snakefile.index --cores 16 --printshellcmds --use-conda --conda-prefix /path/to/conda/environment --resources mem_mb=20000

Running the short read alignment and variant calling workflow

The generated index may be used to align a set of short reads and call variants as follows. As in the previous step, the amount of memory required depends on the length and complexity of the inputs.

Use generate_snakemake_config_for_call.py to generate a configuration file for Snakemake. Please use python3 generate_snakemake_config_for_call.py --help for listing all available options. For instance, to call variants from a human sample using GRCh38 reference with both GATK and Samtools (for e.g. comparison), the following command could be used: python3 generate_snakemake_config_for_call.py --pgindex-dir index-output --output-config config-call.yaml -r1 reads1.fq.gz -r2 reads2.fq.gz --output-dir call-output --vc-method GATK SAMTOOLS --ploidy 2 --ploidy-file GRCh38 --max-memory-MB 200000.
(Optional) Generate merged reads if the same reads are aligned multiple times. This is done as part of the variant calling process but can also be done beforehand.
1. Add --store-merged-reads-with-originals to the parameters of generate_snakemake_config_for_call.py. This causes the merged reads to be stored in the same directory as the original reads instead of using the output directory.
2. Run Snakemake with the generated configuration and use panvc_generate_reads_all as the target name, e.g. snakemake --snakefile Snakefile.call --configfile config-call.yaml --cores 1 --printshellcmds --use-conda --conda-prefix /path/to/conda/environment -- panvc_generate_reads_all. Using more than one core may speed up the process in case the reads are compressed.
Run Snakemake with the configuration file with e.g. snakemake --snakefile Snakefile.call --configfile config-call.yaml --cores 16 --printshellcmds --use-conda --conda-prefix /path/to/conda/environment --resources mem_mb=200000

Name		Name	Last commit message	Last commit date
Latest commit History 149 Commits
pvc_py_tools		pvc_py_tools
workflow		workflow
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Snakefile.call		Snakefile.call
Snakefile.index		Snakefile.index
Snakefile.install		Snakefile.install
generate_snakemake_config_for_call.py		generate_snakemake_config_for_call.py
generate_snakemake_config_for_index.py		generate_snakemake_config_for_index.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pvc_py_tools

pvc_py_tools

workflow

workflow

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

Snakefile.call

Snakefile.call

Snakefile.index

Snakefile.index

Snakefile.install

Snakefile.install

generate_snakemake_config_for_call.py

generate_snakemake_config_for_call.py

generate_snakemake_config_for_index.py

generate_snakemake_config_for_index.py

Repository files navigation

PanVC Sample Workflow

Requirements

Installing

Installing the required tools with Snakemake

Installing on platforms other than Linux on x86-64

Running

Generating indexing input

Preparing an index

Running the short read alignment and variant calling workflow

About

Releases

Packages

Contributors 2

Languages

License

algbio/panvc-sample-workflow

Folders and files

Latest commit

History

Repository files navigation

PanVC Sample Workflow

Requirements

Installing

Installing the required tools with Snakemake

Installing on platforms other than Linux on x86-64

Running

Generating indexing input

Preparing an index

Running the short read alignment and variant calling workflow

About

Resources

License

Stars

Watchers

Forks

Languages