
GooglingTheCancerGenome/sv-gen


sv-gen


Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. sv-gen is a Snakemake-based workflow that generates artificial short-read alignments from a reference genome, with or without SVs. The workflow is designed to be easy to use and to deploy on any Linux-based machine. In particular, it supports automated software deployment, easy configuration and the addition of new analysis tools, and it scales from a single computer to different HPC clusters with minimal effort.

Dependencies

  • Python 3
  • Conda - package/environment management system
  • Snakemake - workflow management system
  • Xenon CLI - command-line interface to compute and storage resources
  • jq - command-line JSON processor (optional)
  • YAtiML - library for YAML type inference and schema validation
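To see which of these prerequisites are already available (for example after activating the `wf` environment created during installation), a quick check such as the one below can be run. It is a minimal Bash sketch, not part of sv-gen: the function names `check_tool` and `check_deps` are hypothetical, and it assumes the tools are invoked as `python3`, `conda`, `snakemake`, `xenon` and `jq`, and that YAtiML is importable as the Python module `yatiml`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with sv-gen: report missing prerequisites.

# Succeed if command "$1" can be found (on PATH, or as a builtin/function).
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Print one line per missing prerequisite; return 0 only if all required ones exist.
check_deps() {
  local missing=0 tool
  for tool in python3 conda snakemake xenon; do
    if ! check_tool "$tool"; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # YAtiML is a Python library rather than a command, so try importing it.
  if ! python3 -c 'import yatiml' >/dev/null 2>&1; then
    echo "missing: yatiml (Python package)"
    missing=1
  fi
  # jq is optional: it is only used to post-process Xenon's JSON output.
  check_tool jq || echo "optional, not found: jq"
  return "$missing"
}

if check_deps; then
  echo "all required dependencies found"
fi
```

Each missing item is reported on its own line, so the output can be compared directly against the list above.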

The workflow (DAG) includes the following tools:

The software dependencies and versions can be found in the conda environment.yaml files (1, 2).

1. Clone this repo.

git clone https://github.com/GooglingTheCancerGenome/sv-gen.git
cd sv-gen

2. Install dependencies.

# download the Miniconda3 installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (answer 'yes' to the prompts)
bash miniconda.sh
# reload the (bash) shell configuration so that 'conda' is on PATH
source ~/.bashrc
# update Conda
conda update -y conda
# install Mamba
conda install -n base -c conda-forge -y mamba
# create a new environment with the dependencies & activate it
mamba env create -n wf -f environment.yaml
conda activate wf
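Running `mamba env create` a second time typically stops with an error or a prompt because the `wf` environment already exists. Below is a minimal sketch for re-running this step safely. It assumes that `conda env list` prints one environment per line with its name in the first column, and that `mamba env update` is available (as in Mamba 1.x). `env_exists` and `setup_env` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of sv-gen: create the environment on the first
# run and update it from the environment file on later runs.

# Succeed if a Conda environment named "$1" is listed by `conda env list`.
env_exists() {
  # Skip comment lines; the first column of the remaining lines is the env name.
  conda env list | awk '!/^#/ {print $1}' | grep -qx -- "$1"
}

# Usage: setup_env NAME FILE
setup_env() {
  local name="$1" file="$2"
  if env_exists "$name"; then
    mamba env update -n "$name" -f "$file"
  else
    mamba env create -n "$name" -f "$file"
  fi
}
```

With these helpers, `setup_env wf environment.yaml` followed by `conda activate wf` can stand in for the last two commands above.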

3. Configure the workflow.

4. Execute the workflow.

cd workflow
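# dry run (-n): list the jobs that would run without executing them; -p prints their shell commands
snakemake -np

# run the workflow locally on all available CPU cores, using the per-rule Conda environments
snakemake --use-conda --cores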
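The dry run and the real run can be chained so that jobs only start when the dry run succeeds. Below is a minimal sketch that assumes GNU `nproc` supplies the default core count. `run_local` is a hypothetical wrapper, not part of sv-gen.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of sv-gen: dry-run first, then run locally.

# Usage: run_local [CORES]  (CORES defaults to the output of GNU `nproc`)
run_local() {
  local cores="${1:-$(nproc)}"
  # Stop before running anything if the dry run fails (e.g. missing input files).
  if ! snakemake -n >/dev/null; then
    echo "dry run failed; workflow not started" >&2
    return 1
  fi
  snakemake --use-conda --cores "$cores"
}
```

Call it from the `workflow` directory, e.g. `run_local` to use all online CPUs or `run_local 4` to limit the run to four cores.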

Submit jobs to a Slurm- or GridEngine-based cluster

SCH=slurm   # or gridengine
# run Snakemake in the background (all output goes to smk.log); each job is submitted via the Xenon CLI
snakemake --use-conda --latency-wait 30 --jobs \
  --cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --max-run-time 5 --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" \
  &> smk.log &
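The long `--cluster` string can also be assembled by a small function that rejects unsupported scheduler names. Snakemake fills in `{rule}` with the rule name and appends the path of the job script it generated, which `xenon ... submit` then runs on the chosen scheduler. This is only a sketch: `xenon_cluster_cmd` is a hypothetical name, and the Xenon options are copied unchanged from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of sv-gen: build the --cluster command for Snakemake.

# Print the Xenon submit command for scheduler "$1" (slurm or gridengine).
xenon_cluster_cmd() {
  local sch="$1"
  case "$sch" in
    slurm|gridengine) ;;
    *) echo "unsupported scheduler: $sch" >&2; return 1 ;;
  esac
  # {rule} is left for Snakemake to fill in; '%s' keeps %j from being interpreted.
  printf '%s' "xenon scheduler $sch --location local:// submit --name smk.{rule}" \
    " --inherit-env --max-run-time 5 --working-directory ." \
    " --stderr stderr-%j.log --stdout stdout-%j.log"
}
```

Usage: `snakemake --use-conda --latency-wait 30 --jobs --cluster "$(xenon_cluster_cmd slurm)" &> smk.log &`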

Query job accounting information

SCH=slurm   # or gridengine
xenon --json scheduler $SCH --location local:// list --identifier [jobID] | jq ...
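The log-file pattern used above (`--stdout stdout-%j.log`) puts each job's scheduler ID into the name of its standard output file. This assumes the scheduler expands `%j`, as Slurm does for its own output-file patterns. Other schedulers may not. The hypothetical helper `job_ids_from_logs` below collects those IDs so they can be passed to the query above one at a time. It is a sketch, not part of sv-gen.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of sv-gen: collect job IDs from log file names
# such as stdout-12345.log (assumes %j was expanded to a numeric job ID).

# Print one job ID per line, sorted numerically, for the logs in directory $1 (default: .).
job_ids_from_logs() {
  local dir="${1:-.}" f id
  for f in "$dir"/stdout-*.log; do
    [ -e "$f" ] || continue          # no match: the glob pattern stays literal
    id="${f##*/stdout-}"             # strip the directory and the 'stdout-' prefix
    id="${id%.log}"                  # strip the '.log' suffix
    case "$id" in
      ''|*[!0-9]*) continue ;;       # skip names whose ID part is not numeric
    esac
    echo "$id"
  done | sort -n
}
```

For example, `for id in $(job_ids_from_logs); do xenon --json scheduler "$SCH" --location local:// list --identifier "$id"; done` queries each job in turn.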