polyploid

Benchmarking variant calling in polyploids

Running experiments

All experiments are reproducible using a Snakemake workflow. First clone the repository:

$ git clone https://github.com/luntergroup/polyploid && cd polyploid

Install Conda if you have not already done so:

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh # follow instructions, answer 'yes' where asked
$ source ~/.bashrc # assuming you installed conda into your home directory
$ conda update conda

Install Snakemake and general dependencies with conda:

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda create --name polyploid snakemake pysam python-wget openpyxl mamba
$ conda activate polyploid

Each set of experiments is specified in a YAML config file in the config directory. The config files are self-contained, except that download links for the PrecisionFDA Truth v2 raw read data must be supplied, as these are only accessible after authorisation. You can either download the data manually and rename the files appropriately, or provide the links in a config file, e.g.:

$ printf '%s\n' 'links:' '  HG002:' '    - <link_here>' '    - <link_here>' >> config/tetraploid_novaseq.yaml
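When there are several links, or several config files to update, the links block can be generated by a small shell function. The following is a minimal sketch, not part of the repository: the name `append_links` is hypothetical, the `links` / sample / list layout is taken from the example above, and it assumes the config file does not already contain a `links` key. It indents with spaces, since YAML does not allow tabs for indentation.

```shell
# Hypothetical helper (not part of the workflow): append a links block
# for one sample to a config file.
# Usage: append_links <config_file> <sample> <link> [<link> ...]
append_links() {
    config_file="$1"
    sample="$2"
    shift 2
    {
        printf 'links:\n'
        printf '  %s:\n' "$sample"
        # One YAML list item per link; %s keeps any '%' in the URL literal.
        for link in "$@"; do
            printf '    - %s\n' "$link"
        done
    } >> "$config_file"
}
```

For example, `append_links config/tetraploid_novaseq.yaml HG002 '<link_here>' '<link_here>'` appends both links under `HG002`. Quote each link in single quotes so that characters such as `&` and `?` are not interpreted by the shell.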

Then run each experiment as required, e.g.:

$ snakemake --configfile config/tetraploid_novaseq.yaml --use-conda --use-singularity -j 100 --cluster "qsub -cwd -V -j y -P mygroup.prj -q long.qf -pe shmem {threads}"
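The command above relies on several external tools: `snakemake` itself, `conda` and `singularity` for the per-rule environments, and `qsub` for cluster submission. Before launching a long run it may help to confirm they are all on your `PATH`. The following is a minimal sketch; the function name `check_tools` is hypothetical and not part of the repository.

```shell
# Hypothetical pre-flight helper: report any required command that is
# missing from PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_tools conda snakemake singularity qsub
```

Once the tools are in place, adding Snakemake's `-n` (`--dry-run`) flag to the command above lists the jobs that would be run without executing them, which is a cheap way to check the config file first.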