lcWGS pipes

This repo contains Snakemake workflows for performing population genomic analyses on low-coverage WGS data. The pipelines assume that you have already processed your raw sequencing reads into sorted .bam files, one for each individual sample in your analysis.

Software installation

These pipelines require installing the following software:

Snakemake, for all pipelines.
angsd, for all pipelines.
PCAngsd, for the genomic PCA pipeline.

See each package's website for installation instructions. Depending on your preference, you may wish to install some or all of these packages via conda.

Snakemake pipelines can be run locally (on laptops or desktops), but these pipelines will perform better on a high-performance computing cluster. Snakemake makes deploying pipelines on a cluster relatively simple. The SLURM_profile folder contains an example profile for running these pipelines on a SLURM cluster, inspired by [this profile](https://github.com/jdblischak/smk-simple-slurm). You'll need to modify it to work with your cluster.

Current pipelines

This repo provides the following pipelines:

angsd_GL_genome_wide.smk: calculates genotype likelihoods (GLs) across the genome. This pipeline parallelizes across the scaffolds/contigs in the genome to speed up computation and reduce peak memory usage. It is a prerequisite for the other two pipelines.
angsd_window_fst.smk: calculates Fst in sliding windows across the genome. Also parallelizes across scaffolds/contigs.
pcangsd_thin_and_PCA.smk: thins markers by position and then performs a genomic PCA on the thinned markers using PCAngsd.
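Because the GL pipeline is a prerequisite for the other two, the run order matters. The sketch below shows one way to enforce that order in a shell session. It is illustrative only: the function name `run_in_order` is hypothetical, the snakefiles are assumed to live in the repo's snakefiles folder, and the three config file paths are placeholders that you supply.

```shell
# Hypothetical helper: run the GL pipeline first, then the two pipelines
# that depend on it. Stops at the first pipeline that fails.
# Assumes it is called from the repo root, with snakefiles in snakefiles/.
run_in_order() {
    local gl_config="$1" fst_config="$2" pca_config="$3"

    # Genotype likelihoods come first: the other pipelines need them.
    snakemake -s snakefiles/angsd_GL_genome_wide.smk \
              --configfile "$gl_config" || return 1

    # Downstream pipelines, run only if the GL step succeeded.
    snakemake -s snakefiles/angsd_window_fst.smk \
              --configfile "$fst_config" || return 1
    snakemake -s snakefiles/pcangsd_thin_and_PCA.smk \
              --configfile "$pca_config"
}
```

Call it with your three edited config files, in the order GL, Fst, PCA. The Fst and PCA pipelines are not described as depending on each other, so the order of the last two steps is an arbitrary choice here.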

Running pipelines

The Snakemake .smk files describe the pipelines, and each must be paired with a configuration file that specifies various options for running the pipeline (input data, filtering options, etc.). The sm_config folder contains a template configuration file for each pipeline, which explains the parameters that need to be specified.

After you edit the configuration files for a given pipeline, run the pipeline by invoking snakemake:

snakemake -s path/to/snakefile.smk \
          --configfile path/to/configfile.yml

When running on a SLURM cluster, use the --profile flag to tell snakemake the folder in which your (edited) config.yaml profile is saved:

snakemake -s path/to/snakefile.smk \
          --configfile path/to/configfile.yml \
          --profile SLURM_profile_folder/

In general, I recommend first checking that the pipeline will do what you expect by adding the --dryrun flag to your call to snakemake. Then run the pipeline. You may also wish to run the pipeline with the -k option, which allows independent jobs to continue running if a single job fails.
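As a rough sketch of that check-then-run habit, the two steps can be wrapped in small shell functions. The function names `preview_pipeline` and `run_pipeline` are hypothetical and not part of this repo. Any extra arguments are passed straight through to snakemake, so you can append flags such as --profile. Some Snakemake versions also require --cores for local runs.

```shell
# Hypothetical helpers; both take a snakefile, a config file, and then
# any extra snakemake arguments, which are passed through unchanged.

preview_pipeline() {
    local snakefile="$1" configfile="$2"
    shift 2
    # --dryrun lists the jobs snakemake would run without executing them.
    snakemake -s "$snakefile" --configfile "$configfile" --dryrun "$@"
}

run_pipeline() {
    local snakefile="$1" configfile="$2"
    shift 2
    # -k lets independent jobs keep running if a single job fails.
    snakemake -s "$snakefile" --configfile "$configfile" -k "$@"
}
```

For example, run `preview_pipeline path/to/snakefile.smk path/to/configfile.yml`, inspect the job list, and then run `run_pipeline path/to/snakefile.smk path/to/configfile.yml --profile SLURM_profile_folder/`.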

tjthurman/lcWGS_pipes