MiraldiLab/snakeATAC

snakeATAC

Yet another Snakemake workflow for ATAC-seq data processing. This pipeline was created from code developed by:

For SLURM setup we reference:

Workflow Overview

Snakemake pipelines promote experimental reproducibility. For this project, you should have the following inputs customized for your analysis:

  1. A config.yaml that describes the run parameters and location of reference data.
  2. A tab-delimited sample meta file that describes the experiments to download from SRA and how to group them.
  3. A unique output directory.
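As a quick sanity check before launching, the three inputs listed above can be verified from the shell. This is a minimal sketch and not part of snakeATAC: the function name check_inputs is hypothetical, and the "non-empty directory" test is only one assumed interpretation of a "unique" output directory.

```shell
# Hypothetical helper (not part of snakeATAC): verify the three run inputs.
check_inputs() {
    local config="$1" meta="$2" outdir="$3"

    # The config and the sample meta file must be existing, non-empty files.
    [ -s "$config" ] || { echo "missing or empty config: $config" >&2; return 1; }
    [ -s "$meta" ]   || { echo "missing or empty sample meta file: $meta" >&2; return 1; }

    # The meta file should be tab-delimited: require a tab in its first line.
    head -n 1 "$meta" | grep -q "$(printf '\t')" \
        || { echo "no tab found in the first line of $meta" >&2; return 1; }

    # Assumption: "unique" means the output directory is new or still empty.
    if [ -d "$outdir" ] && [ -n "$(ls -A "$outdir")" ]; then
        echo "output directory is not empty: $outdir" >&2
        return 1
    fi
    return 0
}
```

For example, `check_inputs ./inputs/config.yaml ./inputs/GM12878_sample.tsv ./outputs/run1` returns 0 when all three checks pass; the output path here is only an illustration.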

A detailed overview of the steps in the ATAC-seq data processing is available on the maxATAC wiki site.

This version of snakeATAC is geared towards use with maxATAC and TOBIAS for making TF binding predictions.

Installation

This pipeline uses Anaconda and Snakemake. Follow the Snakemake install instructions for the best experience. Below is a brief overview of how to install Snakemake.

Create environment

Create a conda environment that includes mamba and Snakemake:

conda create -n snakeatac -c conda-forge -c bioconda mamba snakemake

Activate the snakeatac environment:

conda activate snakeatac

Clone the snakeATAC repository

In your favorite directory clone the snakeATAC repo:

git clone https://github.com/tacazares/snakeATAC.git

Set up run-specific parameters

If you are running this pipeline for the first time, you will need to install all of the conda environments it uses and perform a dry run to make sure that everything was installed correctly.

  1. Adjust the config.yaml and the tab-delimited sample meta file for your specific experiment.

  2. Change to the working directory for snakeATAC. By default, Snakemake looks for a file called Snakefile that contains the rules and run information. You can use a custom Snakefile by passing the -s flag followed by the path to the file.

    cd ./snakeATAC/

  3. Next, use the --conda-create-envs-only flag to create the environments.

    snakemake --cores 14 --use-conda --conda-frontend mamba --conda-create-envs-only --configfile ./inputs/config.yaml

  4. Test that the workflow and scripts are set up correctly by performing a dry run with the --dry-run flag.

    snakemake --cores 14 --use-conda --conda-frontend mamba --configfile ./inputs/config.yaml --dry-run
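The environment-creation, dry-run, and full-run commands in this README differ by a single flag. One way to keep them in sync is a small wrapper, sketched below. The function name snakeatac_cmd and the mode names envs, dry, and run are hypothetical; only flags that already appear in this README are used. The function prints the command instead of running it, so it can be reviewed first. It assumes the config path contains no spaces.

```shell
# Hypothetical helper: print the snakemake command for a given mode.
# Mode names are illustrative: envs | dry | run
snakeatac_cmd() {
    local mode="$1" cores="${2:-14}" config="${3:-./inputs/config.yaml}"

    # Flags shared by all three invocations shown in this README.
    local cmd="snakemake --cores $cores --use-conda --conda-frontend mamba --configfile $config"

    case "$mode" in
        envs) cmd="$cmd --conda-create-envs-only" ;;  # only build the conda envs
        dry)  cmd="$cmd --dry-run" ;;                 # show the plan, run nothing
        run)  ;;                                      # full run, no extra flag
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "$cmd"
}
```

Calling `snakeatac_cmd dry` prints the dry-run command with the default 14 cores; a second argument overrides the core count and a third overrides the config path.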

Test snakeATAC

The file ./snakeATAC/inputs/GM12878_sample.tsv contains the sample information for a test run that processes GM12878 OMNI ATAC-seq data.

After installation, you can launch the full run on your HPC system of choice:

snakemake --cores 14 --use-conda --conda-frontend mamba  --configfile ./inputs/config.yaml

Use Snakemake to submit jobs through SLURM

If you want to use Snakemake to submit jobs to SLURM, follow the instructions in the jdblischak/smk-simple-slurm repo. The directory and scripts are included in this repository, but you will need to adjust the account information. You can also adjust any defaults that you wish to use with your job submissions. NOTE: You will need to run chmod +x status-sacct.sh to make the script executable.
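The chmod step is easy to forget, so it can help to confirm that it worked. The sketch below is one way to do that. The function name make_executable is hypothetical, and the path to status-sacct.sh depends on where the profile directory sits in your checkout.

```shell
# Hypothetical helper: mark a script executable and confirm the change.
make_executable() {
    local script="$1"

    # Fail early with a clear message if the path is wrong.
    [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }

    chmod +x "$script"

    # The return status reports whether the file is now executable.
    [ -x "$script" ]
}
```

For example, `make_executable simple/status-sacct.sh` assumes the script lives in the simple/ profile directory passed to --profile; adjust the path if yours differs.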

Example .bat file to drive the snakeATAC workflow

#!/bin/bash
#SBATCH -D ./outputs
#SBATCH -J dmnd_snake 
#SBATCH -t 96:00:00
#SBATCH --ntasks=8
#SBATCH --mem=16gb
#SBATCH --account={YOUR_ACCOUNT}
#SBATCH --output ./outputs/snakeatac-%j.out
#SBATCH --error ./outputs/snakeatac-%j.err

# Load modules
module load python/3.7-2019.10

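# Activate the conda environment that provides snakemake and mamba.
# Replace "mamba" with your environment name (e.g. snakeatac if you followed the install steps above).
source activate mamba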

# go to a particular directory
cd ./snakeATAC

# make things fail on errors
set -o nounset
set -o errexit
set -x

### run your commands here!
# Developed from the links below
# https://bluegenes.github.io/snakemake-via-slurm/
# https://github.com/jdblischak/smk-simple-slurm

snakemake -s /snakeATAC/Snakefile \
--use-conda \
--conda-frontend mamba \
--configfile /snakeATAC/inputs/config.yaml \
--profile simple/
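The {YOUR_ACCOUNT} placeholder in the script above must be replaced before submission. A minimal sketch of doing that with sed follows. The function name fill_account and the file names in the usage note are hypothetical. The account name is restricted to letters, digits, underscore, dot, and hyphen so that it is safe to use inside the sed expression.

```shell
# Hypothetical helper: write a copy of the batch script with the SLURM account filled in.
fill_account() {
    local template="$1" account="$2" out="$3"

    # Reject empty names and characters that could break the sed expression.
    case "$account" in
        ''|*[!A-Za-z0-9_.-]*)
            echo "invalid account name: $account" >&2
            return 1
            ;;
    esac

    # Replace the literal placeholder; the template itself is left unchanged.
    sed "s/{YOUR_ACCOUNT}/$account/" "$template" > "$out"
}
```

For example, `fill_account run_snakeatac.bat my_account run_snakeatac.filled.bat` writes a filled-in copy that can then be submitted with `sbatch run_snakeatac.filled.bat`.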
