Snakemake workflow: quaich_aging

Quaich_aging is an extension of Quaich specialized for the analysis of aging-related data. It adds modules for pairwise comparison of features obtained from Hi-C maps, as well as modules for plotting the results in a form convenient for further analysis.

Quaich is a Snakemake-based workflow for reproducible and flexible analysis of Hi-C data. Quaich uses multi-resolution cooler (.mcool) files as its input. These files can be generated efficiently by the distiller data processing pipeline. Quaich takes advantage of the open2c ecosystem for analysis of chromosome conformation capture (C) data, primarily making use of command line tools from cooltools. Quaich also uses chromosight and mustache to call Hi-C peaks (dots), and coolpuppy to generate pileups.

Snakemake is a workflow manager for reproducible and scalable data analyses, based around the concept of rules. The rules used in Quaich are defined in the Snakefile. Quaich then uses a YAML config file to specify which rules to run and which parameters to use for those rules.
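As a rough illustration of this config-driven design, the sketch below shows how a parsed config could determine which analyses run. It is not Quaich's actual config schema: the keys `do` and `params` and the helper `enabled_analyses` are hypothetical, and a plain dict stands in for the parsed YAML file.

```python
# Hypothetical sketch: a plain dict stands in for the parsed YAML config.
# The keys "do" and "params" are illustrative, not Quaich's real schema.
config = {
    "eigenvector": {"do": True, "params": {"resolution_limits": [50000, 1000000]}},
    "saddle": {"do": False, "params": {}},
    "insulation": {"do": True, "params": {"resolutions": [10000]}},
}


def enabled_analyses(cfg):
    """Return the names of analyses whose 'do' flag is set, in config order."""
    return [name for name, section in cfg.items() if section.get("do", False)]


print(enabled_analyses(config))
```

In the real workflow, Snakemake reads the file passed with --configfile and the Snakefile decides which rule outputs to request; consult the shipped config file for the actual keys.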

Usage

Step 1: Obtain a copy of this workflow

Clone the repository to your local system, into the directory where you want to perform the data analysis:

git clone git@github.com:ComputationalAgingLab/quaich_aging.git

Move into the cloned repository, which serves as your working directory:

cd quaich_aging

Step 2: Install Snakemake and other requirements

Configure conda channel priority:

conda config --set channel_priority flexible

Install the requirements using conda (this may take some time):

conda env create -f workflow/envs/environment.yml

This creates a conda environment named quaich_aging, from which you can launch the pipeline.

For Snakemake installation details, see the instructions in the Snakemake documentation.

Step 3 (optional): Execute test workflow

Activate the conda environment:

conda activate quaich_aging

Set the conda channel priority to strict (a small but critical step):

conda config --set channel_priority strict

Download the genome FASTA file required for the test (if needed, make the script executable first with chmod +x prepare_test.sh):

bash prepare_test.sh

Execute the test workflow locally via

snakemake --use-conda --configfile config/config.yml --cores 10

Step 4: Configure your own workflow

Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yml to configure the workflow execution, and samples.tsv to specify your sample setup. If you want to use external bed or bedpe files for pileups, describe them in annotations.tsv, and list the pairings of samples with annotations in samples_annotations.tsv.
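Before launching a run, it can help to confirm that the expected files are in place. The following is a minimal sketch, not part of the workflow: the helper `missing_config_files` is hypothetical, the config file name follows the `config/config.yml` path used in the commands on this page, and treating the two annotation tables as optional is an assumption based on their description above.

```python
from pathlib import Path

# File names come from the text above; the required/optional split is an assumption.
REQUIRED = ["config.yml", "samples.tsv"]
OPTIONAL = ["annotations.tsv", "samples_annotations.tsv"]


def missing_config_files(config_dir="config"):
    """Return (missing_required, missing_optional) for the given config folder."""
    folder = Path(config_dir)
    missing_required = [name for name in REQUIRED if not (folder / name).is_file()]
    missing_optional = [name for name in OPTIONAL if not (folder / name).is_file()]
    return missing_required, missing_optional


required, optional = missing_config_files("config")
if required:
    print("Missing required files:", ", ".join(required))
if optional:
    print("Missing optional annotation tables:", ", ".join(optional))
```

This only checks that the files exist; it does not validate their contents.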

Step 5: Execute your own workflow

Test your configuration by performing a dry-run via

snakemake --use-conda --configfile config/config.yml -n

As before, execute the workflow locally via

snakemake --use-conda --configfile config/config.yml --cores $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --configfile config/config.yml --cluster qsub --jobs 100
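If you launch runs from a script, the three invocations above differ only in their trailing flags. The sketch below assembles them from one helper; the function `snakemake_command` and its mode names are hypothetical, and it uses only the flags that appear in the commands above.

```python
import shlex


def snakemake_command(mode, cores=1, jobs=100, configfile="config/config.yml"):
    """Build the Snakemake argument list for a dry run, a local run, or a cluster run."""
    cmd = ["snakemake", "--use-conda", "--configfile", configfile]
    if mode == "dry-run":
        cmd.append("-n")
    elif mode == "local":
        cmd += ["--cores", str(cores)]
    elif mode == "cluster":
        cmd += ["--cluster", "qsub", "--jobs", str(jobs)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cmd


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(snakemake_command("local", cores=4)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the activated quaich_aging environment.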

If the installation takes too long

Try mamba, a faster drop-in replacement for conda that provides the same functionality:

conda install -n base -c conda-forge mamba

Re-activate the base environment:

conda activate base

Then create the environment using mamba:

mamba env create -f workflow/envs/environment.yml

Available features

The following analyses can be configured in the original pipeline:

  • eigenvector: calculates cis eigenvectors using cooltools for all resolutions within specified resolution_limits.
  • saddle: calculates saddles, reflecting average interaction preferences, from cis eigenvectors for each sample using cooltools.
  • pileups: extracts regions of interest (e.g. according to some bed file) from Hi-C maps and builds aggregated data frames containing averages of these regions.
  • insulation: calculates diamond insulation score for specified resolutions and window sizes, using cooltools. Currently runs separately for different window sizes.
  • call_dots: calls dots at specified resolutions using three methods and post-processes the output to bedpe. Implemented callers are cooltools, mustache and chromosight. Only runs on specified samples.
  • compare_boundaries: generates differential boundaries between specified samples, used as input for pileups.
  • call_TADs: combines lists of strong boundaries for specified samples into a list across window sizes for each resolution, filtered by length, used as input for pileups.
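To make the idea behind the insulation analysis concrete, here is a simplified pure-Python sketch of a diamond insulation score. It is not cooltools' implementation: it skips balancing, bad-bin handling and log-normalization, and the function `diamond_insulation`, its window definition and the toy matrix are assumptions made for illustration. For each bin, it averages the contacts between the `window` bins upstream and the `window` bins starting at that bin; a low score marks a candidate boundary.

```python
def diamond_insulation(matrix, window):
    """Mean contact signal in a window x window diamond at the left edge of each bin.

    Bins whose diamond would extend past the matrix edge get None.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        if i - window < 0 or i + window > n:
            scores.append(None)
            continue
        # Contacts between upstream bins [i - window, i) and bins [i, i + window).
        values = [
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        ]
        scores.append(sum(values) / len(values))
    return scores


# Toy 8x8 map with two domains (bins 0-3 and 4-7): strong signal inside a domain,
# weak signal between domains.
n = 8
toy = [[10.0 if (a < 4) == (b < 4) else 1.0 for b in range(n)] for a in range(n)]

scores = diamond_insulation(toy, window=2)
valid_bins = [i for i, s in enumerate(scores) if s is not None]
boundary = min(valid_bins, key=lambda i: scores[i])
print(scores)
print("lowest insulation score at bin", boundary)
```

On this toy matrix the minimum falls at bin 4, the first bin of the second domain, which is where the two blocks meet.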

The following analyses are added in quaich_aging:

  • interchroms: computes a matrix of contact sums for all possible pairs of chromosomes.
  • compare_interchroms: plots the normalized ratio of selected pairs of contact sum matrices as a heatmap.
  • scaling_ratio: plots the ratio of selected pairs of scaling profiles.
  • eigenvectors_correlation: plots a clustermap of eigenvector correlations for each chromosome and for the whole genome.
  • tad_ratio: plots the ratio of selected pairs of averaged and normalized TADs.
  • loop_ratio: plots the ratio of selected pairs of averaged and normalized loops.
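The interchroms and compare_interchroms steps can be pictured with a small pure-Python sketch. It does not reproduce the pipeline's code, which works on .mcool files: the functions `interchrom_sums` and `normalized_ratio`, the toy contact lists and the sample names are hypothetical, and scaling each matrix by its total before taking the ratio is an assumed normalization.

```python
def interchrom_sums(contacts, chroms):
    """Sum contact counts for every pair of chromosomes into a symmetric matrix."""
    index = {chrom: k for k, chrom in enumerate(chroms)}
    sums = [[0.0] * len(chroms) for _ in chroms]
    for chrom1, chrom2, count in contacts:
        i, j = index[chrom1], index[chrom2]
        sums[i][j] += count
        if i != j:
            sums[j][i] += count  # mirror so the matrix stays symmetric
    return sums


def normalized_ratio(sums_a, sums_b):
    """Element-wise ratio a/b after scaling each matrix to a total of 1.

    Equivalent to (a / total_a) / (b / total_b); cells where b is 0 become None.
    """
    total_a = sum(map(sum, sums_a))
    total_b = sum(map(sum, sums_b))
    return [
        [(a * total_b) / (b * total_a) if b else None for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(sums_a, sums_b)
    ]


chroms = ["chr1", "chr2"]
# Toy (chrom1, chrom2, count) records for two hypothetical samples.
young = [("chr1", "chr1", 80), ("chr1", "chr2", 10), ("chr2", "chr2", 100)]
old = [("chr1", "chr1", 40), ("chr1", "chr2", 10), ("chr2", "chr2", 40)]

ratio = normalized_ratio(interchrom_sums(old, chroms), interchrom_sums(young, chroms))
for name, row in zip(chroms, ratio):
    print(name, row)
```

In this toy example the off-diagonal ratio is above 1, meaning the chr1-chr2 share of all contacts is larger in the second sample than in the first. The pipeline plots such a matrix as a heatmap.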

Authors

  • This is a fork of Ilya Flyamer's (@phlya) original project, modified by Dmitrii Kriukov (@shappiron) for aging-related data analysis.
