SCARlink

Preprint available on bioRxiv.

Documentation coming soon!

Single-cell ATAC+RNA linking (SCARlink) uses multiomic single-cell ATAC and RNA to predict gene expression from chromatin accessibility and predict regulatory regions. It uses tile-based chromatin accessibity data from a genomic locus (+/-250kb flanking gene body) and applies regularized Poisson regression to predict gene expression. The learned regression coefficients of the tiles are informative of regulatory regions in a gene-specific manner. The regression model by itself is cell-type-agnostic. Shapley values computed by grouping cells provide insight into the regulatory regions in a cell-type specific manner. Therefore, the trained regression model can be used multiple times to identify important tiles for different groupings without retraining.

Installation

To install SCARlink first create a conda environment:

conda create -n scarlink-env python=3.8
conda activate scarlink-env

Install essential R packages

conda install -c conda-forge r-seurat r-devtools r-biocmanager
conda install -c bioconda bioconductor-rhdf5 \
                     bioconductor-chromvar \
                     bioconductor-motifmatchr \
                     bioconductor-oncomix \
                     bioconductor-complexheatmap

Install ArchR in the conda environment inside R

devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())

Download SCARlink from GitHub and install

git clone https://github.com/snehamitra/SCARlink.git
pip install -e SCARlink

Usage

SCARlink requires preprocessed scRNA-seq in a Seurat object and scATAC-seq in ArchR object. The cell names in both the objects need to be identical. Please refer to the example notebook for generating Seurat and ArchR objects.

1. Preprocessing

Run scarlink_preprocessing to generate coasssay_matrix.h5 to use as input to SCARlink.

scarlink_processing --scrna scrna_seurat.rds --scatac scatac_archr -o multiome_out

2. Running locally

For small data sets with few genes, run SCARlink sequentially on the gene set in the same output directory multiome_out, as before, for all the remaining steps. Note that celltype needs to be present in either scrna_seurat.rds or scatac_archr.

scarlink -o multiome_out -g hg38 -c celltype

2a. Running locally without cell type information during training and computing cell type scores afterwards

SCARlink can also be run without providing cell type information using -c. In that case SCARlink only computes the gene-level regression model and does not estimate cell-type-specific Shapley values.

scarlink -o multiome_out -g hg38

The cell-type-specific Shapley scores can be computed later on by running scarlink with the -c parameter. In that case the previously estimated gene-regression models are used.

scarlink -o multiome_out -g hg38 -c celltype

2b. Running locally with different cell type grouping

SCARlink can be run again with a different cell type annotation. For example, more granular annotations. Both celltype and celltype_granular scores would be retained.

scarlink -o multiome_out -g hg38 -c celltype_granular

2c. Running on cluster by parallelizing over gene set

To speed up computation, SCARlink can be run on a cluster in parallel. By default it parallelizes over 100 cores, -np 100 when -np is not provided.

scarlink -o multiome_out -g hg38 -c celltype -p $LSB_JOBINDEX

3. Get FDR corrected gene-linked tiles

Get table with tile-level siginificance for each gene and celltype.

scarlink_tiles -o multiome_out -c celltype

4. Chromatin potential

Chromatin potential analysis can be performed after running steps 1 and 2. Please refer to example notebook.

5. Plot SCARlink output

SCARlink can create visualizations for the chromatin accessibility around a gene (250kb upstream and downstream of the gene body) and the gene expression for a given gene, grouped by user-provided celltype. The visualizations can help understand the connection between the accessibility at tile-level to gene expression. Note that SCARlink can generate visualizations for different cell type groupings on the fly. Example visulizations are provided in a notebook. The visualizations can also be generated at the command line.

scarlink_tiles -o multiome_out -c celltype --genes GENE1,GENE2

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
docs		docs
notebooks		notebooks
scarlink		scarlink
.gitignore		.gitignore
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs

docs

notebooks

notebooks

scarlink

scarlink

.gitignore

.gitignore

README.md

README.md

setup.py

setup.py

Repository files navigation

SCARlink

Installation

Usage

1. Preprocessing

2. Running locally

2a. Running locally without cell type information during training and computing cell type scores afterwards

2b. Running locally with different cell type grouping

2c. Running on cluster by parallelizing over gene set

3. Get FDR corrected gene-linked tiles

4. Chromatin potential

5. Plot SCARlink output

About

Releases

Packages

Languages

akanksha-sachan/SCARlink

Folders and files

Latest commit

History

Repository files navigation

SCARlink

Installation

Usage

1. Preprocessing

2. Running locally

2a. Running locally without cell type information during training and computing cell type scores afterwards

2b. Running locally with different cell type grouping

2c. Running on cluster by parallelizing over gene set

3. Get FDR corrected gene-linked tiles

4. Chromatin potential

5. Plot SCARlink output

About

Resources

Stars

Watchers

Forks

Languages