DELAY: DEpicting LAgged causalitY across single-cell trajectories for accurate gene-regulatory inference

Quick Setup

Follow these instructions to install the latest version of PyTorch with CUDA support: https://pytorch.org
- Please note: DELAY currently requires CUDA-capable GPUs for training and prediction
Confirm that two additional dependencies are installed: pytorch-lightning and pandas
Navigate to the location where you want to clone the repository and run:

git clone https://github.com/calebclayreagor/DELAY.git
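Before running DELAY, a short check script can confirm that the prerequisites above are in place. This is a hypothetical helper, not part of the repository. It assumes pytorch-lightning is importable as `pytorch_lightning`, and it uses `torch.cuda.is_available()` to detect a usable GPU.

```python
import importlib
import importlib.util


def check_environment(packages=("torch", "pytorch_lightning", "pandas")):
    """Report which packages are importable and whether a CUDA GPU is visible.

    Hypothetical helper (not part of DELAY). Returns a dict mapping each
    package name to True/False, plus a "cuda" entry that is True only if
    torch is installed and reports an available GPU.
    """
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    cuda = False
    if status.get("torch"):
        # Import torch only when it is actually installed.
        torch = importlib.import_module("torch")
        cuda = torch.cuda.is_available()
    status["cuda"] = cuda
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:>18}: {'OK' if ok else 'MISSING'}")
```

If `cuda` reports MISSING, training and prediction will not run, given the GPU requirement noted above.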

Two Steps to Infer Gene-Regulatory Networks

1. Fine-tune DELAY on datasets with partially-known ground-truth interactions, e.g. from ChIP-seq experiments:

python RunDELAY.py [datadir] [outdir] -k [val_fold] [--atac] -p -ft

-k sets the validation fold, and the optional --atac flag specifies scATAC-seq input data (the default is scRNA-seq)
Use TensorBoard to monitor training by running tensorboard --logdir RESULTS from the main directory
By default, DELAY will save the best model weights to a checkpoint file in RESULTS/outdir
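Between the two steps, you need the checkpoint path for the -m argument. The sketch below is a hypothetical helper, not part of DELAY. It assumes the checkpoint has a .ckpt extension (as in the step-2 example) somewhere below RESULTS/outdir, and it picks the most recently modified one. If the directory holds checkpoints from several runs, confirm which file you want instead of relying on modification time.

```python
from pathlib import Path


def latest_checkpoint(results_dir):
    """Return the most recently modified .ckpt file under results_dir.

    Assumption: checkpoints end in .ckpt; the exact file name is not
    fixed here, so the newest one is returned.
    """
    ckpts = sorted(Path(results_dir).rglob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    if not ckpts:
        raise FileNotFoundError(f"no .ckpt files found under {results_dir}")
    return ckpts[-1]


def predict_command(datadir, outdir, checkpoint, gpus=1, batch_size=1024):
    """Build the step-2 prediction command (flags as in this README) as a list."""
    return [
        "python", "RunDELAY.py", str(datadir), str(outdir),
        "-m", str(checkpoint), "-p", "-g", str(gpus), "-bs", str(batch_size),
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the main directory, with your own datadir and outdir in place of the placeholders.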

2. Predict gene regulation across all TF-target gene pairs using the fine-tuned model:

python RunDELAY.py [datadir] [outdir] -m [RESULTS/outdir/BEST_WEIGHTS.ckpt] -p -g 1 -bs 1024

DELAY will save the predicted gene-regulation probabilities in outdir as a TFs x genes matrix named regPredictions.csv
By default, DELAY loads batches from existing directories, so delete the folders created for training, validation, and prediction batches when you are finished; otherwise a later run may reuse stale batches
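To inspect the strongest predicted interactions, you can rank the entries of regPredictions.csv. This sketch uses only the standard library (pandas `read_csv(..., index_col=0)` would work as well). It assumes TF names are in the first column, target-gene names are in the header row, and empty cells should be skipped; `top_predictions` is a hypothetical name.

```python
import csv
import heapq


def top_predictions(path, k=10):
    """Return the k highest-scoring (tf, target, probability) triples.

    Assumes a TFs x genes matrix: TF names in the first column, target-gene
    names in the header row, and a probability (or an empty cell) elsewhere.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        targets = header[1:]  # the first header cell labels the TF column
        triples = (
            (row[0], gene, float(value))
            for row in reader
            for gene, value in zip(targets, row[1:])
            if value != ""  # skip missing entries
        )
        # nlargest consumes the generator while the file is still open
        return heapq.nlargest(k, triples, key=lambda t: t[2])
```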

For additional help, run python RunDELAY.py --help

Required Input Files for Single-Cell Datasets

DELAY expects a separate sub-directory in datadir for each dataset, each containing the following files:

NormalizedData.csv — A labeled genes x cells matrix of gene-expression or accessibility values
PseudoTime.csv — A single-column table (cells x "PseudoTime") of inferred pseudotime values
refNetwork.csv — A two-column table of ground-truth interactions between TFs ("Gene1") and target genes ("Gene2")
TranscriptionFactors.csv (REQUIRED FOR INFERENCE) — A list of known transcription factors and co-factors in the dataset
splitLabels.csv (REQUIRED FOR VALIDATION) — A single-column table (tfs x "Split") of training and validation folds for TFs in the refNetwork
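A quick pre-flight check on each dataset sub-directory can catch missing files or misnamed columns before a long run. The sketch below is hypothetical and checks only what the list above states: file presence and the "PseudoTime", "Gene1"/"Gene2", and "Split" column names. It assumes column names appear in the first row of each CSV, possibly after a blank index cell.

```python
import csv
from pathlib import Path

# Files the list above marks as always needed
REQUIRED = ("NormalizedData.csv", "PseudoTime.csv", "refNetwork.csv")
# Column names the list above attributes to each file
EXPECTED_COLUMNS = {
    "PseudoTime.csv": {"PseudoTime"},
    "refNetwork.csv": {"Gene1", "Gene2"},
    "splitLabels.csv": {"Split"},
}


def _header(path):
    """Return the first row of a CSV file (empty list if the file is empty)."""
    with open(path, newline="") as fh:
        return next(csv.reader(fh), [])


def check_dataset(dataset_dir, inference=False, validation=False):
    """Return a list of problems found in one dataset sub-directory."""
    d = Path(dataset_dir)
    needed = list(REQUIRED)
    if inference:
        needed.append("TranscriptionFactors.csv")
    if validation:
        needed.append("splitLabels.csv")
    problems = [f"missing {name}" for name in needed if not (d / name).is_file()]

    for name, cols in EXPECTED_COLUMNS.items():
        path = d / name
        if path.is_file():
            missing = cols - set(_header(path))
            if missing:
                problems.append(f"{name} lacks column(s): {', '.join(sorted(missing))}")
    return problems
```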

For more help, see the example-data directory¹
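If your dataset lacks splitLabels.csv, one way to build it is to assign each TF in refNetwork.csv to a fold, so that all interactions of a given TF land in the same fold. This is a hypothetical helper. It assumes fold labels are integers 0..n_folds-1 (matching the value you pass to -k) and that the file uses an index column of TF names followed by a "Split" column; check both against the example-data directory.

```python
import csv
import random


def write_split_labels(ref_network_csv, out_csv, n_folds=5, seed=0):
    """Assign each TF ("Gene1") in refNetwork.csv to one of n_folds folds.

    Assumption: integer fold labels and a blank-headed index column,
    mirroring the "tfs x Split" layout described above.
    """
    with open(ref_network_csv, newline="") as fh:
        tfs = sorted({row["Gene1"] for row in csv.DictReader(fh)})
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(tfs)
    # Round-robin assignment keeps fold sizes within one TF of each other
    labels = {tf: i % n_folds for i, tf in enumerate(tfs)}
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["", "Split"])  # first column holds TF names
        for tf in sorted(labels):
            writer.writerow([tf, labels[tf]])
    return labels
```

Splitting by TF rather than by individual TF-target pair keeps a TF's interactions from appearing in both training and validation folds.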

One Additional Example

Train a new VGG-6 model on datasets with fully-known ground-truth interactions:

python RunDELAY.py [datadir] [outdir] --train -k [val_fold] \
         --model_type vgg -cfg 32 32 M 64 64 M 128 128 M
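The -cfg value reads like a torchvision-style VGG layer list. Under the assumption that DELAY follows that convention (an integer is a convolutional layer with that many output channels, and "M" is a max-pooling layer), the sketch below summarizes a config. If that reading holds, the example has six convolutional layers, which fits the VGG-6 label. `summarize_vgg_cfg` is a hypothetical name.

```python
def summarize_vgg_cfg(cfg):
    """Summarize a VGG-style layer config such as ["32", "32", "M", ...].

    Assumes the torchvision-style convention: an integer token is a conv
    layer with that many output channels, and "M" is a max-pooling layer.
    """
    layers = []
    for token in cfg:
        if token == "M":
            layers.append(("maxpool", None))
        else:
            layers.append(("conv", int(token)))
    convs = [ch for kind, ch in layers if kind == "conv"]
    return {
        "layers": layers,
        "n_conv": len(convs),
        "n_pool": sum(kind == "maxpool" for kind, _ in layers),
        "out_channels": convs[-1] if convs else None,
    }


# The shell passes -cfg as separate tokens, equivalent to str.split():
summary = summarize_vgg_cfg("32 32 M 64 64 M 128 128 M".split())
```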

Read the peer-reviewed paper: https://doi.org/10.1093/pnasnexus/pgad113

¹ Example data taken from Hayashi et al., Nature Communications (2018)
