This repository contains the code for SVEN, a multi-modality, sequence-oriented in silico model for quantifying the regulatory impact of genetic variants in over 350 tissues and cell lines.
The SVEN framework is described in the following manuscript: Yu Wang, Nan Liang and Ge Gao, Quantify genetic variants' regulatory potential via a hybrid sequence-oriented model, bioRxiv (2024).
Important Note: SVEN now provides two prediction modes, Full mode and Fast mode. Full mode requires downloading ~380G of model parameter files, while Fast mode requires only ~2G, with negligible loss of precision. To reproduce the results from our manuscript, please use Full mode.
Clone the repository, then download and extract the necessary resource files:
git clone https://github.com/gao-lab/SVEN.git
cd SVEN
# Download and extract resources and model parameters, default for fast mode
sh download_resources.sh
# for full mode
sh download_resources.sh -m full
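Because the two modes differ so much in download size, it can help to check free disk space before running the download script. The following is a minimal sketch, not part of SVEN: the helper name `has_enough_space` is hypothetical, the sizes are the ones quoted above read as gigabytes (an assumption), and the extracted files may need additional space beyond the download itself.

```python
import shutil

# Sizes quoted in this README, interpreted here as gigabytes (an assumption).
REQUIRED_GB = {"fast": 2, "full": 380}


def has_enough_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    # Check the current directory, e.g. the cloned SVEN repository.
    for mode, size in REQUIRED_GB.items():
        status = "ok" if has_enough_space(".", size) else "not enough free space"
        print(f"{mode} mode (~{size}G): {status}")
```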
Install Python 3.8, then install TensorFlow (v2.5.0) following the instructions at https://www.tensorflow.org/ and bedtools following https://bedtools.readthedocs.io/. Run pip install -r requirements.txt to install the other dependencies.
This is a quick usage guide; a full guide is coming soon.
# One-hot encoding
python prepare_data.py ./example/test.bed
# Get functional annotations with CPUs in fast mode
python get_annotations.py
# OR Get functional annotations with GPU 0 in fast mode
python get_annotations.py --gpu 0
# Transform annotations
python transform_annotations.py
# Predict gene expression
python predict_expression.py ./test.exp.predict.txt # with all models
python predict_expression.py ./test.exp.predict.txt --target_idx 3 # with target model
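For readers unfamiliar with the one-hot encoding step at the start of the pipeline above, here is a minimal sketch of the general technique. It is not taken from prepare_data.py: the function name `one_hot_encode`, the channel order (A, C, G, T), and the all-zero encoding of unknown bases such as N are assumptions for illustration, and SVEN's own conventions may differ.

```python
# Illustrative only: the channel order (A, C, G, T) and the all-zero rows
# for unknown bases are assumptions, not taken from prepare_data.py.
BASES = "ACGT"


def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix (list of rows) encoding a DNA sequence."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:  # N and other ambiguity codes stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Print each base next to its encoded row.
    for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
        print(base, row)
```

Each position in the sequence becomes a row with a single 1 in the column of its base, which is the usual input form for sequence-based neural networks.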
Yu Wang: wangy@mail.cbi.pku.edu.cn