RENOVO

What does it do?

ReNOVo is machine-learning-based software that classifies variants as pathogenic or benign using publicly available information and assigns each variant a Pathogenicity Likelihood Score (PLS).

“Files” folder

median_correct.xlsx: table of the median values used for NA imputation
RF_model.pkl: the trained Random Forest model
variables.txt: the set of variables used to run the small RF
ordered_cols.txt: the full set of variables after one-hot encoding, in which each level of the variable “Type” becomes its own column. Columns are listed in the order required to make new predictions with the RF
columns.txt: file containing the columns of interest of the training set; pass this file as the "-c" argument to rf_trainer.py
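The two preprocessing ideas behind these files, median-based NA imputation and one-hot encoding of “Type” into a fixed column order, can be illustrated with the minimal Python sketch below. It is not the ReNOVo code (the imputation is actually done in R by FixData_median.R): the medians, column names and level names are hypothetical, the "Type_<level>" naming of the one-hot columns is an assumption, and plain dicts and lists stand in for median_correct.xlsx and ordered_cols.txt.

```python
import math
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def impute_medians(row: Row, medians: Dict[str, float]) -> Row:
    """Return a copy of `row` with None/NaN values replaced by that column's median."""
    out = dict(row)
    for col, value in row.items():
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        if is_missing and col in medians:
            out[col] = medians[col]
    return out


def one_hot_and_order(row: Row, ordered_cols: Sequence[str],
                      cat_col: str = "Type") -> List[Any]:
    """One-hot encode `cat_col` and return the row's values in `ordered_cols` order.

    Level columns are assumed to be named "<cat_col>_<level>" (hypothetical).
    Any column in `ordered_cols` that the row lacks (typically the levels this
    variant does not have) is filled with 0.
    """
    encoded = {k: v for k, v in row.items() if k != cat_col}
    level = row.get(cat_col)
    if level is not None:
        encoded[f"{cat_col}_{level}"] = 1
    return [encoded.get(col, 0) for col in ordered_cols]


# Hypothetical medians and column order standing in for the real files
medians = {"score_a": 1.5, "score_b": 0.1}
ordered_cols = ["score_a", "score_b", "Type_frameshift", "Type_missense"]

row = {"score_a": None, "score_b": 0.3, "Type": "missense"}
features = one_hot_and_order(impute_medians(row, medians), ordered_cols)
print(features)  # [1.5, 0.3, 0, 1]
```

Filling absent level columns with 0 is why the fixed ordering matters: every variant must produce a vector with the same columns, in the same positions, as the data the model was trained on.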

“Scripts” folder

preprocessing.R: the first script of the pipeline. It performs preprocessing steps such as NA imputation, column renaming and creation of new variables by calling the function FixData_median.R. It takes as input “input_file”, the file produced by ANNOVAR annotation, and the Excel table of medians “median_correct.xlsx”. Its output is the file “input_RF.tab”, which is the input of the Random Forest.
FixData_median.R: the function that actually performs all the steps above. It takes as input an intermediate data frame from preprocessing.R and median_correct.xlsx.
Renovo_implementation.py: runs the predictions with the RF. The model is already trained; the script performs some final preprocessing, such as removing unused columns and reordering the remaining ones, then runs the RF and saves the prediction and score columns. It takes as input “input_RF.tab” (from preprocessing.R), the column-name files “variables.txt” and “ordered_cols.txt”, and the trained model parameters in “RF_model.pkl”. The output, “output_RF”, is input_RF with the RF prediction and score columns appended. NOTE: the NAs in output_RF have been imputed, so IT IS NOT the original input.
renovo_optim.py: creates the Random Forest model from a training set and a file containing the columns of interest.
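The prediction step of Renovo_implementation.py (select the model's columns, reorder them, score each variant and append the results) might look roughly like the sketch below. StubModel, the RF_score/RF_prediction column names, the 0.5 threshold and the assumption that class index 1 means "pathogenic" are all hypothetical. With the real RF_model.pkl, load_model would return a scikit-learn classifier whose predict_proba columns follow the order of its classes_ attribute, and loading it requires the matching scikit-learn version.

```python
import pickle
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def load_model(path: str) -> Any:
    """Unpickle a trained model such as RF_model.pkl."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


class StubModel:
    """Hypothetical stand-in for the trained RF, for illustration only.

    Like scikit-learn's predict_proba, it returns one [P(class 0), P(class 1)]
    pair per input row; here P(class 1) is the first feature clipped to [0, 1].
    """

    def predict_proba(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        probs = []
        for x in X:
            p = min(max(float(x[0]), 0.0), 1.0)
            probs.append([1.0 - p, p])
        return probs


def predict_rows(model: Any, rows: List[Row], keep_cols: Sequence[str],
                 threshold: float = 0.5) -> List[Row]:
    """Return copies of `rows` with hypothetical RF_score / RF_prediction columns."""
    # Keep only the model's variables, in the model's column order
    X = [[row[col] for col in keep_cols] for row in rows]
    results = []
    for row, proba in zip(rows, model.predict_proba(X)):
        score = proba[1]  # assumes class index 1 is "pathogenic"
        out = dict(row)
        out["RF_score"] = score
        out["RF_prediction"] = "pathogenic" if score >= threshold else "benign"
        results.append(out)
    return results


rows = [
    {"id": "var1", "feat_a": 0.9, "feat_b": 3.0},
    {"id": "var2", "feat_a": 0.2, "feat_b": 1.0},
]
result = predict_rows(StubModel(), rows, keep_cols=["feat_a", "feat_b"])
print([(r["id"], r["RF_prediction"]) for r in result])
# [('var1', 'pathogenic'), ('var2', 'benign')]
```

Because the rows passed in have already been imputed, the returned rows carry the imputed values too, which is why output_RF differs from the original input.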

Usage

command:

./ReNOVo.py -a path/to/annovar -p path/to/VCFs/AVinputs

help message:

ReNOVo.py (version 1.0)

usage: ReNOVo.py [-h] -p, --path PATH -a, --annovar ANNOVAR

Given as input a folder containing the VCF or annovar input (AVinput) files,
this program applies the Random Forest model of ReNOVo and returns the tabular
annovar like files with the classification provided by the model itself.

optional arguments:
  -h, --help             show this help message and exit
  -p, --path PATH        the path to VCFs/AVinputs directory
  -a, --annovar ANNOVAR  the path to ANNOVAR directory
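For orientation, a command-line interface matching this help text could be declared with Python's argparse as sketched below. It is a hypothetical re-creation, not the actual ReNOVo.py source, and it only covers argument parsing.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the ReNOVo.py command-line interface."""
    parser = argparse.ArgumentParser(
        prog="ReNOVo.py",
        description=(
            "Given as input a folder containing the VCF or annovar input (AVinput) "
            "files, this program applies the Random Forest model of ReNOVo and "
            "returns the tabular annovar like files with the classification "
            "provided by the model itself."
        ),
    )
    # Both options are mandatory; parsing fails if either is missing
    parser.add_argument("-p", "--path", required=True,
                        help="the path to VCFs/AVinputs directory")
    parser.add_argument("-a", "--annovar", required=True,
                        help="the path to ANNOVAR directory")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv` (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


# Example with an explicit argument list instead of the real command line
args = parse(["-p", "path/to/VCFs/AVinputs", "-a", "path/to/annovar"])
print(args.path, args.annovar)
```

Both options use required=True. Python versions before 3.10 still print such options under the "optional arguments:" heading (3.10 renamed it "options:"), so that heading does not mean -p and -a can be omitted.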

Requirements and Set-up:

conda env creation

conda env create -f ReNOVo.yml

Packages to install in the conda environment:

conda install -c r r-curl r-httr r-rvest r-readxl r-tidyverse
conda install -c bioconda r-openxlsx

python:

python -m pip install scikit-learn==0.20.3
pip install pandas
pip install matplotlib
pip install seaborn
pip install argparse
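scikit-learn is pinned to 0.20.3, presumably because RF_model.pkl is a pickled model and scikit-learn does not guarantee that pickles load or behave identically under a different version. A minimal sketch for checking the active environment follows; the helper names are hypothetical, and it reads each module's __version__ attribute.

```python
import importlib
from typing import Optional


def module_version(module_name: str) -> Optional[str]:
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # missing or broken install
        return None
    return getattr(module, "__version__", "unknown")


def check_pinned(module_name: str, expected: str) -> str:
    """Compare an installed module's version against the pinned one."""
    found = module_version(module_name)
    if found is None:
        return f"{module_name}: not importable"
    if found != expected:
        return f"{module_name}: found {found}, expected {expected}"
    return f"{module_name}: OK ({found})"


# scikit-learn is imported under the module name "sklearn"
print(check_pinned("sklearn", "0.20.3"))
for name in ("pandas", "matplotlib", "seaborn"):
    print(name, module_version(name) or "not importable")
```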

Troubleshooting: if the R packages do not work properly, try installing them with Rscript as shown below. The "tidyverse" package may produce library-version errors; if it does, remove the old packages and reinstall them.

R:

Rscript -e "install.packages(c('openxlsx','tidyverse','readxl'), repos='http://cran.us.r-project.org')"

Remember to update the interpreter paths (python/Rscript) in these scripts to match your environment: ReNOVo.py, preprocessing.R, Renovo_implementation.py
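Assuming the interpreter is set in the first (shebang) line of each script, which this README does not state explicitly, the helper sketched below prints the python and Rscript paths of the active conda environment in shebang form. The function names are hypothetical.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_interpreters(names: Sequence[str] = ("python", "Rscript")) -> Dict[str, Optional[str]]:
    """Map each interpreter name to its absolute path on PATH (None if not found)."""
    return {name: shutil.which(name) for name in names}


def shebang(interpreter_path: str) -> str:
    """Build a shebang line for the given interpreter path."""
    return f"#!{interpreter_path}"


# Run this inside the activated ReNOVo conda environment
for name, path in find_interpreters().items():
    print(f"{name}: {shebang(path) if path else 'not found on PATH'}")
```

An alternative is a shebang such as #!/usr/bin/env python or #!/usr/bin/env Rscript, which picks up whichever interpreter comes first on PATH, so activating the conda environment is enough.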

Web Server

https://bioserver.ieo.it/shiny/app/renovo

Data storage

Training set https://drive.google.com/file/d/13G6Dn-YzZpS6PK-bhIu_fS1eWR-sQW5T/view?usp=sharing
Test set https://drive.google.com/file/d/1E-GcaOw_ED87Y0Zgsgg4vMgIIX4YsjL4/view?usp=sharing

License

ReNOVo is free for non-commercial use. Users must obtain an ANNOVAR licence themselves. Contact the authors for commercial use.

Reference

Valentina Favalli, Giulia Tini, Emanuele Bonetti, Gianluca Vozza, Alessandro Guida, Sara Gandini, Pier Giuseppe Pelicci, Luca Mazzarella, Machine learning-based reclassification of germline variants of unknown significance: The RENOVO algorithm, The American Journal of Human Genetics, 2021, ISSN 0002-9297, https://doi.org/10.1016/j.ajhg.2021.03.010. (https://www.sciencedirect.com/science/article/pii/S000292972100094X)
