DyogenIBENS/FINSURF

Introduction

FINSURF (Functional Identification of Non-coding Sequences Using Random Forests) is a tool designed to analyse lists of sequence variants in the human genome.

It assigns a score to each variant, reflecting its functional importance and therefore its likelihood of disrupting the physiology of its carrier. FINSURF scores Single Nucleotide Variants (SNVs), insertions and deletions. Among SNVs, transitions and transversions are treated separately. Insertions are characterised by a score given to each base flanking the insertion point. Deletions are characterised by a score at every deleted base. FINSURF can optionally use a list of known or suspected disease genes to restrict results to variants overlapping cis-regulatory elements linked to these genes.
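The variant categories described above can be illustrated with a small sketch. This is not FINSURF's own code: the function names are hypothetical, and it assumes VCF-style, left-anchored alleles (the first base of REF and ALT is shared for insertions and deletions). It only shows which category a variant falls into and which positions would carry a score under the rules stated in this section.

```python
from typing import List

# Illustrative only, not taken from FINSURF. Assumes VCF-style left-anchored alleles.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_variant(ref: str, alt: str) -> str:
    """Return 'transition', 'transversion', 'insertion' or 'deletion'."""
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt or not set(ref + alt) <= set("ACGT"):
        raise ValueError(f"Alleles must be non-empty A/C/G/T strings: {ref!r}, {alt!r}")
    if len(ref) == 1 and len(alt) == 1:
        if ref == alt:
            raise ValueError("REF and ALT are identical")
        # A transition stays within purines or within pyrimidines.
        same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
        return "transition" if same_class else "transversion"
    if len(ref) == 1 and alt[0] == ref:
        return "insertion"
    if len(alt) == 1 and ref[0] == alt:
        return "deletion"
    raise ValueError(f"Unsupported variant: {ref}>{alt}")


def scored_positions(pos: int, ref: str, alt: str) -> List[int]:
    """1-based positions that would carry a score for this variant."""
    kind = classify_variant(ref, alt)
    if kind == "insertion":
        # The two bases flanking the insertion point: the anchor base and the next one.
        return [pos, pos + 1]
    if kind == "deletion":
        # Every deleted base; the anchor base at `pos` is retained.
        return list(range(pos + 1, pos + len(ref)))
    return [pos]
```

For example, `scored_positions(100, "ATG", "A")` lists the two deleted bases, while `scored_positions(100, "A", "ATG")` lists the two bases on either side of the insertion point.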

For a variant of interest, users can generate a graphical representation of "feature contributions", showing the relative contributions of genomic, functional or evolutionary information to its score.

FINSURF is implemented as Python 3 scripts.

License

This code may be freely distributed and modified under the terms of the GNU General Public License version 3 (GPL v3) and the CeCILL licence version 2 of the CNRS. These licences are contained in the files:

  1. LICENSE-GPL.txt (or on www.gnu.org)
  2. LICENCE-CeCILL.txt (or on www.cecill.info)

Copyright for this code is held by the Dyogen (DYnamic and Organisation of GENomes) team of the Institut de Biologie de l'Ecole Normale Supérieure (IBENS), 46 rue d'Ulm, Paris, and the individual authors.

  • Copyright © 2020 IBENS/Dyogen : Lambert MOYON, Alexandra LOUIS, Thi Thuy Nga NGUYEN, Camille Berthelot and Hugues ROEST CROLLIUS

Contact

Email finsurf {at} bio {dot} ens {dot} psl {dot} eu

If you use FINSURF, please cite:

Classification of non-coding variants with high pathogenic impact. Lambert Moyon, Camille Berthelot, Alexandra Louis, Nga Thi Thuy Nguyen, Hugues Roest Crollius. PLoS Genet. 2022 Apr 29;18(4):e1010191. doi: 10.1371/journal.pgen.1010191.

Quick start

Below is a quick start guide to using FINSURF.

Installation

Installing conda

The Miniconda3 package management system manages all FINSURF dependencies, including python packages and other software.

To install Miniconda3:

  • Download the Miniconda3 installer for your system from the Miniconda download page

  • Run the installation script: bash Miniconda3-latest-Linux-x86_64.sh or bash Miniconda3-latest-MacOSX-x86_64.sh, and accept the defaults

  • Open a new terminal, run conda update conda and press y to confirm updates

Installing FINSURF

  • Clone the repository and go to the FINSURF root folder

    git clone https://github.com/DyogenIBENS/FINSURF.git
    cd FINSURF
    
  • Create the main conda environment.

    We recommend using Mamba for a faster installation:

    conda install -c conda-forge mamba
    mamba env create -f env/finsurf.yaml
    

    Alternatively, you can use conda directly:

    conda env create -f env/finsurf.yaml
    
  • Download feature contributions and gene associations.

    Download the data files that will be intersected with your variants from https://www.opendata.bio.ens.psl.eu/finsurf/ (4.8 GB for the intersection data and 82 GB for the feature contributions):

    wget --no-check-certificate https://www.opendata.bio.ens.psl.eu/finsurf/finsurf_dataV1.tgz
    
    tar -xzvf finsurf_dataV1.tgz
    
    wget --no-check-certificate https://www.opendata.bio.ens.psl.eu/finsurf/plot_contribution_dataV1.tgz
    
    tar -xzvf plot_contribution_dataV1.tgz
    
    

    The FINSURF directory should then have the following structure:

  • FINSURF

    • LICENSE.txt
    • README.md
    • env
    • scripts
    • static
      • data
        • 2020-05-11_table_genes_FINSURF_regions.tsv
        • FINSURF_REGULATORY_REGIONS_GENES.bed.gz
        • FINSURF_REGULATORY_REGIONS_GENES.bed.gz.tbi
        • FINSURF_model_objects
          • full-model_woTargs_columns.txt
          • rename_columns_model.tsv
        • FULL_FC_transition.tsv.gz
        • FULL_FC_transition.tsv.gz.tbi
        • FULL_FC_transversion.tsv.gz
        • FULL_FC_transversion.tsv.gz.tbi
        • NUM_FEATURES.tsv.gz
        • NUM_FEATURES.tsv.gz.tbi
        • SCALED_NUM_FEATURES.tsv.gz
        • SCALED_NUM_FEATURES.tsv.gz.tbi
        • scores_all_chroms_1e-4.tsv.gz
        • scores_all_chroms_1e-4.tsv.gz.tbi
      • samples
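A quick way to confirm that both archives were extracted into the right place is to check for the data files named in the layout above. The sketch below is a hypothetical helper, not part of FINSURF: the function name `missing_data_files` is an assumption, and the file list is copied from the listing in this section.

```python
from pathlib import Path
from typing import List

# Paths relative to the FINSURF root, copied from the directory layout above.
EXPECTED_FILES = [
    "static/data/2020-05-11_table_genes_FINSURF_regions.tsv",
    "static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz",
    "static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz.tbi",
    "static/data/FINSURF_model_objects/full-model_woTargs_columns.txt",
    "static/data/FINSURF_model_objects/rename_columns_model.tsv",
    "static/data/FULL_FC_transition.tsv.gz",
    "static/data/FULL_FC_transition.tsv.gz.tbi",
    "static/data/FULL_FC_transversion.tsv.gz",
    "static/data/FULL_FC_transversion.tsv.gz.tbi",
    "static/data/NUM_FEATURES.tsv.gz",
    "static/data/NUM_FEATURES.tsv.gz.tbi",
    "static/data/SCALED_NUM_FEATURES.tsv.gz",
    "static/data/SCALED_NUM_FEATURES.tsv.gz.tbi",
    "static/data/scores_all_chroms_1e-4.tsv.gz",
    "static/data/scores_all_chroms_1e-4.tsv.gz.tbi",
]


def missing_data_files(root: str = ".") -> List[str]:
    """Return the expected data files that are absent under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    # Run from the FINSURF root folder.
    missing = missing_data_files()
    if missing:
        print("Missing data files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected data files are present.")
```

An empty result means every listed file, including each `.tbi` index, was found.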

Usage

Setting up your working environment for FINSURF

Before any FINSURF run, you should:

  • go to the FINSURF root folder,
  • activate the conda environment with conda activate finsurf.

Running FINSURF on example data

Before using FINSURF on your data, we recommend running a test with our example data to ensure that installation was successful and to get familiar with the pipeline, inputs and outputs.

Example 1: Simple FINSURF run

To run FINSURF on example data:

python scripts/finsurf.py -i static/data/samples/variant.vcf -s static/data/scores_all_chroms_1e-4.tsv.gz -g static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz -ig static/data/samples/gene.txt

The following output should be generated: res/result_*.txt.
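To run the same command on several input files, it can be convenient to assemble it from Python. The following is a minimal sketch, not part of FINSURF: the function names are hypothetical, only the `-i`, `-s`, `-g` and `-ig` flags shown above are used, and treating `-ig` as omissible is an assumption based on the gene list being described as optional.

```python
import subprocess
import sys
from typing import List, Optional

# Default paths are the ones used in the example command above.
DEFAULT_SCORES = "static/data/scores_all_chroms_1e-4.tsv.gz"
DEFAULT_REGIONS = "static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz"


def build_finsurf_command(
    vcf: str,
    gene_list: Optional[str] = None,
    scores: str = DEFAULT_SCORES,
    regions: str = DEFAULT_REGIONS,
) -> List[str]:
    """Assemble the argument list for scripts/finsurf.py."""
    cmd = [sys.executable, "scripts/finsurf.py", "-i", vcf, "-s", scores, "-g", regions]
    if gene_list is not None:
        # Assumption: the gene list is optional on the command line.
        cmd += ["-ig", gene_list]
    return cmd


def run_finsurf(vcf: str, gene_list: Optional[str] = None) -> None:
    """Run FINSURF from the repository root; raises if the script exits with an error."""
    subprocess.run(build_finsurf_command(vcf, gene_list), check=True)
```

Calling `run_finsurf("static/data/samples/variant.vcf", "static/data/samples/gene.txt")` from the FINSURF root folder, with the conda environment active, would launch the example run shown above.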

To run FINSURF on the 49 variants from Genomizer:

python scripts/finsurf.py -i static/data/samples/Genomizer_49_var.vcf -s static/data/scores_all_chroms_1e-4.tsv.gz -g static/data/FINSURF_REGULATORY_REGIONS_GENES.bed.gz -ig static/data/samples/Genomizer_49_var_GENES.tsv

To plot the contributions for one specific variant:

python scripts/plot_contribution.py --variant "chr1:12005" --vartype "transition" --rename_cols_table static/data/FINSURF_model_objects/rename_columns_model.tsv --numFeat_path static/data/NUM_FEATURES.tsv.gz --scaled_numFeat_path static/data/SCALED_NUM_FEATURES.tsv.gz --featCont_transition_path static/data/FULL_FC_transition.tsv.gz --featCont_transversion_path static/data/FULL_FC_transversion.tsv.gz

To plot the contributions for one specific variant from the Genomizer dataset:

python scripts/plot_contribution.py --variant "chr8:21988220" --vartype "transition" --rename_cols_table static/data/FINSURF_model_objects/rename_columns_model.tsv --numFeat_path static/data/NUM_FEATURES.tsv.gz --scaled_numFeat_path static/data/SCALED_NUM_FEATURES.tsv.gz --featCont_transition_path static/data/FULL_FC_transition.tsv.gz --featCont_transversion_path static/data/FULL_FC_transversion.tsv.gz

The script should generate an HTML file in the res directory.
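Because the plotting command takes many fixed paths, a small wrapper can reduce typing errors when plotting several variants. This is a hedged sketch rather than part of FINSURF: `build_plot_command` is a hypothetical name, the `chr<name>:<position>` pattern is inferred from the two examples above, and accepting `"transversion"` as a `--vartype` value is an assumption based on the `--featCont_transversion_path` option.

```python
import re
import sys
from typing import List

# Inferred from the examples above ("chr1:12005", "chr8:21988220").
VARIANT_PATTERN = re.compile(r"^chr[0-9A-Za-z]+:[1-9][0-9]*$")
# "transition" appears in the examples; "transversion" is an assumed counterpart.
VARIANT_TYPES = ("transition", "transversion")


def build_plot_command(variant: str, vartype: str, data_dir: str = "static/data") -> List[str]:
    """Assemble the argument list for scripts/plot_contribution.py."""
    if not VARIANT_PATTERN.match(variant):
        raise ValueError(f"Expected 'chr<name>:<position>', got {variant!r}")
    if vartype not in VARIANT_TYPES:
        raise ValueError(f"vartype must be one of {VARIANT_TYPES}, got {vartype!r}")
    return [
        sys.executable, "scripts/plot_contribution.py",
        "--variant", variant,
        "--vartype", vartype,
        "--rename_cols_table", f"{data_dir}/FINSURF_model_objects/rename_columns_model.tsv",
        "--numFeat_path", f"{data_dir}/NUM_FEATURES.tsv.gz",
        "--scaled_numFeat_path", f"{data_dir}/SCALED_NUM_FEATURES.tsv.gz",
        "--featCont_transition_path", f"{data_dir}/FULL_FC_transition.tsv.gz",
        "--featCont_transversion_path", f"{data_dir}/FULL_FC_transversion.tsv.gz",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the FINSURF root folder.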
