This version of the Variant-Visualizer library was submitted as part of the Master's thesis of Julian Thorwarth on Jan. 12th, 2024.
https://github.com/gruber-sciencelab/Variant-Visualizer
https://github.com/jtwrt/VARIANT_VISUALIZER
A second repository was submitted for the identification of frequently mutated cis-regulatory elements:
https://github.com/gruber-sciencelab/Hotspot-Identification
This package visualizes genomic regions, transcripts, and proteins, together with their known functional regions, regulatory elements, and the mutations affecting them, in interactive plots.
Setting up this package generates Clusters. A Cluster includes genomic features that overlap or lie sufficiently close to each other, along with all relevant somatic variants and cis-regulatory elements currently supported by the package.
Once set up, figures are generated quickly and can be explored interactively at the genomic, transcript, or protein level.
Copy the provided base_config.yml:
cp base_config.yml config.yml
At a minimum, the following values need to be defined in config.yml:
init_bedtools: Bash command that enables use of the bedtools command in the command line.
ucsc_liftover: Path of the UCSC-Liftover executable.
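Before continuing, it can help to confirm that both values are actually set. The following is a minimal sketch and not part of the package: the helper name missing_config_keys is hypothetical, and it assumes both settings are written as simple top-level "key: value" lines in config.yml. It deliberately avoids a YAML library, so it does not handle nested or multi-line values.

```python
from pathlib import Path

# The two settings named above; everything else in this sketch is an assumption.
REQUIRED_KEYS = ("init_bedtools", "ucsc_liftover")


def missing_config_keys(config_path, required=REQUIRED_KEYS):
    """Return the required top-level keys that are absent or empty in config_path."""
    defined = set()
    for line in Path(config_path).read_text().splitlines():
        # Drop trailing comments and whitespace.
        stripped = line.split("#", 1)[0].rstrip()
        # Only consider unindented "key: value" lines (top-level keys).
        if not stripped or stripped[0].isspace() or ":" not in stripped:
            continue
        key, value = stripped.split(":", 1)
        # Treat empty or null-like values as "not defined".
        if value.strip() not in ("", "''", '""', "null", "~"):
            defined.add(key.strip())
    return [key for key in required if key not in defined]
```

Calling missing_config_keys("config.yml") returns an empty list when both values are set, and otherwise the names that still need a value.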
Create the conda environment:
conda env create -f conda_environment.yml
Activate the environment:
conda activate vavis
Run setup_dependencies.py. The script automatically downloads all dependencies not mentioned in the previous step and prepares them for further use.
python setup_dependencies.py
Generate genomic clusters by running setup_clusters.py, passing the number of parallel processes as an argument.
Generating all clusters is currently very resource-intensive: about 20 GB of memory is advised for each parallel process. Use as many parallel processes as you can afford. Generating all clusters allows you to visualize any gene, transcript, or protein.
python setup_clusters.py --n_processes 1
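The memory guidance above can be turned into a rough estimate for --n_processes. This is only a sketch: the function name advised_n_processes is hypothetical, the 20 GB figure is taken from the advice above, and capping the result at the number of CPU cores is an added assumption rather than a documented requirement.

```python
import os


def advised_n_processes(available_memory_gb, cpu_count=None, gb_per_process=20):
    """Estimate a value for --n_processes from the 20 GB per process guidance."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Number of processes that fit into the available memory.
    by_memory = int(available_memory_gb // gb_per_process)
    # Assumption: never exceed the core count, and never suggest fewer than one.
    # With less than 20 GB available, a single process may still run short of memory.
    return max(1, min(by_memory, cpu_count))


print(advised_n_processes(available_memory_gb=64, cpu_count=8))  # 3
```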
If you only want to prepare the package for plotting a specific set of genes, and you are using the default input files that were automatically prepared by setup_dependencies.py, you can use the pre-generated index to query genes and transcripts and find out which clusters you need to set up.
import variant_visualizer as vv
index = vv.clusters.load_index('pre-generated')
pten_cluster = index.query_gene_name('PTEN')
print(pten_cluster) # 3175
The following command calculates only the cluster that includes the gene PTEN, allowing you to reproduce the examples shown in vignette.ipynb:
python setup_clusters.py --n_processes 1 --cluster_ids 3175
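For more than one gene, the same lookup can be repeated and the results deduplicated. The sketch below is an illustration under stated assumptions: the helper cluster_setup_commands is hypothetical, and it assumes query_gene_name returns a single cluster ID per gene, as the PTEN example suggests. It builds one command per cluster in the documented single-ID form, since the format for passing several IDs to --cluster_ids is not described here.

```python
def cluster_setup_commands(index, gene_names, n_processes=1):
    """Build one setup_clusters.py command per distinct cluster containing the genes."""
    # Assumption: query_gene_name returns one cluster ID per gene name.
    cluster_ids = sorted({index.query_gene_name(name) for name in gene_names})
    return [
        f"python setup_clusters.py --n_processes {n_processes} --cluster_ids {cluster_id}"
        for cluster_id in cluster_ids
    ]


# Usage with the pre-generated index (requires the package to be installed):
# import variant_visualizer as vv
# index = vv.clusters.load_index('pre-generated')
# for command in cluster_setup_commands(index, ['PTEN']):
#     print(command)
```

Genes that fall into the same cluster produce a single command, so no cluster is computed twice.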
If you have generated clusters and do not wish to rely on the pre-generated index, run setup_index.py to generate an index including all clusters that were generated in the previous step. Re-run the script to update the index whenever new clusters are generated.
python setup_index.py
Read vignette.ipynb for examples and a detailed explanation of how this package can be used.
For more advanced examples, take a look at examples.ipynb.
thesis_figures.ipynb was used specifically for generating the figures in the submitted thesis.