GenoVi: Genome Visualizer Software

GenoVi generates circular genome representations for complete, draft, and multiple bacterial and archaeal genomes. The GenoVi pipeline combines several Python scripts to automatically generate all the files Circos needs to draw circular plots, with customisable options for colour palettes, fonts, font format, background colour, and scaling of genomes comprising more than one replicon. Optionally, GenoVi's built-in workflow integrates DeepNOG to annotate COG categories using alignment-free methods with user-defined thresholds, and creates COG category histograms and COG distribution plots per genome, contig, or replicon, which are useful for further analyses.

Installation

GenoVi dependencies can be installed by creating the following Bioconda environment:

conda create -n genovi python=3.7 circos 

Activate the environment

conda activate genovi

GenoVi can then be installed using pip

pip install genovi

Dependencies

  • Circos 0.69-8
  • Python 3.7 or later
  • DeepNOG 1.2.3
  • NumPy 1.20.2
  • Matplotlib 3.5.2
  • Pandas 1.2.4
  • Biopython 1.79
  • CairoSVG 2.5.2
  • Seaborn 0.12
  • Perl 5
  • List::MoreUtils (Perl library)
  • Natsort 8.2.0
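
As a quick sanity check after installation, the following Python sketch reports which of the listed dependencies can be found in the active environment. It only tests for presence, not for the versions listed above, and the function name check_dependencies is a hypothetical helper, not part of GenoVi. The import names (for example Bio for Biopython) are the usual ones for these packages.

```python
import importlib.util
import shutil

# Import names of the Python packages listed above (Biopython imports as "Bio").
PYTHON_MODULES = ["deepnog", "numpy", "matplotlib", "pandas",
                  "Bio", "cairosvg", "seaborn", "natsort"]
# Command-line tools expected on the PATH.
EXECUTABLES = ["circos", "perl"]

def check_dependencies():
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for module in PYTHON_MODULES:
        # find_spec returns None when a top-level module is not installed.
        status[module] = importlib.util.find_spec(module) is not None
    for exe in EXECUTABLES:
        status[exe] = shutil.which(exe) is not None
    return status

for name, found in check_dependencies().items():
    print(name, "found" if found else "MISSING")
```

The Perl library List::MoreUtils is not covered by this check.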

Usage

genovi [-h] [options ..] -i input_file -s status

Main arguments

  • -i, --input_file. GenBank input file path.
  • -o, --output_file. Output file name. Default: genovi.
  • -s, --status. “complete” or “draft”. Complete genomes are drawn as separate circles for each contig/replicon.
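
As a minimal sketch of how the main arguments combine, the Python snippet below assembles a genovi command line from the three flags documented above. The helper name build_genovi_command and the input path my_genome.gbk are hypothetical and used only for illustration. The command is printed rather than executed.

```python
import shlex

def build_genovi_command(input_file, status, output_file="genovi"):
    """Assemble a genovi command from the documented main arguments.

    build_genovi_command is a hypothetical helper, not part of GenoVi.
    """
    if status not in ("complete", "draft"):
        raise ValueError("status must be 'complete' or 'draft'")
    return ["genovi", "-i", input_file, "-s", status, "-o", output_file]

# my_genome.gbk is a placeholder path for a GenBank file.
command = build_genovi_command("my_genome.gbk", "complete", "my_genome")
print(" ".join(shlex.quote(part) for part in command))
```

Once GenoVi is installed, the same list can be handed to subprocess.run(command, check=True) to launch the run from a script.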

Information:

  • -h, --help. Shows this help message and exits.
  • --version. Shows the currently installed version of genovi.

COGs:

  • -cu, --cogs_unclassified. Do not classify each coding sequence into Clusters of Orthologous Groups of proteins (COGs).
  • --cogs. Specifies which COG categories to include in the circular representation. For example: 'ABJKLX'.
  • -b, --deepnog_confidence_threshold. DeepNOG confidence threshold, in the range [0, 1]. Default: 0. If provided, predictions below the threshold are discarded.
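
To illustrate what the confidence threshold does, here is a small Python sketch that discards predictions below a cut-off. It mirrors the behaviour described above but is not DeepNOG's or GenoVi's actual code. The function filter_predictions and the example tuples are made up for illustration, and keeping predictions exactly at the threshold is an assumption.

```python
def filter_predictions(predictions, threshold=0.0):
    """Keep predictions whose confidence is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [p for p in predictions if p[2] >= threshold]

# Made-up (protein id, COG category, confidence) tuples, for illustration only.
predictions = [
    ("protein_1", "J", 0.98),
    ("protein_2", "K", 0.41),
    ("protein_3", "L", 0.75),
]
kept = filter_predictions(predictions, threshold=0.5)
print(kept)
```

With the default threshold of 0, nothing is discarded.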

Format:

  • -a, --alignment. When --status complete is specified, this flag defines the alignment of each individual contig. Options: center, top, bottom, A (first on top), < (first to the left), U (two on top, the rest below). By default, this is defined by contig sizes.
  • --scale. When using --status complete, selects the scale format used to ensure visibility. Options: variable, linear, sqrt. Default: sqrt.
  • -k, --keep_temporary_files. Keep temporary files.
  • -r, --reuse_predictions. If available, reuse the DeepNOG prediction results from the previous run. Useful only if the --keep_temporary_files flag is enabled.
  • -w, --window. Window size, in base pairs, used for GC analysis. Default: 5000.
  • -v, --verbose. Activates verbose, in-console log messages.
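
To make the role of the window size concrete, the Python sketch below computes GC content and GC skew over consecutive windows of a sequence. It uses the conventional definitions, GC content = (G + C) / window length and GC skew = (G - C) / (G + C). The use of non-overlapping windows is an assumption, and GenoVi's own computation may differ. The function gc_windows and the toy sequence are illustrative only.

```python
def gc_windows(sequence, window=5000):
    """Yield (start, gc_content, gc_skew) for consecutive windows."""
    sequence = sequence.upper()
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / len(chunk)
        # Conventional GC skew; defined as 0 when the window has no G or C.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_content, gc_skew

toy_sequence = "GGGCATATGC" * 3  # toy 30 bp sequence, not real data
for start, content, skew in gc_windows(toy_sequence, window=10):
    print(start, round(content, 2), round(skew, 2))
```

A smaller window gives a finer-grained but noisier track, while a larger one smooths local variation.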

Text:

  • -c, --captions_not_included. Do not include captions in the figure.
  • -cp, --captions_position. Captions position. Options: left, right, auto.
  • -t, --title. Figure title.
  • --title_position. Title position. Options: center, top, bottom.
  • --italic_words. How many title words should be written in italic. Default: 2.
  • --size. Displays the genome size of each independent circular representation.
  • -te, --tracks_explain. Adds a space break in the circular representation, including captions for each track within the ideogram.

Colours:

  • -cs, --colour_scheme. Prebuilt colour scheme to use for CDS, RNAs, and GC analysis. Options: strong, autumn, dawn, blossom, paradise, neutral, blue, purple, soil, grayscale, velvet, pastel, ocean, wood, beach, desert, ice, island, forest, toxic, fire, spring.
  • -bc, --background. Background colour, in R, G, B format. Default: transparent.
  • -fc, --font_colour. Font colour. Default: black.
  • -pc, --CDS_positive_colour. Colour for positive CDSs, in R, G, B format. Default: '180, 205, 222'.
  • -nc, --CDS_negative_colour. Colour for negative CDSs, in R, G, B format. Default: '53, 176, 42'.
  • -tc, --tRNA_colour. Colour for tRNAs, in R, G, B format. Default: '150, 5, 50'.
  • -rc, --rRNA_colour. Colour for rRNAs, in R, G, B format. Default: '150, 150, 50'.
  • -cc, --GC_content_colour. Colour for GC content, in R, G, B format. Default: '23, 0, 115'.
  • -sc, --GC_skew_colour. Colour scheme for positive and negative GC skew. A pair of RGB colours. Default: '140, 150, 198 - 158, 188, 218'.
  • -sl, --GC_skew_line_colour. Colour for GC skew line. Default: black.
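
Colour values are passed as quoted strings in 'R, G, B' format, and the GC skew option takes two such triples joined by a hyphen. The Python sketch below shows one way to check such strings before passing them to genovi. The helpers parse_rgb and parse_rgb_pair are hypothetical and not part of GenoVi, and the 0 to 255 range check is the usual RGB convention rather than something stated here.

```python
def parse_rgb(text):
    """Parse 'R, G, B' into a tuple of three integers in 0..255."""
    parts = [int(p) for p in text.split(",")]
    if len(parts) != 3 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected three values in 0..255: %r" % text)
    return tuple(parts)

def parse_rgb_pair(text):
    """Parse 'R, G, B - R, G, B' into two RGB tuples."""
    first, second = text.split("-")
    return parse_rgb(first), parse_rgb(second)

# Default values taken from the option list above.
print(parse_rgb("180, 205, 222"))
print(parse_rgb_pair("140, 150, 198 - 158, 188, 218"))
```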

More detailed information about the arguments can be found in the user guide.

Tutorials

Check the tutorials in the user guide.

Output files

Resulting images are saved in a folder called [name] as [name].svg and [name].png, where name is set with the output_file argument or defaults to genovi. For a complete genome, individual contig image files are stored in a [name] subdirectory as [name]-contig_[i].png, with i ranging from 1 to the number of circles. For draft genomes, GenoVi displays the replicons as delivered by the initial GenBank file.

Besides images, if -k or --keep_temporary_files was set, the files described in the user guide arguments will also be stored.

Four additional files are stored in the [name] folder: a histogram of COG categories named [name]_COG_histogram.png; a file with the COG classification of each replicon named [name]_COG_Classification.csv; a CSV file named [name]_Gral_Stats.csv with general information on each replicon, including size, GC content, and number of CDSs, tRNAs, and rRNAs; and a heatmap of the distribution of COGs within each replicon named [name]_COG_Classification.csv_percentage.
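
The naming scheme above can be sketched in Python as follows, for instance to check a results folder after a run. The function expected_output_names is a hypothetical helper, not part of GenoVi. It returns file names only, without folders, and leaves out the heatmap because its file extension is not given above.

```python
def expected_output_names(name="genovi", circles=0):
    """List output file names described in this README for a given run."""
    names = [name + ".svg", name + ".png"]
    # Per-contig images, numbered from 1 to the number of circles.
    names += ["%s-contig_%d.png" % (name, i) for i in range(1, circles + 1)]
    names += [
        name + "_COG_histogram.png",
        name + "_COG_Classification.csv",
        name + "_Gral_Stats.csv",
    ]
    return names

# Example: a complete genome drawn as two circles, saved under "my_genome".
for file_name in expected_output_names("my_genome", circles=2):
    print(file_name)
```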

Additional information

For further information, please read the user guide.

Citation

If you use GenoVi in your research, please cite our latest paper:

Cumsille A, Durán RE, Rodríguez-Delherbe A, Saona-Urmeneta V, Cámara B, Seeger M, et al. GenoVi, an open-source automated circular genome visualizer for bacteria and archaea. PLoS Comput Biol. 2023;19:e1010998.

GenoVi is under a BY-NC-SA Creative Commons License. Please cite.

You may remix, tweak, and build upon this work even for commercial purposes, as long as you credit this work and license your new creations under the identical terms.