Cameron Gilchrist edited this page Dec 7, 2020 · 1 revision

Welcome to the clinker wiki!

clinker is a tool for visualising gene cluster similarity, (hopefully) replacing the need for you to manually create gene cluster comparison figures from scratch in PowerPoint or Illustrator. The following guide should get you up and running with the program and generating your own visualisations.

If you find clinker useful, please cite the pre-print:

Gilchrist, C.L.M., Chooi, Y.-H., 2020. clinker & clustermap.js: Automatic generation of gene cluster comparison figures. bioRxiv 2020.11.08.370650. https://doi.org/10.1101/2020.11.08.370650

Installation

Installing Python

clinker requires Python >3.5. If you do not have Python installed, you can go to https://www.python.org/ and download the appropriate installer for your operating system. On Windows, make sure you enable the option to put Python on your system PATH (usually a checkbox in the installer labelled along the lines of "Add Python to PATH") so that you can access Python packages like clinker directly from the command line.
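If you are unsure which Python you have, a short check like the one below can help. It is only a sketch: the helper name python_is_supported is made up for this guide, and it reads ">3.5" as "strictly newer than 3.5" (i.e. 3.6 or later), which is an assumption.

```python
import sys


def python_is_supported(version_info=sys.version_info, minimum=(3, 5)):
    """Return True if (major, minor) is strictly newer than `minimum`.

    Assumption: "Python >3.5" means any release after the 3.5 series.
    """
    return tuple(version_info[:2]) > minimum


# Report on the interpreter running this script
print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Save it to a file and run it with the same python3 you intend to install clinker into.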

Installing clinker

(Optional) To avoid conflicts with other installed Python packages, it is recommended to install clinker within a virtual environment. To do this, first create a new virtual environment:

python3 -m venv my_env

Then activate it (on Windows, run my_env\Scripts\activate instead):

source my_env/bin/activate

Finally, install clinker:

pip install clinker

This will install clinker as well as all of its dependencies. If you have both Python 2 and 3 installed, you might have to specify pip3 instead of just pip, e.g.:

pip3 install clinker

Dependencies

clinker depends on the following Python packages to work:

  • BioPython (>=1.75): used when performing pairwise sequence alignments. clinker requires at least version 1.75, due to substitution matrices used in the sequence alignments being stored in a different location within the BioPython package.
  • SciPy (>=1.3.3) and NumPy (>=1.13.3): used when computing similarity scores of clusters and performing hierarchical clustering to determine the optimal display order. Earlier versions should work, but are untested.

The most up-to-date versions of these packages are installed automatically when you install clinker. If you have older versions of these packages, they can be updated by providing the --force-reinstall argument to pip. For example:

pip3 install --force-reinstall clinker
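To see whether your installed packages already meet the minimum versions listed above, something like the following sketch could be used. The helper names (as_tuple, meets_minimum, check_dependencies) are invented for this example, the version comparison is deliberately simple, and importlib.metadata needs Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the list above, keyed by the names used on PyPI
MINIMUMS = {"biopython": "1.75", "scipy": "1.3.3", "numpy": "1.13.3"}


def as_tuple(version_string):
    """Convert '1.13.3' to (1, 13, 3), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare two version strings numerically, component by component."""
    return as_tuple(installed) >= as_tuple(required)


def check_dependencies(minimums=MINIMUMS):
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for name, required in minimums.items():
        try:
            report[name] = meets_minimum(version(name), required)
        except PackageNotFoundError:
            report[name] = None
    return report


print(check_dependencies())
```

A value of False or None for any package suggests running the pip command above.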

Quick start

Basic pipeline

clinker takes GenBank files as input. Each file will typically contain just a single locus (i.e. a small region extracted from a larger genomic scaffold). Multi-record GenBank files are also supported, allowing you to visualise gene clusters that may be split over multiple loci (e.g. due to a fragmented genome assembly).

The clinker pipeline can be run as simply as:

clinker file1.gbk file2.gbk file3.gbk -p

This will read in your GenBank files (file1.gbk, file2.gbk, file3.gbk), align them, cluster them to determine display order, and generate the full clustermap.js visualisation in your web browser.

By default, the visualisation is dynamically served and you will have to interrupt clinker (using Ctrl + C) to stop it. A static HTML document containing the visualisation can be generated instead by providing a file name to the -p/--plot argument:

clinker file1.gbk file2.gbk file3.gbk -p my_plot.html

Once the visualisation is loaded in the web browser, you can play around with the settings in the sidebar to change its appearance and layout. When you're happy with the figure, you can save an SVG image by clicking the save button.

Session files

A clinker session can be saved/reloaded using the -s/--session argument to avoid having to recompute gene cluster alignments:

clinker file1.gbk file2.gbk file3.gbk -s alignments.json

This is particularly useful if you want to add more clusters to an alignment. If a session file is loaded alongside new GenBank files, clinker will add them to the session, only performing the necessary alignments with the new files. The session file is then re-written with the new alignments. For example:

clinker -s alignments.json file4.gbk file5.gbk
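To get a feel for how much work a session file saves, the sketch below counts the cluster pairs that involve at least one new file. It assumes every pair of clusters is compared once, which is an assumption made for illustration rather than a description of clinker's internals; new_pairs is a made-up helper name.

```python
from itertools import combinations, product


def new_pairs(existing, added):
    """Pairs that include at least one newly added cluster."""
    # new-vs-old comparisons, plus comparisons among the new clusters themselves
    return list(product(existing, added)) + list(combinations(added, 2))


existing = ["file1", "file2", "file3"]
added = ["file4", "file5"]
pairs = new_pairs(existing, added)

# 3 * 2 new-vs-old pairs plus 1 new-vs-new pair
print(len(pairs), "pairs to align instead of", len(list(combinations(existing + added, 2))))
```

Under that assumption, adding two files to a three-file session needs 7 comparisons rather than all 10.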

Input

clinker can be given either direct paths to input files or folders containing input files. For example, we could move our files 1–3 from the above examples into a folder and load them alongside 4–5 like so:

clinker input_folder/ file4.gbk file5.gbk -p

When given a folder, clinker will automatically look for all files within that folder; if folders are found inside the given folder, clinker will also look inside of those.
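The folder behaviour described above can be pictured with a small sketch. This is not clinker's own code; collect_files is a hypothetical helper that shows one way of expanding a mix of files and folders, looking inside nested folders as it goes.

```python
import tempfile
from pathlib import Path


def collect_files(paths):
    """Expand a mix of file and folder paths into a flat list of files."""
    found = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # rglob("*") descends into subfolders; sorted() keeps the order stable
            found.extend(sorted(p for p in path.rglob("*") if p.is_file()))
        else:
            found.append(path)
    return found


# Small demonstration using a throwaway folder layout
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "input_folder" / "nested").mkdir(parents=True)
    (root / "input_folder" / "file1.gbk").touch()
    (root / "input_folder" / "nested" / "file2.gbk").touch()
    (root / "file4.gbk").touch()
    names = [p.name for p in collect_files([root / "input_folder", root / "file4.gbk"])]

print(names)
```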

Another feature is the ability to use the order of input files instead of performing hierarchical clustering. This can be useful in situations where you would like to generate clinker visualisations matching the order of a matrix or phylogenetic tree without having to manually rearrange them within the visualisation. This can be done using the -ufo/--use_file_order flag:

clinker file3.gbk file1.gbk file2.gbk -ufo -p

If you have a long list of files, it is easier to create a text file containing the paths to each file in your desired order. For example, given a file named files.txt containing:

file3.gbk
file2.gbk
file1.gbk

We could then use Bash command substitution to pass its contents to clinker:

clinker $(cat files.txt) -ufo -p
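If you are not using Bash, or your paths contain spaces (which unquoted command substitution would split apart), the same command can be assembled in Python. This is a sketch only: build_command is a made-up helper, and it uses just the -ufo and -p flags shown above. shlex.join needs Python 3.8 or newer.

```python
import shlex


def build_command(list_text):
    """Build a clinker argument list from the text of a file list.

    One path per line; blank lines are ignored and order is preserved.
    """
    paths = [line.strip() for line in list_text.splitlines() if line.strip()]
    return ["clinker", *paths, "-ufo", "-p"]


# In practice the text would come from open("files.txt").read()
command = build_command("file3.gbk\nfile2.gbk\nfile1.gbk\n")
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) to launch clinker.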

Alignment

clinker currently provides two options to change how alignments are performed. The first, -na/--no_align, will skip aligning altogether, reading in your GenBank files and generating the visualisation directly. Since no alignments are performed, clinker will not be able to colour the genes in your figure. However, this can be useful if to-scale cluster maps are all that is required.

The second, -i/--identity, is a threshold for sequence identity that must be met for a gene-gene alignment to be saved. By default, this is set to 0.3 (30%).
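As an illustration of what an identity threshold does, the sketch below keeps only those gene-gene alignments at or above the cutoff. The gene names and scores are invented, and whether clinker treats a score exactly equal to the threshold as passing is an assumption here.

```python
def filter_by_identity(alignments, threshold=0.3):
    """Keep (query, target, identity) tuples meeting the threshold.

    Assumption: an identity equal to the threshold is kept.
    """
    return [aln for aln in alignments if aln[2] >= threshold]


# Hypothetical alignments: (query gene, target gene, identity as a fraction)
alignments = [
    ("geneA", "geneX", 0.82),
    ("geneB", "geneY", 0.30),
    ("geneC", "geneZ", 0.12),
]
kept = filter_by_identity(alignments)
print(kept)
```

Raising the threshold (e.g. -i 0.5) would leave fewer links between genes in the visualisation.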

Output

By default, clinker reports all alignment summaries to the terminal in human-readable format. However, clinker can also easily generate delimited files. For example, to generate a comma-separated file (CSV) that can be imported into spreadsheet software, we can use the -dl/--delimiter argument:

clinker *.gbk -o alignments.csv -dl ","

Note that the -o/--output argument can be used to save clinker output directly to a file. If the -f/--force flag is given, clinker will overwrite pre-existing output files.
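Once you have a delimited file, it can also be read back in Python. The sketch below assumes that each data row has four fields (query, target, identity, similarity) and skips anything else; that layout is an assumption, so check it against your own output file. The sample rows and the helper name read_alignment_rows are made up.

```python
import csv
import io


def read_alignment_rows(handle, delimiter=","):
    """Collect rows shaped like (query, target, identity, similarity)."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        if len(fields) != 4:
            continue  # blank lines or lines with a different shape
        query, target, identity, similarity = fields
        try:
            rows.append((query, target, float(identity), float(similarity)))
        except ValueError:
            continue  # non-numeric scores, e.g. a column header row
    return rows


# Hypothetical file contents, used here in place of a real output file
sample = io.StringIO(
    "Query,Target,Identity,Similarity\n"
    "geneA,geneX,0.82,0.91\n"
    "geneB,geneY,0.45,0.60\n"
)
rows = read_alignment_rows(sample)
print(rows)
```

For a real file, open it with open("alignments.csv", newline="") and pass the handle to the function.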

clinker provides several other options to adjust this output: alignment column headers can be hidden using the -hl/--hide_link_headers flag; cluster names can be hidden using the -ha/--hide_aln_headers flag; and the number of decimal places for score values can be set using the -dc/--decimals argument.

Visualisation

The clustermap.js visualisation used by clinker is designed to be very easy to customise. An overview of usage, as well as all changeable options, is provided in the visualisation sidebar. Briefly:

  • Clusters can be rearranged vertically by dragging cluster names
  • Loci can be moved or resized by hovering over them and dragging the box
  • The visualisation can be anchored around a specific gene by clicking on it
  • Clusters and similarity groups can be renamed by clicking on their text
  • Similarity group colours can be changed by clicking on the circles in the legend
  • Groups can be removed by right-clicking their label in the legend
  • The scale bar can be resized by clicking its text and entering a new value (bp)

clinker provides numerous settings that can be changed to alter the layout and appearance of the visualisation. These are all listed inside the sidebar; any changes you make to these options will directly update the visualisation.