Francisco Ascue Orosco edited this page Mar 27, 2023 · 79 revisions

System requirements

CheckM is designed to run on Linux. The limiting requirement is memory. Inference of lineage-specific marker sets using the full reference genome tree requires approximately 40 GB of memory. However, a reduced genome tree (--reduced_tree) can also be used to infer lineage-specific marker sets, which makes this step feasible on machines with as little as 16 GB of memory. We recommend using the full tree if possible, though our results suggest that the same lineage-specific marker set will be selected for the vast majority of genomes regardless of the underlying reference tree. System requirements are far more modest if you plan to use taxonomic-specific marker sets or your own custom marker genes, as this bypasses the need to place genomes in the reference genome tree.
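As a rough illustration of the memory guidance above, the following sketch chooses between the full and reduced tree based on installed memory. The helper names `tree_flag` and `total_mem_gb` are hypothetical and not part of CheckM, and the 40 GB cutoff is taken from the approximate figure quoted above, so treat it as a starting point rather than a hard rule.

```shell
# Print "--reduced_tree" when the given memory (whole GB) is below 40,
# otherwise print nothing. The cutoff is an assumption based on the
# approximate 40 GB figure for the full reference genome tree.
tree_flag() {
    if [ "$1" -lt 40 ]; then
        echo "--reduced_tree"
    fi
}

# Total physical memory in whole GB, read from /proc/meminfo (Linux only).
# MemTotal is reported in kB; the result is truncated to an integer.
total_mem_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}
```

A possible use is `checkm lineage_wf $(tree_flag "$(total_mem_gb)") -x fna ./bins ./checkm_out`, where `./bins` and `./checkm_out` are placeholder directories. Because the kernel reports slightly less than the nominal installed memory, a machine sold as 40 GB would fall back to the reduced tree under this sketch.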

If you plan to process a large number of genomes, you may wish to break them into smaller batches. On a 64 GB machine, running 1000 genomes at a time with 40 threads works well. Exceeding the available memory of your machine will cause CheckM to use swap space (as with any program), which will substantially increase processing time.
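One way to batch genomes is to build per-batch directories of symbolic links and run CheckM on each in turn. The sketch below is only an illustration: `make_batches`, the directory names, and the `fna` extension are assumptions, not something CheckM provides or requires.

```shell
# Split a directory of genome FASTA files into batch directories of symlinks.
# Usage: make_batches <genome_dir> <batch_root> <batch_size> [extension]
# Prints the number of batch directories created.
make_batches() {
    mb_genome_dir=$1
    mb_batch_root=$2
    mb_batch_size=$3
    mb_ext=${4:-fna}
    mb_count=0
    mb_batch=0
    for mb_file in "$mb_genome_dir"/*."$mb_ext"; do
        [ -e "$mb_file" ] || continue   # skip the unexpanded glob if nothing matches
        # Start a new batch directory every <batch_size> files.
        if [ $((mb_count % mb_batch_size)) -eq 0 ]; then
            mb_batch=$((mb_batch + 1))
            mkdir -p "$mb_batch_root/batch_$mb_batch"
        fi
        # Link by absolute path so the symlink resolves from any location.
        mb_abs="$(cd "$(dirname "$mb_file")" && pwd)/$(basename "$mb_file")"
        ln -s "$mb_abs" "$mb_batch_root/batch_$mb_batch/"
        mb_count=$((mb_count + 1))
    done
    echo "$mb_batch"
}
```

After `make_batches ./genomes ./batches 1000 fna`, each batch could be processed with a loop such as `for d in ./batches/batch_*; do checkm lineage_wf -t 40 -x fna "$d" "./checkm_out/$(basename "$d")"; done`. Symlinks avoid duplicating the genome files on disk.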

How to install CheckM

Bioinformatic tool dependencies

CheckM requires the following programs to be added to your system path:

  • HMMER (>=3.1b1)
  • prodigal (2.60 or >=2.6.1)
    • executable must be named prodigal and not prodigal.linux
  • pplacer (>=1.1)
    • guppy, which is part of the pplacer package, must also be on your system path
    • pplacer binaries can be found on the pplacer GitHub page
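Before installing CheckM itself, it can help to confirm that these programs are visible on your system path. The following is a minimal sketch; `check_tools` is a hypothetical helper, and `hmmsearch` is used here as a representative HMMER executable.

```shell
# Report which of the given executables are missing from PATH.
# Prints one missing name per line and returns non-zero if any are missing.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "$ct_tool"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}
```

For example, `check_tools hmmsearch prodigal pplacer guppy` prints nothing and returns 0 when all four executables are found. This checks presence only, not the version requirements listed above.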

Installation through pip

CheckM >=1.1.0 is a Python 3.x program and can be installed through pip:

> pip3 install numpy
> pip3 install matplotlib
> pip3 install pysam
> pip3 install checkm-genome

This will install CheckM and all other required Python libraries. The bioinformatic tool dependencies need to be installed separately and placed on your system path.

Installation through Conda

A CheckM Conda environment can also be set up as follows:

conda create -n checkm python=3.9
conda activate checkm
conda install -c bioconda numpy matplotlib pysam
conda install -c bioconda hmmer prodigal pplacer
pip3 install checkm-genome

A full Conda package for CheckM is also available here; it has been generously put together and maintained by community members (if this is you, please let me know so I can acknowledge you here!)

Required reference data

CheckM relies on a number of precalculated data files which can be downloaded from either:

The reference data must be decompressed into a directory, and the path to that directory must be set using the CHECKM_DATA_PATH environment variable, e.g.:

> export CHECKM_DATA_PATH=/path/to/my_checkm_data

Alternatively, the following command can be run to inform CheckM of where the files have been placed:

> checkm data setRoot <checkm_data_dir>

Note: CheckM defaults to the environment variable CHECKM_DATA_PATH if it is set.
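The decompress-and-export steps can be combined into one small helper. This is a sketch under assumptions: `setup_checkm_data` is a hypothetical function, and it assumes the downloaded reference data is a gzipped tarball, which you should confirm for the file you actually obtained.

```shell
# Unpack a reference data archive and point CHECKM_DATA_PATH at the result.
# Usage: setup_checkm_data <archive.tar.gz> <data_dir>
# Assumes a gzipped tarball; adjust the tar options for other formats.
setup_checkm_data() {
    sd_archive=$1
    sd_data_dir=$2
    mkdir -p "$sd_data_dir" || return 1
    tar -xzf "$sd_archive" -C "$sd_data_dir" || return 1
    # Store an absolute path so the setting works from any working directory.
    CHECKM_DATA_PATH=$(cd "$sd_data_dir" && pwd)
    export CHECKM_DATA_PATH
}
```

The export only lasts for the current shell session. To make it persistent, add the `export CHECKM_DATA_PATH=...` line to your shell start-up file, or use the `checkm data setRoot` command shown above.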

Running CheckM

CheckM is now ready to run. For a list of CheckM commands type:

> checkm

How to upgrade CheckM

You can upgrade CheckM through pip:

> pip3 install checkm-genome --upgrade --no-deps

The CheckM reference database is not expected to change until CheckM v2.

Unit tests

If you wish to test your installation, you can run CheckM's unit tests. This isn't necessary and is primarily meant for development purposes, though some system administrators may find it useful. A general test of CheckM that verifies all third-party dependencies can be run using:

> checkm test ~/checkm_test_results

This runs the E. coli K12-W3310 genome through the standard CheckM pipeline and verifies the resulting output files. The output directory can be removed once the test has run.

Additional unit tests are provided in the test directory. These are designed to aid in development and make use of nose.

Web version through KBase

The CheckM lineage workflow is available at KBase for those looking for a web-based solution.