amit yadav edited this page Nov 17, 2021 · 17 revisions

Typical Workflow

CheckM works on a directory of genome bins in FASTA format. By default, it assumes that genomes consist of contigs/scaffolds in nucleotide space and that the files to process end with the extension fna. You can specify a different extension with the -x flag. CheckM calls genes internally using Prodigal, taking care to identify genes with recoded stop codons. Alternatively, you can call genes externally and provide CheckM with FASTA files containing genes in amino acid space by passing the --genes flag. In that case, you may also need to change the extension CheckM looks for (e.g., -x faa).
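
Before starting a run, it can help to confirm how many files in the bin folder actually carry the extension you intend to pass with -x. The following is a minimal sketch, not part of CheckM: count_bins is a hypothetical helper, and it assumes a find implementation that supports -maxdepth (GNU or BSD).

```shell
# count_bins is a hypothetical helper, not a CheckM command.
# Usage: count_bins <bin folder> [extension]
# Prints the number of regular files directly inside the folder that end in
# ".<extension>". The extension defaults to fna, CheckM's default.
count_bins() {
    bin_dir=$1
    ext=${2:-fna}
    # -maxdepth 1 restricts the search to the folder itself (assumes GNU/BSD find).
    find "$bin_dir" -maxdepth 1 -type f -name "*.${ext}" | wc -l | tr -d '[:space:]'
}
```

If the count is zero for the extension you expected, the -x value and the file names in the folder probably disagree.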

CheckM provides a series of commands that support a number of different analyses and workflows. If you are in a rush to get started, the standard workflow is:

> checkm lineage_wf <bin folder> <output folder>

For a full list of options, run checkm lineage_wf -h. To speed up processing, use the -t flag to specify the number of threads. On a machine with less than 40 GB of memory, use the --reduced_tree flag, which reduces the memory requirement to approximately 14 GB.
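
The memory rule above can be turned into a small check. This is only a sketch under stated assumptions: tree_flag is a hypothetical helper, the 40 GB threshold is taken from the text above, and the commented usage lines assume a Linux system exposing /proc/meminfo and use placeholder folder names.

```shell
# tree_flag is a hypothetical helper, not a CheckM command.
# Usage: tree_flag <total memory in whole GB>
# Prints --reduced_tree when the machine has less than 40 GB of memory,
# and prints nothing otherwise.
tree_flag() {
    mem_gb=$1
    if [ "$mem_gb" -lt 40 ]; then
        echo "--reduced_tree"
    fi
}

# Possible usage on Linux (assumes /proc/meminfo; bins/ and checkm_out/ are placeholders):
#   mem_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
#   checkm lineage_wf $(tree_flag "$mem_gb") -t 8 -x fa bins/ checkm_out/
```

Leaving the command substitution unquoted is deliberate: when the helper prints nothing, no empty argument is passed to checkm.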

After performing a lineage or taxonomy (checkm taxonomy_wf) workflow, you can re-run the qa command to produce a number of different output tables and plots. The qa command requires a marker file to run. This file is produced during the workflow and is stored in the CheckM output directory you specified (e.g., lineage.ms for lineage_wf or <taxon name>.ms for taxonomy_wf).
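
As an illustration of re-running qa after a lineage workflow, the sketch below wraps one possible invocation. rerun_qa and the output file name qa_extended.tsv are hypothetical; the options used (-o for the output format, --tab_table for tab-separated output, -f for the output file) should be confirmed against checkm qa -h for your installed version.

```shell
# rerun_qa is a hypothetical wrapper, not a CheckM command.
# Usage: rerun_qa <CheckM output folder from a previous lineage_wf run>
# Re-runs qa with the lineage.ms marker file stored in that folder and writes
# an extended, tab-separated summary to qa_extended.tsv (an arbitrary name).
rerun_qa() {
    out_dir=$1
    checkm qa -o 2 --tab_table -f "$out_dir/qa_extended.tsv" \
        "$out_dir/lineage.ms" "$out_dir"
}
```

For a taxonomy_wf run, the marker file argument would instead be the <taxon name>.ms file in the output folder.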

Example Usage

Assume you have putative genomes in the directory /home/donovan/bins with fa as the file extension and want to store the CheckM results in /home/donovan/checkm. To process these genomes with 8 threads, run:

> checkm lineage_wf -t 8 -x fa /home/donovan/bins /home/donovan/checkm

Or, to process files of called genes in amino acid space that have the extension faa, use:

> checkm lineage_wf --genes -t 8 -x faa <bin folder> <output folder>