Reported Statistics

qa

bin id: unique identifier of genome bin (derived from input fasta file)
marker lineage: indicates the taxonomic rank of the lineage-specific marker set used to estimate genome completeness, contamination, and strain heterogeneity. More detailed information about the placement of a genome within the reference genome tree can be obtained with the tree_qa command. The UID indicates the branch within the reference tree used to infer the marker set applied to estimate the bin's quality.
# genomes: number of reference genomes used to infer the lineage-specific marker set
markers: number of marker genes within the inferred lineage-specific marker set
marker sets: number of co-located marker sets within the inferred lineage-specific marker set
0-5+: number of marker genes identified 0, 1, 2, 3, 4, or 5 or more times within the bin
completeness: estimated completeness of genome as determined from the presence/absence of marker genes and the expected co-location of these genes (see Methods in the PeerJ preprint for details)
contamination: estimated contamination of genome as determined by the presence of multi-copy marker genes and the expected co-location of these genes (see Methods in the PeerJ preprint for details)
strain heterogeneity: estimated strain heterogeneity as determined from the number of multi-copy marker pairs which exceed a specified amino acid identity threshold (default = 90%). High strain heterogeneity suggests the majority of reported contamination is from one or more closely related organisms (i.e. potentially the same species), while low strain heterogeneity suggests the majority of contamination is from more phylogenetically diverse sources (see Methods in the CheckM manuscript for more details).
genome size: number of nucleotides (including unknowns specified by N's) in the genome
# ambiguous bases: number of ambiguous bases (N's) in the genome
# scaffolds: number of scaffolds within the genome
# contigs: number of contigs within the genome as determined by splitting scaffolds at any position consisting of more than 10 consecutive ambiguous bases
N50 (scaffolds): N50 statistic as calculated over all scaffolds
N50 (contigs): N50 statistic as calculated over all contigs
longest scaffold: length of the longest scaffold within the genome
longest contig: length of the longest contig within the genome
GC: proportion of G and C nucleotides relative to all A, C, G, and T nucleotides in the genome
coding density: proportion of nucleotides within a coding sequence (CDS) relative to all nucleotides in the genome
translation table: indicates which genetic code was used to translate nucleotides into amino acids
# predicted genes: number of predicted coding sequences (CDS) within the genome as determined using Prodigal
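
The assembly statistics above (genome size, # ambiguous bases, # contigs, N50, and GC) can be illustrated with a short Python sketch. This is not CheckM's own code: the function names are hypothetical, and N50 is computed with the common definition (the length of the shortest sequence in the smallest set of longest sequences that covers at least half of the total length), which is assumed here rather than taken from the CheckM source.

```python
import re

# Hypothetical helpers for illustration only; not part of CheckM.

def split_contigs(scaffold, max_gap=10):
    """Split a scaffold wherever more than `max_gap` consecutive N's occur."""
    gap = re.compile(r"[Nn]{%d,}" % (max_gap + 1))
    return [piece for piece in gap.split(scaffold) if piece]

def n50(lengths):
    """Common N50 definition (assumed): walk from longest to shortest
    until at least half of the total length has been covered."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return 0

def gc_fraction(sequences):
    """G and C relative to A, C, G, and T only; N's are ignored."""
    gc = at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0

scaffolds = ["ACGT" + "N" * 11 + "GGCC", "ACNNGT"]
contigs = [c for s in scaffolds for c in split_contigs(s)]

genome_size = sum(len(s) for s in scaffolds)            # N's are included
ambiguous = sum(s.upper().count("N") for s in scaffolds)
n50_scaffolds = n50([len(s) for s in scaffolds])
n50_contigs = n50([len(c) for c in contigs])
gc = gc_fraction(scaffolds)
```

Note that the run of two N's in the second scaffold is too short to split it, so it stays a single contig.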

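As a rough illustration of how the completeness and contamination entries above relate to the marker counts, the following Python sketch averages over co-located marker sets. It is one reading of the description, not CheckM's implementation: the function name and the input layout (a list of marker sets plus a dictionary of per-marker copy counts) are assumptions, and the Methods of the manuscript remain the authoritative definition.

```python
# Hypothetical sketch; names and data layout are assumptions, not CheckM's API.

def estimate_quality(marker_sets, copy_counts):
    """Return (completeness %, contamination %) averaged over marker sets.

    marker_sets: list of sets of marker gene IDs (co-located markers)
    copy_counts: dict mapping marker gene ID -> copies found in the bin
    """
    completeness = 0.0
    contamination = 0.0
    for markers in marker_sets:
        # Fraction of this set's markers found at least once.
        present = sum(1 for m in markers if copy_counts.get(m, 0) >= 1)
        # Each copy beyond the first counts as evidence of contamination.
        extra = sum(copy_counts.get(m, 0) - 1
                    for m in markers if copy_counts.get(m, 0) >= 2)
        completeness += present / len(markers)
        contamination += extra / len(markers)
    n_sets = len(marker_sets)
    return 100 * completeness / n_sets, 100 * contamination / n_sets

marker_sets = [{"A", "B"}, {"C"}]
copy_counts = {"A": 1, "B": 0, "C": 3}
completeness, contamination = estimate_quality(marker_sets, copy_counts)
```

Because extra copies are summed rather than capped, contamination computed this way can exceed 100% for heavily contaminated bins.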