
External Gene Catalog

Sam Minot edited this page Mar 2, 2021 · 1 revision

To analyze a dataset using gene sequences that were generated by another method, or from another dataset, provide them with the --gene_fasta flag.

The file indicated by this flag must be gzip-compressed and in FASTA format, with each record in the FASTA being a unique amino acid (protein) sequence.
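The requirements above can be checked before launching a run. The following is a minimal sketch, not part of geneshot itself: the function names `read_fasta_gz` and `check_gene_catalog`, the file name in the usage note, and the set of accepted amino acid characters are all assumptions made for illustration. It reads a gzip-compressed FASTA and reports empty records, characters outside the assumed protein alphabet, and duplicated sequences.

```python
import gzip

# Assumption: the 20 standard residues plus common ambiguity codes and '*' (stop).
PROTEIN_CHARS = set("ACDEFGHIKLMNPQRSTVWYBXZUO*")


def read_fasta_gz(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    # gzip.open raises an OSError on read if the file is not gzip-compressed
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is None:
                raise ValueError("sequence data found before the first '>' header")
            else:
                chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def check_gene_catalog(path):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    seen = {}  # sequence -> header of the first record carrying it
    n_records = 0
    for header, seq in read_fasta_gz(path):
        n_records += 1
        if not seq:
            problems.append(f"{header}: empty sequence")
            continue
        unexpected = set(seq) - PROTEIN_CHARS
        if unexpected:
            problems.append(f"{header}: unexpected characters {sorted(unexpected)}")
        if seq in seen:
            problems.append(f"{header}: duplicate of {seen[seq]}")
        else:
            seen[seq] = header
    if n_records == 0:
        problems.append("no FASTA records found")
    return problems
```

Calling `check_gene_catalog("genes.fasta.gz")` on a hypothetical catalog file returns a list of human-readable problems. Note that a character check alone cannot distinguish nucleotide from protein input, because A, C, G, and T are also valid amino acid codes.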

Even when an external gene catalog is specified in this way, geneshot will still carry out de novo assembly. The reason is that the co-assembly of genes on the same contig is used as information to speed up and optimize the CAG-creation process. Although de novo assembly is computationally slow, the quality of its results is a large part of the utility of the geneshot pipeline as a whole.