External Gene Catalog
Sam Minot edited this page Mar 2, 2021 · 1 revision
To analyze a dataset using a set of gene sequences generated by some other method,
or from some other dataset, use the --gene_fasta flag.
The file indicated by this flag must be gzip-compressed and in FASTA format, with each record in the FASTA being a unique amino acid (protein) sequence.
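As a minimal sketch of how such a file might be prepared, the Python snippet below reads protein records from an existing FASTA, drops records whose sequence has already been seen, and writes the result as a gzip-compressed FASTA. The function names are hypothetical and are not part of geneshot. The snippet also assumes that "unique" means no two records share the same amino acid sequence (compared case-insensitively), which is an interpretation rather than something the pipeline documentation confirms.

```python
import gzip


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open text-mode FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record
    if header is not None:
        yield header, "".join(chunks)


def write_unique_gzip_fasta(records, out_path):
    """Write records to a gzip-compressed FASTA file.

    Keeps only the first record for each distinct sequence
    (assumption: uniqueness is judged on the sequence, ignoring case).
    Returns the number of records written.
    """
    seen = set()
    written = 0
    with gzip.open(out_path, "wt") as out:
        for header, seq in records:
            key = seq.upper()
            if key in seen:
                continue  # duplicate sequence, skip it
            seen.add(key)
            out.write(f">{header}\n{seq}\n")
            written += 1
    return written
```

For example, with a hypothetical uncompressed input named genes.faa, calling `write_unique_gzip_fasta(read_fasta(open("genes.faa")), "genes.fasta.gz")` would produce a file that could then be passed to the --gene_fasta flag.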
Even when an external gene catalog is specified in this way, geneshot still carries
out de novo assembly. This is because the co-assembly of genes on the same contig
is used as information to speed up and optimize the CAG-creation process. Although
this process is computationally slow, the quality of those results is a large part
of the utility of the geneshot pipeline as a whole.