This is a collection of tools for comparing metagenomic datasets to each other and to reference genomes. They were also used in the Transmission of crAssphage paper (reference to be added once it's available).
- Compare many metagenomic datasets (reads) to a single reference for SNP calling and multiallelic site identification
- Pairwise comparison of metagenomic assemblies of a microbe
- Compare a single metagenomic sample to a collection of references to identify which strain it is closest to
comparative_metagenomics/many_vs_one_snippy.snakefile
This pipeline takes as input a set of sequencing reads and a reference genome. SNPs are called against the reference using snippy, and the resulting variants are filtered to keep only high-quality calls. So that variants can be compared across many samples, they are normalized and decomposed using vt.
A heatmap of pairwise SNP similarity between samples is generated at the end.
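To make the idea of pairwise SNP similarity concrete, here is a minimal Python sketch. It is not the code used in the snakefile: the similarity metric (Jaccard index over shared variants), the function names, and the example variants are all assumptions made for illustration. Each sample is represented as a set of `(contig, position, ref, alt)` tuples, which is roughly what remains after normalization and decomposition.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two variant sets (assumed metric, not confirmed by the pipeline)."""
    if not a and not b:
        return 1.0  # convention: two samples with no variants are treated as identical
    return len(a & b) / len(a | b)


def pairwise_similarity(variants_by_sample):
    """Return sorted sample names and a symmetric nested-dict similarity matrix."""
    samples = sorted(variants_by_sample)
    matrix = {s: {s: 1.0} for s in samples}
    for s1, s2 in combinations(samples, 2):
        sim = jaccard(variants_by_sample[s1], variants_by_sample[s2])
        matrix[s1][s2] = sim
        matrix[s2][s1] = sim
    return samples, matrix


# Hypothetical variants: (contig, position, ref allele, alt allele)
example_variants = {
    "sampleA": {("ref", 100, "A", "G"), ("ref", 250, "C", "T")},
    "sampleB": {("ref", 100, "A", "G"), ("ref", 400, "G", "A")},
    "sampleC": set(),
}

samples, matrix = pairwise_similarity(example_variants)
for s in samples:
    print(s, [round(matrix[s][t], 2) for t in samples])
```

A matrix like this is the kind of input a heatmap function would take. Other metrics (for example, the fraction of shared positions with matching alleles) would fit the same structure.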
comparative_metagenomics/compare_assembled_contigs.snakefile
This pipeline takes as input a set of metagenomic assemblies, keeps contigs longer than 500 bp, and aligns those contigs to a reference genome. Contigs that align are then compared against each other, pairwise, using nucmer.
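The length filter in the first step can be sketched in plain Python as follows. This is an illustrative stand-in, not the snakefile's implementation: the function names, the in-memory FASTA lines, and the contig names are hypothetical, and the pipeline itself may use a dedicated tool for this step. The cutoff is strict (greater than 500 bp), matching the description above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(seq)


def filter_contigs(records, min_length=500):
    """Keep contigs strictly longer than min_length bases."""
    return [(h, s) for h, s in records if len(s) > min_length]


# Hypothetical assembly with three contigs of 600, 500, and 120 bp
example_fasta = [
    ">contig_1", "A" * 300, "C" * 300,
    ">contig_2", "G" * 500,
    ">contig_3", "T" * 120,
]

kept = filter_contigs(read_fasta(example_fasta))
for header, seq in kept:
    print(header, len(seq))
```

With a real assembly, the same functions can be applied to an open file handle, for example `filter_contigs(read_fasta(open(path)))`, where `path` is the assembly FASTA.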