
sourmash-bio/database-examples


Example workflows for building sourmash databases

This repository contains examples, demonstrations, and support scripts for building custom sourmash databases, using the new sourmash sketch fromfile command and related additions to sourmash.
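
A fromfile CSV lists one genome per row, and optionally a matching proteome. The following is a minimal sketch of generating one in Python. It assumes the name, genome_filename, and protein_filename columns that sourmash sketch fromfile reads, and it names each genome after the first header line of its FASTA file. The helpers first_fasta_header and build_fromfile_csv are hypothetical illustrations, not part of sourmash or of this repository's scripts, which may choose names differently.

```python
import csv
import gzip

# Assumed column layout of the CSV consumed by `sourmash sketch fromfile`.
FROMFILE_COLUMNS = ["name", "genome_filename", "protein_filename"]


def first_fasta_header(path):
    """Return the first FASTA header (without '>') from a plain or gzipped file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fp:
        for line in fp:
            if line.startswith(">"):
                return line[1:].strip()
    raise ValueError(f"no FASTA header found in {path}")


def build_fromfile_csv(genome_paths, out_csv):
    """Write one fromfile row per genome FASTA, named by its first header line."""
    with open(out_csv, "w", newline="") as fp:
        writer = csv.DictWriter(fp, fieldnames=FROMFILE_COLUMNS)
        writer.writeheader()
        for path in genome_paths:
            writer.writerow({
                "name": first_fasta_header(path),
                "genome_filename": str(path),
                "protein_filename": "",  # this sketch handles DNA genomes only
            })
```

The resulting CSV would then be passed to sourmash sketch fromfile along with the desired sketch parameters. Reading only the first header keeps the sketch fast on large assemblies. Filling protein_filename when a proteome is available follows the same pattern.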

See sourmash#1671 for the overall discussion about building databases.

Examples

See an example of building a private database.

Another example: building protein and DNA databases starting from genomes.

Building a DNA+protein database from the NCBI genome assembly & proteome files.

Building a DNA+protein database from an NCBI genome assembly file.

Scripts and code

  • fasta-to-fromfile.py - build a fromfile CSV file from a list of FASTA files.
  • genbank-to-fromfile.py - build a fromfile CSV file from a list of FASTA files downloaded from GenBank.
  • kiln.py - support library for building fromfile CSVs.
  • mass-rename.py - a script to bulk-rename sourmash signatures.
  • mass-merge.py - a script to bulk-merge sourmash signatures, grouped by the values of an attribute column in a spreadsheet.
  • sigs-to-manifest.py - a script to extract and/or update sourmash manifests from many databases.
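
Bulk-merging by a spreadsheet attribute comes down to grouping signature files by one column's value and merging each group. Below is a minimal sketch of that grouping step, assuming a hypothetical spreadsheet with one column naming each signature file and another holding the attribute (for example, a species name). group_by_column and merge_commands are illustrative helpers, not the actual mass-merge.py code. The generated command lines assume the sourmash sig merge subcommand with -o for the output file.

```python
import csv
from collections import defaultdict


def group_by_column(spreadsheet, key_column, file_column):
    """Map each distinct value of key_column to the signature files carrying it."""
    groups = defaultdict(list)
    with open(spreadsheet, newline="") as fp:
        for row in csv.DictReader(fp):
            value = (row.get(key_column) or "").strip()
            if value:  # rows with an empty attribute are left unmerged
                groups[value].append(row[file_column])
    return dict(groups)


def merge_commands(groups):
    """Build one `sourmash sig merge` argv list per group, sorted by group value."""
    cmds = []
    for value, files in sorted(groups.items()):
        outname = value.replace(" ", "_") + ".sig"  # hypothetical naming scheme
        cmds.append(["sourmash", "sig", "merge", *files, "-o", outname])
    return cmds
```

Each argv list could be run with subprocess.run. Building the commands separately from running them makes it easy to review the grouping before any signatures are merged.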