Releases: jlumpe/gambit

v1.0.1

13 Mar 06:01
  • Significant documentation updates.
  • Better error reporting:
    • When database files cannot be found (in CLI and API).
    • On attempting to open an invalid signatures file.
  • Misc
    • Run tests on Python 3.11 and 3.12.
    • Minor changes to output of gambit signatures info.

v1.0.0

07 Oct 03:39

New features

  • tree command for generating hierarchical clustering trees from distance matrices.
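As a rough illustration of what building a hierarchical clustering tree from a distance matrix involves, here is a minimal pure-Python sketch. It assumes average linkage (UPGMA) and outputs only the topology in Newick format; the release notes do not say which linkage method or output format the tree command uses, and `average_linkage_newick` is a hypothetical name, not part of GAMBIT.

```python
def average_linkage_newick(labels, dist):
    """Cluster a symmetric distance matrix with average linkage.

    Returns the tree topology (no branch lengths) as a Newick string.
    Illustrative sketch only, not GAMBIT's implementation.
    """
    # cluster id -> (Newick fragment, indices of the leaves it contains)
    clusters = {i: (label, [i]) for i, label in enumerate(labels)}
    next_id = len(labels)

    def avg_dist(a, b):
        members_a, members_b = clusters[a][1], clusters[b][1]
        total = sum(dist[i][j] for i in members_a for j in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        ids = sorted(clusters)
        candidates = [(x, y) for n, x in enumerate(ids) for y in ids[n + 1:]]
        # Merge the two clusters with the smallest average distance.
        a, b = min(candidates, key=lambda pair: avg_dist(*pair))
        name_a, members_a = clusters.pop(a)
        name_b, members_b = clusters.pop(b)
        clusters[next_id] = (f"({name_a},{name_b})", members_a + members_b)
        next_id += 1

    newick, _ = next(iter(clusters.values()))
    return newick + ";"
```

For three genomes where A and B are close and C is distant from both, the sketch groups A with B first and then joins C.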

General

  • Preferred extensions for genome database files and signatures files have been changed from .db and .h5 to .gdb and .gs.
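A script that has to cope with both naming schemes could pick files by extension priority, roughly as sketched below. The helper `pick_file` is hypothetical and not part of GAMBIT, and whether GAMBIT itself still accepts the old extensions is not stated in these notes.

```python
from pathlib import Path

# Preferred extension first, legacy extension second (per the note above).
GENOME_EXTS = (".gdb", ".db")
SIGNATURE_EXTS = (".gs", ".h5")


def pick_file(names, extensions):
    """Return the first file name matching the earliest extension in `extensions`.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    for ext in extensions:
        matches = sorted(n for n in names if Path(n).suffix == ext)
        if matches:
            return matches[0]
    return None
```

Given a directory listing that contains both `ref.db` and `ref.gdb`, `pick_file(names, GENOME_EXTS)` selects `ref.gdb`.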

Performance improvements

  • Use process-based parallelism by default for parsing multiple sequence files (much faster).
  • Speed up gambit dist with -s option applied.

CLI

  • Strip directory and extension from input file IDs. This applies to CSV output from querying and distance calculation, and to IDs in generated signature files.
  • -k and --prefix parameters now default to the values used by the RefSeq database.
  • Add option to specify number of cores to use.
  • Add option to disable progress bar printing.
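The ID stripping described in the first item above can be pictured with the short sketch below. `file_id` is a hypothetical function, not GAMBIT's API, and removing a trailing `.gz` before the extension is an assumption of this sketch rather than documented behavior.

```python
from pathlib import PurePosixPath


def file_id(path, strip_gz=True):
    """Derive an ID from a file path by dropping the directory and extension.

    Hypothetical helper. Stripping ".gz" first is an assumption of this sketch.
    """
    name = PurePosixPath(path).name
    if strip_gz and name.endswith(".gz"):
        name = name[: -len(".gz")]
    # Only the final extension is removed, so "a.b.fasta" becomes "a.b".
    return PurePosixPath(name).stem
```

With this sketch, `/data/genomes/sample1.fasta` and `genomes/sample2.fa.gz` map to the IDs `sample1` and `sample2`.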

v0.5.1

22 Aug 02:44

Minor edits to project README and metadata.

v0.5.0

18 May 19:50

New features

  • gambit dist command for calculating distance matrices.

CLI

  • Sequence file input
    • Explicitly restrict input to FASTA format only.
    • Files may be gzipped.
    • Read input file lists from text files.
  • Minor changes to options of subcommands in signatures group.
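For scripts that prepare input for the CLI, the sequence file handling above can be mimicked with the standard library alone. The sketch below is not GAMBIT code and all function names are hypothetical; it detects gzip by magic bytes (an assumption, since the notes do not say how compression is detected) and reads one path per line from a list file.

```python
import gzip


def open_maybe_gzipped(path):
    """Open a text file, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_file_list(path):
    """Read one input path per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def count_fasta_records(path):
    """Count FASTA header lines (those starting with '>')."""
    with open_maybe_gzipped(path) as f:
        return sum(1 for line in f if line.startswith(">"))
```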

API

  • gambit.db subpackage:
    • Database-loading funcs moved to class methods of ReferenceDatabase.
    • Additional taxonomy tree methods.
    • Some additional internal reorganization/refactoring.

v0.4.0

19 Feb 23:34

Changes from 0.3.0:

New features

  • Result reporting
    • Results include list of closest reference genomes. This is only reported in JSON-based
      output formats.
    • New "next_taxon" attribute, indicating the next most specific taxon for which the
      threshold was not met.
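One way to read the "next_taxon" description is sketched below. It assumes a simplified model in which each taxon in the closest match's lineage has a single distance threshold; the function, the data layout, and the threshold values are all hypothetical and are not taken from GAMBIT.

```python
def classify(lineage, distance):
    """Pick the most specific taxon whose threshold is met.

    `lineage` is a list of (taxon_name, threshold) pairs ordered from most
    specific to least specific. Returns (predicted, next_taxon), where
    next_taxon is the taxon just below the prediction whose threshold was
    NOT met. Simplified, hypothetical model for illustration only.
    """
    next_taxon = None
    for name, threshold in lineage:
        if distance <= threshold:
            return name, next_taxon
        next_taxon = name  # threshold not met; remember the latest failure
    return None, next_taxon


# Made-up thresholds, for illustration only.
lineage = [("Escherichia coli", 0.3), ("Escherichia", 0.6)]
genus_level_call = classify(lineage, 0.45)
```

Here a distance of 0.45 misses the species threshold but meets the genus threshold, so the prediction is the genus and next_taxon is the species.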

CLI

  • signatures info subcommand uses current reference DB by default.

Documentation

  • Some improvements to API docs.

API and internals

  • calc_signature() function can take multiple sequences as input.
  • Remove calc_signature_parse() function.
  • Refactoring
    • Rename GAMBITDatabase -> ReferenceDatabase, gambit.db.gambitdb -> .refdb
    • Rename gambit.signatures -> gambit.sigs.
    • Merge gambit.sigs.array, gambit.sigs.meta -> gambit.sigs.base
    • Rename gambit.io.export -> gambit.results
    • Move generic sequence code from gambit.kmers to gambit.seq.
    • Merge gambit.io.seq -> gambit.seq.
    • Rename load_database* funcs -> load_db*.
    • Move gambit.io.json -> gambit.util.json, gambit.io.util -> gambit.util.io,
      remove gambit.io.
    • Moved some other stuff between modules.
  • Improvements to gambit.sigs.hdf5.HDF5Signatures
    • Improvements to .create() method.
    • Support compression.
  • Format-independent functions for reading/writing signature data.
  • jaccarddist_pairwise() function.
  • Add more tree-based methods to Taxon.
  • gambit.metric changes
    • jaccarddist_array and jaccarddist_matrix functions now accept any sequence type (e.g.
      list) for the refs argument, but with diminished performance.
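For readers unfamiliar with the metric behind the jaccarddist_* functions, the Jaccard distance between two k-mer sets A and B is 1 - |A ∩ B| / |A ∪ B|. The sketch below uses plain Python sets rather than GAMBIT's array-based implementation; the function names are hypothetical, and treating two empty sets as distance 0 is a convention chosen for this sketch.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections of k-mer indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention for this sketch: two empty sets are identical
    return 1.0 - len(a & b) / union


def distances_to_refs(query, refs):
    """Distance from one query to each reference; refs may be any sequence type."""
    return [jaccard_distance(query, ref) for ref in refs]


def pairwise_distances(sigs):
    """Symmetric matrix of distances between all pairs in `sigs`."""
    n = len(sigs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = jaccard_distance(sigs[i], sigs[j])
    return matrix
```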

0.4.0b1

10 Jan 02:58
Pre-release

Update version to 0.4.0b1

v0.3.0

24 Sep 05:51

Changes from v0.2.2:

  • CLI updates
    • gambit query now accepts query signatures from a signature file.
    • New command group gambit signatures with info and create subcommands.
    • New debug command group (hidden).
  • Performance enhancements
    • Signature calculation for multiple sequence files can be run in parallel.
    • Signature calculation with large k is much faster.
    • Benchmarks for signature calculation.
  • Documentation
    • Installation instructions
    • More complete CLI docs
  • API and internals
    • Major refactor to gambit.kmers and gambit.signatures
      • find_kmers() renamed to calc_signature() and moved to gambit.signatures.calc, related
        functions also renamed and moved.
      • Refactored k-mer search into new find_kmers() function, which finds locations of prefix
        matches in sequence.
      • Several other classes and functions moved from gambit.kmers to gambit.signatures submodules.
      • Rearrangement of stuff within gambit.signatures.
      • Added required kmerspec attribute to AbstractKmerArray.
      • Renamed some KmerSpec attributes
      • Rename gambit.kmers.reverse_complement() -> revcomp()
    • Refactor of Jaccard functions
      • Removed _sparse from function names
      • Array and matrix functions now calculate distance only, renamed from jaccard_* to jaccarddist_*
    • New features
      • Most functions which take DNA sequences now accept str, bytes, or Bio.Seq.Seq.
      • Convert signatures between compatible KmerSpecs.
      • HDF5Signatures close() method and context manager.
    • Other
      • Updated Cython kmers code.
      • Many updates/improvements to tests.
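To make the split between finding prefix matches and calculating a signature more concrete, here is a toy pure-Python sketch. It reuses the names revcomp and find_kmers from the notes above, but the signatures and behavior shown are assumptions for illustration and not GAMBIT's actual (Cython-based) API. `toy_signature` is hypothetical; it collects the k bases following each prefix match on both strands and does no special handling of ambiguous bases.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_kmers(seq, prefix, k):
    """Yield (position, kmer) for each prefix match followed by k full bases."""
    start = seq.find(prefix)
    while start != -1:
        begin = start + len(prefix)
        kmer = seq[begin:begin + k]
        if len(kmer) == k:  # skip matches too close to the end of the sequence
            yield start, kmer
        start = seq.find(prefix, start + 1)


def toy_signature(seqs, prefix, k):
    """Sorted set of k-mers found after `prefix` in any sequence, on either strand."""
    kmers = set()
    for seq in seqs:
        if isinstance(seq, bytes):  # accept str or bytes input
            seq = seq.decode("ascii")
        seq = seq.upper()
        for strand in (seq, revcomp(seq)):
            kmers.update(kmer for _, kmer in find_kmers(strand, prefix, k))
    return sorted(kmers)
```

The prefix and k used to call these functions are up to the caller; short toy values such as prefix "ATG" and k = 3 are enough to see the behavior.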

v0.2.2

25 Aug 02:02

Changes from v0.2.1:

  • Replace testdb_210126 with testdb_210818. The new test database is small enough to include all of its files, including reference signatures and query sequences, in version control.
  • Store pre-calculated query results for tests.
  • Some other minor test improvements and bug fixes.

v0.2.1

18 Aug 00:08

Changes from v0.2.0:

  • Added license
  • Add setuptools to runtime dependencies
  • Minor docstring edits

v0.2.0

08 Aug 17:04

Changes from 0.1.0:

  • User-facing
    • Rework JSON results format to be simpler and hide internal details
    • Add CSV results format (default)
    • Display progress while querying
    • Increase query performance and decrease memory usage
  • Internal
    • Major redesign of GAMBITDatabase and query funcs
      • GAMBITDatabase stores indices of reference signatures instead of loading them all up front
      • Read reference signatures in chunks when calculating distance matrix
      • Maintain reference to SQLAlchemy Session object on GAMBITDatabase.
    • strict classification parameter
      • Enables new behavior of finding and reconciling all matching taxa
      • Defaults to off, which results in the old behavior of using only the closest match
    • Fix bug in consensus_taxon() and add tests
    • Flexible, generic progress monitoring API
      • Add to long-running functions like querying, distance matrix calculation, and k-mer finding
    • "archive" export format for saving full result data.
    • Lots of test improvements
    • More type annotations
    • Update API docs
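The combination of chunked reading and progress monitoring mentioned above follows a common pattern, sketched generically below. Nothing here is GAMBIT's API: `map_in_chunks`, `load_chunk`, and the progress callback signature are all hypothetical. The point is that only one slice of the reference data needs to be in memory at a time, and the caller is notified after each slice.

```python
def map_in_chunks(load_chunk, n_items, chunk_size, func, progress=None):
    """Apply `func` to items loaded one chunk at a time.

    `load_chunk(start, stop)` returns the items in [start, stop), e.g. by
    reading a slice from disk. `progress(done, total)`, if given, is called
    after each chunk. Hypothetical helper for illustration only.
    """
    results = []
    for start in range(0, n_items, chunk_size):
        stop = min(start + chunk_size, n_items)
        chunk = load_chunk(start, stop)  # only this slice is held in memory
        results.extend(func(item) for item in chunk)
        if progress is not None:
            progress(stop, n_items)
    return results
```

In a distance calculation, `load_chunk` would read a block of reference signatures and `func` would compute the distance from the query to one reference.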