Releases: sourmash-bio/sourmash
v4.2.2
Major new features:
- added functionality to recover the original k-mers for a given set of hashes: `sourmash sig kmers` et al. (#1653, #1695, #1701)
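Hashes cannot be inverted, so recovering k-mers means re-hashing every k-mer in the original sequences and keeping those whose hash appears in the sketch. The following is a minimal pure-Python sketch of that idea, not sourmash's implementation. It assumes a stand-in 64-bit hash built from BLAKE2b instead of the MurmurHash3 and canonical-k-mer handling that sourmash actually uses, and `hash_kmer` and `recover_kmers` are hypothetical names.

```python
import hashlib


def hash_kmer(kmer: str) -> int:
    # Hypothetical stand-in hash: first 8 bytes of BLAKE2b as an unsigned 64-bit int.
    # sourmash uses MurmurHash3 on canonical k-mers; these values are NOT compatible.
    digest = hashlib.blake2b(kmer.upper().encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def recover_kmers(sequence: str, ksize: int, target_hashes: set[int]) -> dict[int, list[str]]:
    """Return {hash: [k-mers that produced it]} for every hash found in target_hashes."""
    found: dict[int, list[str]] = {}
    # Slide a window of width ksize across the sequence and re-hash each k-mer.
    for i in range(len(sequence) - ksize + 1):
        kmer = sequence[i:i + ksize]
        h = hash_kmer(kmer)
        if h in target_hashes:
            found.setdefault(h, []).append(kmer)
    return found


wanted = {hash_kmer("CGTT"), hash_kmer("TGCA")}
print(recover_kmers("ACGTACGTTGCA", 4, wanted))
```

This is also why k-mer recovery needs access to the original sequence data: the sketch alone only stores hash values.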
Documentation updates:
Minor new features:
- Adjusted dayhoff and hp encodings to tolerate stop codons in the protein sequence (#1673)
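A reduced-alphabet encoding such as Dayhoff maps each amino acid to one of six physicochemical classes before hashing. The sketch below illustrates what "tolerating stop codons" can look like: the stop symbol `*` passes through instead of raising an error. The class letters `a` to `f` and the pass-through of `*` are assumptions for illustration; `dayhoff_encode` is a hypothetical helper, not sourmash's API.

```python
# Standard six-class Dayhoff grouping; the letter assigned to each class is an assumption.
DAYHOFF_GROUPS = {
    "a": "C",
    "b": "AGPST",
    "c": "DENQ",
    "d": "HKR",
    "e": "ILMV",
    "f": "FWY",
}

# Invert to a residue -> class-letter lookup table.
DAYHOFF = {aa: cls for cls, members in DAYHOFF_GROUPS.items() for aa in members}
DAYHOFF["*"] = "*"  # assumption: stop codons pass through unchanged instead of failing


def dayhoff_encode(protein: str) -> str:
    """Encode a protein string in the Dayhoff alphabet (hypothetical helper)."""
    out = []
    for aa in protein.upper():
        if aa not in DAYHOFF:
            raise ValueError(f"unsupported residue: {aa!r}")
        out.append(DAYHOFF[aa])
    return "".join(out)


print(dayhoff_encode("MKT*"))  # edb*
```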
Bug fixes and performance improvements:
- Fixed a panic in `sourmash sketch dna` with bad input and `--check-sequence` (#1702)
Refactoring and cleanup:
v4.2.1
This is a bug-fix and performance release of sourmash.
There are no major new features.
git log --oneline v4.2.0..latest
Minor new features:
- new picklist coltypes for directly using `gather`, `prefetch`, and `manifest` outputs without specifying a column name (#1660)
- add `--from-file` to `sig cat` (#1657)
- implement a lazy/on-demand `Index` loading class to support low-memory tracking of a large index (#1661)
- add `sourmash tax prepare` to build SQLite taxonomy databases for use with `tax` commands (#1651)
- support manifests in `MultiIndex` (#1654)
- `tax` summarization additions and fixes, including reporting bp and unclassified (#1667)
- add `--from-file` and improved sig selection to most `sig` commands (#1672)
Bug fixes and performance improvements:
- fix bug in `gather` when run with `scaled=1` (#1670)
Documentation updates:
- Add sourmash-bio/community Gitter badge to README (#1658)
Refactoring and cleanup:
- add tests for the `sourmash tax` `--containment-threshold` arg (#1666)
- fix `sourmash tax` usage string (#1655)
- add bounds checking for `--scaled` (#1650)
Rust interface:
- Rust Core update (tag: r0.11.0) (#1643)
v4.2.0
This release adds several significant features. First, we've added a set of taxonomy commands for combining `sourmash gather` output with taxonomy databases. Second, we've added a new "picklist" feature that enables flexible selection of subsets of databases. Finally, we've added manifests to databases to support picklists as well as faster database loading and signature selection.
As of this release, we've also formally moved development over to the sourmash-bio organization on GitHub, and we've created a new Gitter support channel, sourmash-bio/community. Please join us there if you have any questions, comments, or feature requests!
Major new features:
- add `tax`/`taxonomy` submodule (#1543, #1628, #1630, #1648)
- add picklists for subsetting databases and results (#1587, #1588, #1623, #1590, #1639)
- Add manifests to support fast `Index.select(...)` and lazy loading (#1590)
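Conceptually, a picklist is a list of values (for example, signature md5sums) read from one column of a CSV file, and selection keeps only the database entries whose matching attribute is in that list. The sketch below shows that idea in plain Python; the record dicts stand in for manifest rows, and `load_picklist`, `select`, and the example values are hypothetical, not sourmash's API.

```python
import csv
import io


def load_picklist(csv_text: str, column: str) -> set[str]:
    """Collect the values of one column from picklist CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader}


def select(records: list[dict], key: str, picklist: set[str]) -> list[dict]:
    """Keep only records whose `key` value appears in the picklist."""
    return [r for r in records if r[key] in picklist]


# Hypothetical manifest-like rows: one per signature in a database.
records = [
    {"name": "genome A", "md5": "aaa111"},
    {"name": "genome B", "md5": "bbb222"},
    {"name": "genome C", "md5": "ccc333"},
]
picks = load_picklist("md5,note\naaa111,keep\nccc333,keep\n", "md5")
print([r["name"] for r in select(records, "md5", picks)])  # ['genome A', 'genome C']
```

Filtering on lightweight metadata rows like these, rather than on loaded signatures, is the kind of operation a manifest makes cheap.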
Documentation updates:
- Add new GTDB databases description to docs and start legacy databases page (#1581)
- Change `dib-lab/` URLs to new `sourmash-bio/` URLs. (#1629)
- Add notice for sustainable open source study (#1580)
Minor new features:
- alias `--nucleotide`, `--no-nucleotide` for moltype args. (#1632)
- add signature names to known/unknown hash sigs output by `sourmash prefetch` (#1646)
Bug fixes and performance improvements:
- Speed up `sourmash gather` with prefetch by ignoring unidentifiable hashes (#1613)
- Check for `MinHash` compatibility in `MinHash.intersection_and_union(...)` (#1627)
- Fix selection w/abund and manifest column type conversions (#1645)
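Hash sets from sketches built with different parameters are not comparable, so an intersection/union routine should refuse to combine them. The toy class below sketches that compatibility check; `ToySketch` is hypothetical, and checking only `ksize` and `scaled` is a simplifying assumption (a real sketch has further parameters, such as molecule type and seed).

```python
from dataclasses import dataclass, field


@dataclass
class ToySketch:
    ksize: int
    scaled: int
    hashes: set[int] = field(default_factory=set)

    def intersection_and_union(self, other: "ToySketch") -> tuple[set[int], set[int]]:
        # Refuse to compare sketches built with different parameters,
        # since their hash values are not drawn from the same space.
        if (self.ksize, self.scaled) != (other.ksize, other.scaled):
            raise ValueError("incompatible sketches: ksize/scaled differ")
        return self.hashes & other.hashes, self.hashes | other.hashes


a = ToySketch(31, 1000, {1, 2, 3})
b = ToySketch(31, 1000, {2, 3, 4})
inter, union = a.intersection_and_union(b)
print(len(inter), len(union))  # 2 4
```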
Refactoring and cleanup:
v4.1.2
This is a bug-fix and performance release of sourmash.
There are no major new features.
Minor new features:
- add query info to gather CSV output (#1565)
Bug fixes and performance improvements:
- Improved `MinHash.remove_many(...)` performance by five orders of magnitude (#1571)
- Fix SBT index saving bug that arbitrarily replaced names (but not content) of identical signatures in `.sbt.zip` files (#1568)
- Empty zipfiles should not cause `AssertionError` (#1546)
Major refactoring and new internal functionality:
- update `MinHash.set_abundances` to remove a hash if its abundance is 0; handle negative abundances (#1575)
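The "abundance 0 means remove" rule can be sketched on a plain dict of hash to count. This is a hypothetical helper, not sourmash's method; rejecting negative values up front (so no partial update happens) is an assumption about what "handle negative abundances" means, so check the sourmash documentation for the actual behavior.

```python
def set_abundances(abunds: dict[int, int], updates: dict[int, int]) -> None:
    """Hypothetical helper mirroring the described behavior on a plain dict."""
    # Assumption: negative abundances are rejected before anything is modified.
    bad = {h: c for h, c in updates.items() if c < 0}
    if bad:
        raise ValueError(f"negative abundances: {bad}")
    for h, count in updates.items():
        if count == 0:
            abunds.pop(h, None)  # abundance 0 means "remove this hash"
        else:
            abunds[h] = count


counts = {10: 3, 20: 1}
set_abundances(counts, {10: 0, 30: 5})
print(counts)  # {20: 1, 30: 5}
```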
Refactoring and cleanup:
v4.1.1
This release fixes a minor bug, provides some refactorings, and dramatically decreases memory consumption for `sourmash gather --linear` (which is, admittedly, a niche use case :).
No major new features.
Bug fixes and performance improvements:
- Unload data with `sourmash gather --linear` on SBTs (#1534)
- Fix `sourmash gather --no-prefetch` when used w/abund signatures (#1528)
- Fix `sourmash index` to not create directory for .sbt.zip output (#1539)
Major refactoring and new internal functionality:
- Add `FrozenMinHash` to better support separation of frozen and mutable data actions (#1508)
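The benefit of separating frozen and mutable sketches is that a frozen object can be shared, cached, or used as a dict key without risk of later edits, while modifications happen on an explicit mutable copy. The toy classes below illustrate that pattern only; `ToyMinHash`, `ToyFrozenMinHash`, `freeze`, and `to_mutable` are hypothetical names and do not describe how sourmash's `FrozenMinHash` is implemented.

```python
class ToyMinHash:
    """Mutable toy sketch: supports adding hashes."""

    def __init__(self, hashes=()):
        self._hashes = set(hashes)

    def add_hash(self, h: int) -> None:
        self._hashes.add(h)

    def freeze(self) -> "ToyFrozenMinHash":
        # frozenset() copies, so later add_hash calls do not affect the frozen copy.
        return ToyFrozenMinHash(self._hashes)


class ToyFrozenMinHash:
    """Immutable toy sketch: hashable, safe to share and cache."""

    def __init__(self, hashes=()):
        self._hashes = frozenset(hashes)

    @property
    def hashes(self) -> frozenset:
        return self._hashes

    def __hash__(self) -> int:
        return hash(self._hashes)

    def __eq__(self, other) -> bool:
        return isinstance(other, ToyFrozenMinHash) and self._hashes == other._hashes

    def to_mutable(self) -> ToyMinHash:
        # Return an independent mutable copy; edits never leak back.
        return ToyMinHash(self._hashes)


mh = ToyMinHash([1, 2])
frozen = mh.freeze()
copy = frozen.to_mutable()
copy.add_hash(3)
print(sorted(frozen.hashes))  # [1, 2]
```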
Refactoring and cleanup:
v4.1.0
4.1.0 release notes
This release provides several convenient features for users, including zipfile collections on input and output and a new `prefetch` command. `sourmash gather` has also received a considerable speed/memory upgrade (twice as fast, 80-90% lower memory). You should upgrade! As a reminder, v4.x has several incompatibilities with v3.x, and if you are upgrading from v3.x you should consult our migration guide.
Major new features:
- Support zipped collections of signatures (#1349)
- Refactor `gather` functionality for speed & modularity (#1370, #1512, #1513)
- Provide new command, `prefetch`. (#1370)
- Add flexible & iterative support for outputting signatures in a variety of collection formats: directories, zipfiles, etc. (#1493)
- Add `max_containment` to API and `--max-containment` to command line (#1346)
- Add `--from-file` option to `sourmash sketch` commands (#1362)
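Containment is asymmetric: the containment of A in B is |A ∩ B| / |A|. As we understand it, max containment divides the intersection by the size of the smaller sketch, which makes it symmetric and equal to the larger of the two containments. The sketch below computes both on plain Python sets of hashes; the function names are illustrative, not sourmash's API.

```python
def containment(a: set[int], b: set[int]) -> float:
    """Fraction of a's hashes that are also present in b."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def max_containment(a: set[int], b: set[int]) -> float:
    """Intersection size divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


small = {1, 2, 3, 4}
big = set(range(1, 21))  # 20 hashes, including all of `small`
print(containment(small, big), containment(big, small), max_containment(small, big))
# 1.0 0.2 1.0
```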
Bug fixes that break backwards compatibility:
- Require scaled signatures for containment (#1381)
- Fix CSV output for `sourmash lca classify` when `.name` is empty (#1401)
- Really old SBTs (pre-v2.0) no longer load (v1 and v2 SBTs) (changed in #1392)
Other bug fixes:
- Add proper newline output for csv module (#1319) - important for Windows!
Other new features:
- `--best-only` searches now work for both similarity AND containment (fixed in #1392)
- `sourmash categorize` now takes all database types
- add `--name` to `sourmash sig merge` (#1480)
- decline to load really large files for LCA databases if they're not valid JSON (#1495)
Major refactoring and new internal functionality:
- Add a `MultiIndex` class that wraps multiple `Index` classes (#1374)
- Refactor and dramatically simplify database loading and compatibility checking (#1406, #1420)
- Rework the `find` functionality for `Index` classes (#1392, #1477).
- Improved intersection and union calculations (#1475)
Documentation enhancements:
- Update the sourmash `__init__.py` docstring, provide `__all__` for imports (#1364)
- Add `-h/--help` usage instructions to `sourmash sketch` CLI (#1400)
- Add ORCID to contribution checklist (#1405)
- Add information about updating the developer environment to the developer docs (#1432)
- Docs: Partial fix for doc build issues with notebooks (#1516)
Refactoring and cleanup:
- Refactor the database loading code in `sourmash_args` (#1373, #1380)
- Pin needletail version to keep MSRV at 1.37 (#1393)
- Rename `load_file_list_of_signatures` to `load_pathlist_from_file` (#1423)
- Update call to notify in `src/sourmash/search.py` with f-strings (#1422)
- Bump MSRV to 1.42 (and other dep fixes) (#1461)
- CI/Rust: update and fix cbindgen config (#1473)
- Refactor MinHash.downsample (#1458)
- Make `MinHash.downsample(...)` require keyword arguments & fix newly revealed buggy test. (#1448)
- Add a check for LCA database error text in `tests/test_lca.py` (#1445)
- pin docutils version to last working (#1444)
- add codecov configuration to fix paths (#1422, #1449)
- provide new test fixtures for cleaner testing (#1487)
- Fix small papercuts: SyntaxWarning and coverage reports (#1488)
- Clean up clippy lints from 1.52 (#1505)
- Bump docutils from 0.16 to 0.17.1 (#1499)
- Update myst-parser requirement from ~=0.13.7 to >=0.13.7,<0.15.0 (#1520)
- replace utils.TempDirectory with runtmp in some tests (#1502)
v4.0.0
Major changes for 4.0
4.0 is a major new version of sourmash, and it contains a number of new and breaking features.
Please see our migration guide for more information on how to migrate from v3.x to version 4.0!
Numerical output and search results are unchanged
There are no changes to numerical output or search results in this release; you should get the same results with v4 as you get with v3, except where command-line parameters need to be adjusted as noted below (see: protein ksize #1277, `lca summarize` changes #1175, `sourmash gather` on signatures without abundance #1328). Please file an issue if your results change!
New or changed behavior
- default SBT storage is now .sbt.zip (#1174, #1170)
- add `sourmash sketch` command for creating signatures (#1159)
- protein ksizes in MinHash are now divided by 3, except in `sourmash compute` (#1277)
- refactor MinHash API and implementation: add, iadd, merge, hashes, and max_hash (#1282, #1154, #1139, #1301)
- add HyperLogLog implementation (#1223)
- `SourmashSignature.name` is now a property (not a method): use `str(sig)` instead of `name()` (#1179, #1232)
- `lca summarize` no longer merges all signatures, and uses hash abundance by default (#1175)
- `index` and `lca index` now support `--from-file` and no longer require signature files on the command line (#1186, #1222)
- `--traverse-directory` is now on by default for signature loading behavior (#1178)
- `sourmash sketch` and `sourmash compute` no longer create empty signatures from empty files and stdin (#1347)
- `sourmash sketch` and `sourmash compute` set `sig.filename` to an empty string when the filename is `-` (#1347)
Feature removal
- remove Python 2.7 support (& end Python 2 compatibility) (#1145, #1144)
- remove `lca gather` (#1307)
- remove 10x support from `sourmash compute` (#1229)
- remove 'dump' command (#1157)
Feature/function deprecations
- deprecate `sourmash compute` (#1159)
- deprecate `load_signatures`, `sourmash.load_one_signature`, `create_sbt_index`, and `load_sbt_index` (#1279, #1304)
- deprecate `import_csv` in favor of new `sourmash sig import --csv` (#1281)
Refactoring, improvements, and minor bug fixes:
- accept file list in `sourmash sig cat` (#1236)
- add `unique_intersect_bp` and `gather_result_rank` to gather CSV output (#1219)
- remove deprecated minhash functions (#1149)
- fix Rust panic error in signature creation (#1172)
- cache nodes in SBT during search (#1161)
- fix two bugs in gather --output-unassigned (#1156)
- Refactor the gather code so that it uses 'hashes' instead of 'mins' (#1329)
- Update output from gather w/o abundances, so that abund output is empty instead of 0 (#1328)
Documentation updates
- substantial revisions and updates to the documentation (#1283)
- add information about versioning, migrations, etc to the docs (#1153)
Infrastructure and CI changes:
- update finch requirement from 0.3.0 to 0.4.1 (#1290)
- update rand for test, and activate "js" feature for getrandom (#1275)
- dev updates (configs and doc) (#1298)
- move wheel building from Travis to GitHub Actions (#1295)
- fix new clippy warnings from Rust 1.49 (#1267)
- use tox for running tests locally (#696)
- CI: small build fixes (#1252)
- CI: Fix releases in GitHub Actions (#1250)
- update build_wheel action paths
- CI: moving python tests from travis to GH actions (#1249)
- CI: move wheel building to GitHub actions (#1244)
- remove last .rst file from docs (#1185)
- update CI for latest branch name change (#1150)
v3.5.1
Feature deprecations
- add deprecation warning for `sourmash compute --input-is-10x` (#1326)
- add warnings about new `sourmash lca summarize` behavior (#1326)
- add warning for new behavior of `MinHash.merge(...)` (#1326)
- add deprecation warning for `TarStorage` (#1165)
Infrastructure and CI changes:
- Backport github actions to stable branch (3.5.x) (#1317)
v3.5.0
This is the first of several minor releases (v3.5.x) from the new stable
branch. These releases focus on preparing for sourmash v4.0 by introducing deprecations and warnings for features that will be removed in v4.0.
Refactoring and deprecations:
- `MinHash` class refactoring (#1128, #1129); many deprecations for 4.0 and 5.0
- `sourmash dump` deprecated, for removal in 4.0 (#1147)
- `import sourmash_lib` deprecated, for removal in 4.0 (#1143)
Cleanup:
- remove mentions of ijson and khmer (no longer needed dependencies) (#1140)
Documentation:
- Simplify and clean up README (#1124)
- Add sourmash logo to docs and README (#1127)
- update release process and release notes (#1125)
Rust:
- Update typed-builder requirement from 0.6.0 to 0.7.0 (#1121)
v3.4.1
Major new features:
- Document `sourmash.fig` usage and behavior; enable output of `compare` clustering with labels (#859)
- Adds `--majority` option to `lca classify` using majority vote algorithm (#1113)
Minor improvements:
- Add MinHash compatibility check to `sourmash sig intersect` (#1116)
Bugs fixed:
- add ksize selectors back into sourmash sig functions (#1105)
Documentation updates: