Implements a dataset collection concept using study name suffixes (tags/tokens/labels):
- The tabular import workflow uses the value for key `Study collection` in `study.json`.
- API endpoint `study-names` hides collection-tagged datasets by default.
- Other API handlers are unchanged and work as-is using the fully-qualified study names.
- `spt db collection ... --publish / --unpublish` is provided to manage collection visibility.
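
As a rough illustration of the suffix-based collection tagging described above, the following sketch shows how a listing could hide tagged datasets by default. The ` collection: ` separator, the helper names, and the example study names are assumptions made for illustration, not the project's confirmed format or API.

```python
from typing import List, Optional, Tuple

# Hypothetical separator between a base study name and its collection tag.
SEPARATOR = " collection: "


def split_study_name(qualified_name: str) -> Tuple[str, Optional[str]]:
    """Split a fully-qualified study name into (base name, collection tag or None)."""
    base, sep, tag = qualified_name.rpartition(SEPARATOR)
    if not sep:
        # No separator found: the study is not part of any collection.
        return qualified_name, None
    return base, tag


def visible_study_names(names: List[str], collection: Optional[str] = None) -> List[str]:
    """List untagged studies always, and tagged ones only when their collection is requested."""
    visible = []
    for name in names:
        _, tag = split_study_name(name)
        if tag is None or tag == collection:
            visible.append(name)
    return visible
```

Because the tag lives inside the study name itself, handlers that receive the fully-qualified name need no special logic; only the listing step filters.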
Organize workflow configuration options into a workflow configuration file. This breaks the API for tabular import and similar.
Add support for small specimens (small cell set) in GNN workflow.
Add KDTree optimization to GNN ROI creation.
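
To show why a k-d tree helps when gathering the cells around a candidate ROI center, here is a minimal pure-Python sketch of a 2D k-d tree with a radius query. It is not the workflow's code; the names `Node`, `build`, and `within_radius` are hypothetical, and a production pipeline would more likely rely on a library implementation.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    """One k-d tree node: a point, the axis it splits on, and two subtrees."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]"):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, alternating x and y axes."""
    if not points:
        return None
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def within_radius(node: Optional[Node], center: Point, radius: float) -> List[Point]:
    """Collect all points within `radius` of `center`, pruning unreachable subtrees."""
    if node is None:
        return []
    found = []
    dx, dy = node.point[0] - center[0], node.point[1] - center[1]
    if dx * dx + dy * dy <= radius * radius:
        found.append(node.point)
    delta = center[node.axis] - node.point[node.axis]
    # Visit a side only if the query disc can reach across the splitting line.
    if delta <= radius:
        found.extend(within_radius(node.left, center, radius))
    if delta >= -radius:
        found.extend(within_radius(node.right, center, radius))
    return found
```

The pruning step is what avoids comparing every cell against every candidate center: subtrees lying entirely beyond the query radius are never visited.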
- Deprecates heavy index on large tables:
- Adds a new table for tracking scope ranges.
- Converts the former `source_specimen` column on `expression_quantification` to a `SERIAL` integer.
- Makes tabular import keep track of ranges per-specimen in the new `range_definitions` table.
- Updates the "optimized" sparse matrix query to use the ranges rather than the former huge index.
- Deprecates the modify-constraints CLI entrypoint (only used internally now).
- Deprecates the expression indexing module, CLI entrypoint, etc.
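
The range-tracking idea above can be sketched with SQLite standing in for PostgreSQL. The column names (`row_id`, `cell`, `first_row`, `last_row`) and the two helper functions are hypothetical simplifications; only the table names `expression_quantification` and `range_definitions` come from the notes above.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    -- Simplified, hypothetical schemas; an autoincrementing key stands in for SERIAL.
    CREATE TABLE expression_quantification (
        row_id INTEGER PRIMARY KEY AUTOINCREMENT,
        cell TEXT,
        target TEXT,
        quantity REAL
    );
    CREATE TABLE range_definitions (
        specimen TEXT PRIMARY KEY,
        first_row INTEGER,
        last_row INTEGER
    );
""")


def import_specimen(specimen: str, rows: list) -> None:
    """Insert one specimen's rows contiguously and record the id range they occupy."""
    if not rows:
        return
    cursor = connection.cursor()
    ids = []
    for cell, target, quantity in rows:
        cursor.execute(
            "INSERT INTO expression_quantification (cell, target, quantity) VALUES (?, ?, ?)",
            (cell, target, quantity),
        )
        ids.append(cursor.lastrowid)
    cursor.execute("INSERT INTO range_definitions VALUES (?, ?, ?)",
                   (specimen, ids[0], ids[-1]))
    connection.commit()


def fetch_specimen(specimen: str) -> list:
    """Fetch a specimen's rows by primary-key range instead of a per-specimen index."""
    low, high = connection.execute(
        "SELECT first_row, last_row FROM range_definitions WHERE specimen = ?",
        (specimen,),
    ).fetchone()
    return connection.execute(
        "SELECT cell, target, quantity FROM expression_quantification "
        "WHERE row_id BETWEEN ? AND ? ORDER BY row_id",
        (low, high),
    ).fetchall()
```

Since the serial key is already indexed as the primary key, a `BETWEEN` scan over a recorded range needs no additional index on the large table.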
- Separates datasets into own databases:
- `DBCursor` and `DBConnection` usage streamlined, typically requires study-scoping (dataset-scoping).
- Deprecates `scstudies` database from database cluster. Replaced by `default_study_lookup` and per-dataset databases.
- Update test data artifacts which depended on all datasets being cohoused in the same database (e.g. things dealing with identifiers issue).
- Adds study-scoping throughout codebase where previously global identifiers were assumed.
- Updates development DB image from postgres 14.5 to 16.0.
- Deprecates `initialize_schema.sql` that was previously used to feed the DB docker image initialization.
- Adds DGL and PyTorch to the big development docker image (in which all the tests are run).
- Deprecate most occurrences of package-global namespace symbols, to reduce possible "leak" of unnecessary library imports for otherwise simple calls.
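
The per-dataset database separation and study-scoping described in the list above can be pictured with the following sketch. It uses in-memory SQLite databases and a hypothetical `StudyDatabases` class; it is not the `DBCursor` or `DBConnection` implementation, only an illustration of why a cursor needs a study to be named once each dataset has its own database.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterable


class StudyDatabases:
    """Hypothetical registry holding one separate database per study."""

    def __init__(self, studies: Iterable[str]):
        # Each in-memory connection is an independent database.
        self._connections = {study: sqlite3.connect(":memory:") for study in studies}

    @contextmanager
    def cursor(self, study: str):
        """Yield a cursor scoped to the given study's database, committing on success."""
        if study not in self._connections:
            raise KeyError(f"No database registered for study: {study}")
        connection = self._connections[study]
        cursor = connection.cursor()
        try:
            yield cursor
            connection.commit()
        finally:
            cursor.close()


databases = StudyDatabases(["Study A", "Study B"])
for study, name in [("Study A", "lung-01"), ("Study B", "skin-01")]:
    with databases.cursor(study) as cur:
        cur.execute("CREATE TABLE specimen (identifier TEXT PRIMARY KEY, name TEXT)")
        # The same identifier can be reused in each study without collision.
        cur.execute("INSERT INTO specimen VALUES ('1', ?)", (name,))
```

With datasets separated this way, an identifier such as `'1'` is only meaningful together with the study it belongs to, which is why formerly global lookups need a study argument.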
Deprecated nearest distance and density workflows.
Includes convenience whole-dataset pulling from the database.
- Deprecated front proximity workflow (for now).
- Large-scale linting of library code.
Separated build and test directories out of the source tree.