Releases · octue/octue-sdk-python
Fix: Add missing python-dateutil dependency
Contents
Dependencies
- Add missing `python-dateutil` dependency
Tweak semantic version incrementing rules
Contents
Operations
- Make non-feature/breaking changes require a patch version increase
- Disable major version increments while package is in beta
Enable continuous deployment with semantic versions
Contents
Operations
- Run `release` workflow on merge of any branch into `main`
- Add Conventional Commits `pre-commit` hook
- Replace `check-version-consistency` job with `check-semantic-version` job, which checks that the version in `setup.py` is the same as the semantic version expected by `git-mkver` given the Conventional Commits since the last tag
- Add the `update-pull-request` workflow that auto-generates part of the PR description on each commit
- Run `publish` test job on all branches and make it dependent on the `check-semantic-version` job passing
- Rename `tests` job to `run-tests`
Release/0.1.19
Contents
New Features
- Make `Datafile`s and `Dataset`s labelable
- Use new version of tags in all `Taggable`s
- Replace string tags in a `TagSet` with key-value pairs in a `TagDict`
- Add new `Taggable` mixin for providing the new tags interface
- Add `FilterDict`, allowing filtering of key-value pairs by their values
- Allow nested attribute/dictionary filtering in the filter containers `FilterSet`, `FilterList`, and `FilterDict`
- Allow any number of filters to be specified when filtering in filter containers
- Allow ignoring of filterables missing the filtered-for attribute in a filter container instead of raising an error
- Add filter container `one` method
- Allow ordering by nested attributes in all `FilterContainer`s
- Allow `gs://` paths to be used in `Datafile`, `Dataset`, and `Manifest`
- Allow `gs://` paths to be used in the storage client
- Add `datetime` filters
- Add in-range filters to `str`, `datetime`, and `Number` filters
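The filter-container features above can be pictured with a toy sketch. The `ToyFilterSet` class and its double-underscore filter syntax below are illustrative assumptions, not the SDK's actual implementation:

```python
import operator


class ToyFilterSet(set):
    """A minimal filter container: `filter(attr__op=value)` keeps matching members."""

    _OPERATORS = {"equals": operator.eq, "gt": operator.gt, "contains": operator.contains}

    def filter(self, **filters):
        # Apply any number of filters in sequence, as in the release notes above.
        result = self
        for name, value in filters.items():
            attribute, _, op_name = name.rpartition("__")
            op = self._OPERATORS[op_name]
            result = ToyFilterSet(item for item in result if op(getattr(item, attribute), value))
        return result


class Thing:
    def __init__(self, name, size):
        self.name, self.size = name, size


things = ToyFilterSet({Thing("alpha", 1), Thing("beta", 2), Thing("gamma", 3)})
assert {t.name for t in things.filter(size__gt=1)} == {"beta", "gamma"}
assert {t.name for t in things.filter(name__contains="a", size__gt=2)} == {"gamma"}
```

The same idea extends naturally to ordering and to `FilterList`/`FilterDict` analogues over lists and mappings.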
Breaking changes
- Use new format for manifests' datasets in `twine.json` files
- Convert old `Taggable` mixin to `Labelable` mixin
- Convert old `Tag` class to `Label` class
- Convert `TagSet` to `LabelSet`
- Use key-value pairs for filter names and values when filtering `Filterable`s
- Stop logging in `Serialisable`
- Always exclude `logger` field in `Serialisable`
- Simplify tag name pattern to `^[a-z0-9][a-z0-9_]*(?<!_)$`
- Simplify label pattern to `^[a-z0-9][a-z0-9-]*(?<!-)$`
- Store tags as key-value pairs in GCS custom metadata
- Unbase `TagDict` and `LabelSet` from filter containers
- JSON-encode cloud storage custom metadata again
- Store tags in `tags` field of cloud metadata again
- Close #165: prefix GCS custom metadata fields with "octue__"
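The simplified tag-name and label patterns above can be checked directly with the standard `re` module; the patterns are taken verbatim from the release notes:

```python
import re

# Patterns quoted in the breaking changes above.
TAG_NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9_]*(?<!_)$")
LABEL_PATTERN = re.compile(r"^[a-z0-9][a-z0-9-]*(?<!-)$")

assert TAG_NAME_PATTERN.match("my_tag_1")
assert not TAG_NAME_PATTERN.match("trailing_")  # may not end with an underscore
assert not TAG_NAME_PATTERN.match("_leading")   # must start with a lowercase letter or digit
assert LABEL_PATTERN.match("my-label-1")
assert not LABEL_PATTERN.match("trailing-")     # may not end with a hyphen
```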
Minor improvements
- Remove `filters` field from manifest strand in twines
- Allow tags to be added via `kwargs` in `Taggable.add_tags`
- Remove unused `_FILTERSET_ATTRIBUTE` class variables
- Base `Label` on `str`
- Support non-English characters in case-insensitive filtering
- Add `octue-sdk-python` version to datafile metadata
- Base filter containers on new `FilterContainer` abstract class
- Move `filter` and `order` methods into `FilterContainer`
- Use `OctueJSONDecoder` in `Serialisable` and `GoogleCloudStorageClient`
- Add de/serialisation of `datetime` objects to de/encoders
- Clarify names of some `GoogleCloudStorageClient` methods
- Add `set` and `UserString` encoding to `OctueJSONEncoder`
- Use `OctueJSONDecoder`
- Add `set` and `datetime` decoding to `OctueJSONDecoder`
- Remove unnecessary methods from `LabelSet`
- Rename `add_labels` method and add `add` method to `LabelSet`
- Automatically generate complementary (`not`) filters from other filters
- Remove a line of duplicated code in `Datafile`
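Several of the entries above concern encoding and decoding `set` and `datetime` objects via JSON. A minimal sketch of how such an encoder/decoder pair can work — illustrative only, not the actual `OctueJSONEncoder`/`OctueJSONDecoder` implementation:

```python
import json
from datetime import datetime


class DemoJSONEncoder(json.JSONEncoder):
    """Encode sets and datetimes, which the stock JSON encoder rejects."""

    def default(self, obj):
        if isinstance(obj, set):
            return {"_type": "set", "items": sorted(obj)}
        if isinstance(obj, datetime):
            return {"_type": "datetime", "value": obj.isoformat()}
        return super().default(obj)


def demo_decode_hook(dct):
    """Reverse the encoding above during json.loads."""
    if dct.get("_type") == "set":
        return set(dct["items"])
    if dct.get("_type") == "datetime":
        return datetime.fromisoformat(dct["value"])
    return dct


original = {"labels": {"a", "b"}, "created": datetime(2021, 5, 1, 12, 0)}
round_tripped = json.loads(json.dumps(original, cls=DemoJSONEncoder), object_hook=demo_decode_hook)
assert round_tripped == original
```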
Fixes
- Handle timestamps from cloud with/without timezone information
- Fix `OctueJSONDecoder`
- Make it harder to add invalid labels to `LabelSet`
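The fix above for timestamps with and without timezone information boils down to normalising naive and aware datetimes before comparing them. A minimal illustration of that kind of normalisation (not the SDK's actual code):

```python
from datetime import datetime, timezone


def normalise_timestamp(value: datetime) -> datetime:
    """Attach UTC to naive datetimes so all timestamps are comparable."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


naive = datetime(2021, 5, 1, 12, 0)
aware = datetime(2021, 5, 1, 12, 0, tzinfo=timezone.utc)
assert normalise_timestamp(naive) == normalise_timestamp(aware)
```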
Dependencies
- Use new version of `twined` that distinguishes tags from labels
Testing
- Use latest GCS emulator
- Only run deployment test if `RUN_DEPLOYMENT_TESTS` envvar is `True`
Quality Checklist
- New features are fully tested (No matter how much Coverage Karma you have)
- [v0.2 onward] New features are included in the documentation
Release/0.1.18
Contents
New features
- Allow decimal points in tags
Minor improvements
- Close #162: make timestamp an optional parameter for Datafile
Quality Checklist
- New features are fully tested (No matter how much Coverage Karma you have)
Release/0.1.17
Contents
New Features
- Allow `Datafile` to be used as a context manager for changes to local datafiles
- Allow `Datafile.from_cloud` to be used as a context manager for changes to cloud datafiles
- Allow `Datafile` to remember where in the cloud it came from
- Add the following methods to `Datafile`: `get_cloud_metadata`, `update_cloud_metadata`, `clear_from_file_cache`, `_get_cloud_location`, `_store_cloud_location`, `_check_for_attribute_conflict`
- Avoid re-uploading `Datafile` file or metadata if they haven't changed
- Raise error if implicit cloud location is missing from `Datafile`
- Add `GoogleCloudStorageClient.update_metadata` method
- Allow option to not update cloud metadata in `Datafile` cloud methods
- Allow tags to contain capitals and forward slashes (but not start or end with a forward slash)
- Allow `datetime` and posix timestamps for `Datafile.timestamp`
- Add `Datafile.posix_timestamp` property
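Accepting both `datetime` objects and posix timestamps, as `Datafile.timestamp` now does, comes down to a plain stdlib conversion. An illustrative helper (not the SDK's code):

```python
from datetime import datetime, timezone


def to_posix_timestamp(value):
    """Accept either a datetime or a posix timestamp; return a posix timestamp (float)."""
    if isinstance(value, datetime):
        return value.timestamp()
    return float(value)


moment = datetime(2021, 1, 1, tzinfo=timezone.utc)
assert to_posix_timestamp(moment) == 1609459200.0
assert to_posix_timestamp(1609459200) == 1609459200.0
```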
Breaking changes
- Close #148: remove `hash_value` from `Datafile` GCS metadata
- When hashing `Datafile`s, only hash the represented file (i.e. stop hashing metadata)
- When hashing `Dataset`s and `Manifest`s, only hash the files contained (i.e. stop hashing metadata)
- Make the hash of a `Hashable` instance with `_ATTRIBUTES_TO_HASH=None` the empty-string hash value `"AAAAAA=="`
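The empty-string hash value `"AAAAAA=="` quoted above is consistent with a base64-encoded four-byte zero checksum (the CRC32C of empty input is zero). This is an observation about the value itself, not the SDK's code:

```python
import base64

# Four zero bytes (a zero 32-bit checksum) base64-encode to the value above.
empty_checksum = bytes(4)
assert base64.b64encode(empty_checksum).decode() == "AAAAAA=="
```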
Minor improvements
- Simplify output of `GoogleCloudStorageClient.get_metadata`
- Make `Hashable` instances re-calculate their `hash_value` every time unless an `immutable_hash_value` is explicitly provided (e.g. for cloud datafiles where you don't have the file locally to hash)
- Add private `Identifiable._set_id` method
- Close #147: pull metadata gathering for `Datafile` into a method
- Get `datetime` objects directly from GCS blob instead of parsing string serialisations
- Add `time` utils module
- Add hash preparation function to `Hashable` for `datetime` instances
- Use the empty-string hash value for `Datafile` if GCS `crc32c` metadata isn't present
- Stop serialising hash value of `Manifest`, `Dataset`, and `Datafile`
Fixes
- Close #146: Stop serialising GCS metadata as JSON. This avoids strings in the metadata appearing in two sets of quotation marks on Google Cloud Storage. This is a breaking change for any files already persisted with JSON-encoded metadata.
- Remove ability to set custom hash value via `kwargs` when using `Datafile.from_cloud`
Testing
- Factor out cloud datafile creation in datafile tests
Quality Checklist
- New features are fully tested (No matter how much Coverage Karma you have)
Release/0.1.16
Contents
Breaking changes
- Rename `Service.__init__` parameter `id` to `service_id` to avoid built-in name clash
- Move `deployment` package into `cloud` package
Dependencies
- Use newest version of `twined` to support Python >= 3.6
Minor improvements
- Remove duplicate code and unnecessary comments from `Runner`
- Raise error if `SERVICE_ID` envvar is missing from deployment environment
- Disallow non-`None` empty values as `Service` IDs
- Add base class for service backends; update docstrings
Fixes
- Use `OctueJSONEncoder` in JSON serialisation inside `Service.answer` to ensure `numpy` arrays are serialised
Testing
- Add tests for `Topic` and `Subscription`
- Add extra test for `Service`
- Shorten runtime of `cli.start` test
Release/0.1.15
Contents
Fixes
- Add `from_string` option to `Serialisable.deserialise`
Testing
- Mock Google Pub/Sub `Service`, `Topic`, `Subscription`, `Publisher`, and `Subscriber` in tests
- Remove unneeded cleanup code from `Service` tests
Release/0.1.14
Contents
Breaking changes
- Remove `TagSet.__str__`
Fixes
- Use `TagSet` to deserialise tags in `Datafile.from_cloud`
- Add custom (de)serialise methods to `TagSet`
- Return subtags of a `Tag` in order using a `FilterList`
- Remove separate dependencies copy/cache steps in Google Cloud Run Dockerfile so that it works for older versions of `docker`
Minor improvements
- Remove absolute path from `Dataset` and `Manifest` serialisation
- Add `Serialisable.deserialise` method
- Add `filter` method to `TagSet` to avoid e.g. `taggable.tags.tags.filter`
Operations
- Improve description of release workflow
Release/0.1.13
Contents
New features
- Support `setup.py` and `requirements-dev.txt` in Cloud Run Dockerfile
- Retrieve credentials from Google Cloud Secret Manager and inject them into the environment in `Runner.run`
- Add ability to retrieve and update cloud files via the `Datafile.download` or `Datafile.open` methods
- Allow cloud file attributes to be updated via the `Datafile.to_cloud` method
- Allow instantiation of `TagSet`s from JSON-encoded lists
Breaking changes
- Raise error if the datasets of the input manifest passed to `Service.ask` aren't all cloud-based
Fixes
- Fix `Dataset` construction from serialised form in `Manifest`
- Fix `Datafile` construction from serialised form in `Dataset`
- Fix `Datafile.deserialise`
- Adjust usages of `tempfile.NamedTemporaryFile` to also work on Windows
- Add timeout and retry to `Service.answer`
- Add retry to `Service.wait_for_answer`
- Add 60-second timeout for answering question in Cloud Run deployment
- Use correct environment variable for service ID in Cloud Run Dockerfile
- Set `_last_modified`, `size_bytes`, and `_hash_value` to null values if a `Datafile` representing a cloud file is instantiated for a hypothetical cloud location (i.e. not synced to a cloud file at that point in time)
- Allow `Dataset.get_file_sequence` use with no filter
Dependencies
- Use new `twined` version that supports validation of the `credentials` strand
- Use newest version of `gcp-storage-emulator`
Minor improvements
- Make `path` a positional argument of `Datafile`
- Move `gunicorn` requirement into `octue` requirements
- Raise warning instead of error if Google Cloud credentials environment variable is not found, and return `None` as credentials
- Move cloud code into new `cloud` subpackage
- Raise `TimeoutError` in `Service.wait_for_answer` if no response is received by the end of retries
- Only look for `deployment_configuration.json` file in docker container `/app` directory
- Ensure `deployment_configuration.json` file is always loaded correctly in docker container
- Pass credentials strand into `Runner` instance in Cloud Run deployment
- Add `name` attribute to `Identifiable` mixin
- Add Google Cloud metadata to `Datafile` serialisation
- Add `deserialise` method to `Datafile`
- Add ability to add metadata to a `Datafile` instantiated from a regular cloud file
- Use CRC32C hash value from Google Cloud when instantiating a `Datafile` from the cloud
- Add ability to name `Datafile`s
- Add ability to check whether a `Datafile`, all `Datafile`s in a `Dataset`, or all `Dataset`s in a `Manifest` are located in Google Cloud
- Use `Datafile.deserialise` when instantiating a `Dataset` from a dictionary
- Add representation to `GCPPubSubBackend`
- Load credentials strand JSON in `Runner` initialisation
- Add location searched to message of error raised when `app` module can't be found in `Runner.run`
- Ignore `E203` flake8 warning
Testing
- Remove subjective `Service` test `test_serve_with_timeout`
- Use temporary file rather than temporary directory for tests where possible
- Test `Dataset.deserialise`
Quality Checklist
- New features are fully tested (No matter how much Coverage Karma you have)
Coverage Karma
- If your PR decreases test coverage, do you feel you have built enough *Coverage Karma* to justify it?