Releases: octue/octue-sdk-python

Fix: Add missing python-dateutil dependency

14 Jun 16:24
54a53ad

Dependencies

  • Add missing python-dateutil dependency

Tweak semantic version incrementing rules

14 Jun 16:07
93c8bd2

Operations

  • Make non-feature/breaking changes require a patch version increase
  • Disable major version increments while package is in beta
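
As a rough, hedged illustration (commit subjects invented), the tweaked rules map Conventional Commit types to version bumps as follows:

```
fix: add missing python-dateutil dependency   -> patch  (e.g. 0.1.19 -> 0.1.20)
chore: tidy up CI configuration               -> patch  (non-feature change)
feat: support gs:// paths                     -> minor  (e.g. 0.1.19 -> 0.2.0)
feat!: replace TagSet with TagDict            -> minor  (major increments disabled in beta)
```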

Enable continuous deployment with semantic versions

14 Jun 15:26
0de5df2

Operations

  • Run release workflow on merge of any branch into main
  • Add Conventional Commits pre-commit hook
  • Replace check-version-consistency job with check-semantic-version job, which checks that the version in setup.py is the same as the semantic version expected by git-mkver given the Conventional Commits since the last tag (a sketch of this check follows the list)
  • Add the update-pull-request workflow that auto-generates part of the PR description on each commit
  • Run publish test job on all branches and make it dependent on check-semantic-version job passing
  • Rename tests job to run-tests
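
A minimal sketch of the comparison the check-semantic-version job performs, assuming a version="..." line in setup.py and git-mkver's next command; this is illustrative, not the job's actual implementation:

```python
import re
import subprocess

# Version declared by the package (parsed naively from setup.py for illustration).
with open("setup.py") as f:
    declared = re.search(r"version=[\"']([^\"']+)[\"']", f.read()).group(1)

# Version git-mkver expects next, given the Conventional Commits since the last tag.
expected = subprocess.run(
    ["git", "mkver", "next"], capture_output=True, text=True, check=True
).stdout.strip()

if declared != expected:
    raise SystemExit(f"setup.py declares {declared} but git-mkver expects {expected}.")
```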

Release/0.1.19

02 Jun 21:37
67b8939

New features

  • Make Datafiles and Datasets labelable
  • Use new version of tags in all Taggables
  • Replace string tags in a TagSet with key-value pairs in a TagDict (see the sketch after this list)
  • Add new Taggable mixin for providing the new tags interface
  • Add FilterDict, allowing filtering of key-value pairs by their values
  • Allow nested attribute/dictionary filtering in the filter containers FilterSet, FilterList and FilterDict
  • Allow any number of filters to be specified when filtering in filter containers
  • Allow ignoring of filterables missing the filtered-for attribute in a filter container instead of raising an error
  • Add one method to filter containers
  • Allow ordering by nested attributes in all FilterContainers
  • Allow gs:// paths to be used in Datafile, Dataset, and Manifest
  • Allow gs:// paths to be used in storage client
  • Add datetime filters
  • Add in-range filters to str, datetime, and Number filters
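
A hedged sketch pulling several of these features together (the bucket, paths, and tag values are invented; the filter syntax follows the key-value filtering described above):

```python
from octue.resources import Datafile

# gs:// paths can now be used directly in Datafile.
datafile = Datafile(path="gs://example-bucket/meteorology/wind_speeds.csv")

# Tags are now key-value pairs (a TagDict) rather than plain strings, and can
# be added via kwargs.
datafile.add_tags(manufacturer="vestas", height=350)

# Hypothetical filtering of a dataset's files, using the new one method to get
# the single matching datafile:
# gust_file = dataset.files.filter(tags__manufacturer="vestas").one()
```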

Breaking changes

  • Use new format for manifests' datasets in twine.json files
  • Convert old Taggable mixin to Labelable mixin
  • Convert old Tag class to Label class
  • Convert TagSet to LabelSet
  • Use key-value pairs for filter names and values when filtering Filterables
  • Stop logging in Serialisable
  • Always exclude logger field in Serialisable
  • Simplify tag name pattern to ^[a-z0-9][a-z0-9_]*(?<!_)$
  • Simplify label pattern to ^[a-z0-9][a-z0-9-]*(?<!-)$ (both simplified patterns are illustrated after this list)
  • Store tags as key-value pairs in GCS custom metadata
  • Unbase TagDict and LabelSet from filter containers
  • JSON-encode cloud storage custom metadata again
  • Store tags in tags field of cloud metadata again
  • Close #165: prefix GCS custom metadata fields with "octue__"
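
Both simplified patterns can be checked directly with the re module:

```python
import re

TAG_NAME_PATTERN = r"^[a-z0-9][a-z0-9_]*(?<!_)$"
LABEL_PATTERN = r"^[a-z0-9][a-z0-9-]*(?<!-)$"

assert re.match(TAG_NAME_PATTERN, "wind_speed")       # valid tag name
assert not re.match(TAG_NAME_PATTERN, "wind_speed_")  # may not end in an underscore
assert re.match(LABEL_PATTERN, "quality-checked")     # valid label
assert not re.match(LABEL_PATTERN, "-draft")          # must start with a letter or digit
```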

Minor improvements

  • Remove filters field from manifest strand in twines
  • Allow tags to be added via kwargs in Taggable.add_tags
  • Remove unused _FILTERSET_ATTRIBUTE class variables
  • Base Label on str
  • Support non-English characters in case-insensitive filtering
  • Add octue-sdk-python version to datafile metadata
  • Base filter containers on new FilterContainer abstract class
  • Move filter and order methods into FilterContainer
  • Use OctueJSONDecoder in Serialisable and GoogleCloudStorageClient
  • Add de/serialisation of datetime objects to de/encoders
  • Clarify name of some GoogleCloudStorageClient methods
  • Add set and UserString encoding to OctueJSONEncoder
  • Use OctueJSONDecoder
  • Add set and datetime decoding to OctueJSONDecoder (see the round-trip sketch after this list)
  • Remove unnecessary methods from LabelSet
  • Rename add_labels method and add add method to LabelSet
  • Automatically generate complementary (not) filters from other filters
  • Remove a line of duplicated code in Datafile
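
A small round-trip sketch of the encoder/decoder improvements above, assuming the classes live in octue.utils.encoders and octue.utils.decoders:

```python
import json
from datetime import datetime

from octue.utils.decoders import OctueJSONDecoder  # import paths assumed
from octue.utils.encoders import OctueJSONEncoder

payload = {"created": datetime(2021, 6, 14, 16, 7), "labels": {"wind", "energy"}}

encoded = json.dumps(payload, cls=OctueJSONEncoder)  # datetimes and sets encoded
decoded = json.loads(encoded, cls=OctueJSONDecoder)  # ...and restored on decoding
```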

Fixes

  • Handle timestamps from cloud with/without timezone information
  • Fix OctueJSONDecoder
  • Make it harder to add invalid labels to LabelSet

Dependencies

  • Use new version of twined that distinguishes tags from labels

Testing

  • Use latest GCS emulator
  • Only run deployment test if RUN_DEPLOYMENT_TESTS envvar is True

Release/0.1.18

12 May 14:44
d095788

New features

  • Allow decimal points in tags

Minor improvements

  • Close #162: make timestamp an optional parameter for Datafile

Release/0.1.17

07 May 18:05
8924d88

New features

  • Allow Datafile to be used as a context manager for changes to local datafiles
  • Allow Datafile.from_cloud to be used as a context manager for changes to cloud datafiles (the context-manager form is sketched after this list)
  • Allow Datafile to remember where in the cloud it came from
  • Add the following methods to Datafile:
    • get_cloud_metadata
    • update_cloud_metadata
    • clear_from_file_cache
    • _get_cloud_location
    • _store_cloud_location
    • _check_for_attribute_conflict
  • Avoid re-uploading a Datafile's file or metadata if they haven't changed
  • Raise error if implicit cloud location is missing from Datafile
  • Add GoogleCloudStorageClient.update_metadata method
  • Allow option to not update cloud metadata in Datafile cloud methods
  • Allow tags to contain capitals and forward slashes (but not start or end in a forward slash)
  • Allow datetime and posix timestamps for Datafile.timestamp
  • Add Datafile.posix_timestamp property
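
A hedged sketch of the local context-manager form, assuming the datafile yields itself and an open file object; the exact signature may differ from the release's final API:

```python
from octue.resources import Datafile

# Changes made inside the block are finalised on exit; if the datafile has a
# cloud location, unchanged files and metadata are not re-uploaded.
# (timestamp is still a required parameter here; it becomes optional in 0.1.18.)
with Datafile(timestamp=None, path="data/wind_speeds.csv", mode="a") as (datafile, f):
    f.write("10.3\n")
```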

Breaking changes

  • Close #148: remove hash_value from Datafile GCS metadata
  • When hashing Datafiles, only hash represented file (i.e. stop hashing metadata)
  • When hashing Datasets and Manifests, only hash the files contained (i.e. stop hashing metadata)
  • Make the hash of a Hashable instance with _ATTRIBUTES_TO_HASH=None the empty-string hash value "AAAAAA=="

Minor improvements

  • Simplify output of GoogleCloudStorageClient.get_metadata
  • Make Hashable instances re-calculate their hash_value every time unless an immutable_hash_value is explicitly provided (e.g. for cloud datafiles where you don't have the file locally to hash)
  • Add private Identifiable._set_id method
  • Close #147: pull metadata gathering for Datafile into method
  • Get datetime objects directly from GCS blob instead of parsing string serialisations
  • Add time utils module
  • Add hash preparation function to Hashable for datetime instances
  • Use the empty string hash value for Datafile if GCS crc32c metadata isn't present
  • Stop serialising hash value of Manifest, Dataset, and Datafile

Fixes

  • Close #146: Stop serialising GCS metadata as JSON. This avoids strings in the metadata appearing in two sets of quotation marks on Google Cloud Storage (illustrated after this list). This is a breaking change for any files already persisted with JSON-encoded metadata.
  • Remove ability to set custom hash value via kwargs when using Datafile.from_cloud
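
For concreteness, the double-quoting the first fix removes (a hypothetical name field as it would appear in the GCS console):

```
Before: name -> "wind_speeds"   (value stored as a JSON-encoded string, quotes included)
After:  name -> wind_speeds
```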

Testing

  • Factor out cloud datafile creation in datafile tests

Release/0.1.16

03 May 10:31
aa1f9cc

Breaking changes

  • Rename Service.__init__ parameter id to service_id to avoid built-in name clash
  • Move deployment package into cloud package

Dependencies

  • Use newest version of twined to support python>=3.6

Minor improvements

  • Remove duplicate code and unnecessary comments from Runner
  • Raise error if SERVICE_ID envvar is missing from deployment environment
  • Disallow non-None empty values as Service IDs
  • Add base class for service backends; update docstrings

Fixes

  • Use OctueJSONEncoder in JSON serialisation inside Service.answer to ensure numpy arrays are serialised

Testing

  • Add tests for Topic and Subscription
  • Add extra test for Service
  • Shorten runtime of cli.start test

Release/0.1.15

26 Apr 14:04
258f568

Fixes

  • Add from_string option to Serialisable.deserialise

Testing

  • Mock Google Pub/Sub Service, Topic, Subscription, Publisher and Subscriber in tests
  • Remove unneeded cleanup code from Service tests

Release/0.1.14

23 Apr 16:56
61fa92f

Breaking changes

  • Remove TagSet.__str__

Fixes

  • Use TagSet to deserialise tags in Datafile.from_cloud
  • Add custom (de)serialise methods to TagSet
  • Return subtags of a Tag in order using a FilterList
  • Remove separate dependencies copy/cache steps in Google Cloud Run Dockerfile so that it works with older versions of Docker

Minor improvements

  • Remove absolute path from Dataset and Manifest serialisation
  • Add Serialisable.deserialise method
  • Add filter method to TagSet to avoid e.g. taggable.tags.tags.filter

Operations

  • Improve description of release workflow

Release/0.1.13

21 Apr 12:35
eb0817b

New features

  • Support setup.py and requirements-dev.txt in Cloud Run Dockerfile
  • Retrieve credentials from Google Cloud Secret Manager and inject into environment in Runner.run
  • Add ability to retrieve and update cloud files via the Datafile.download or Datafile.open methods (sketched after this list)
  • Allow cloud file attributes to be updated via Datafile.to_cloud method
  • Allow instantiation of TagSets from JSON-encoded lists
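
A hedged sketch of the retrieve-and-update flow; the from_cloud parameter names and locations are assumed for illustration:

```python
from octue.resources import Datafile

# Hypothetical cloud file location.
datafile = Datafile.from_cloud(
    project_name="my-project", bucket_name="my-bucket", datafile_path="data/wind.csv"
)

# Datafile.open works like the built-in open, retrieving the file first if needed;
# changes can then be pushed back with Datafile.to_cloud.
with datafile.open() as f:
    contents = f.read()
```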

Breaking changes

  • Raise error if the datasets of the input manifest passed to Service.ask aren't all cloud-based

Fixes

  • Fix Dataset construction from serialised form in Manifest
  • Fix Datafile construction from serialised form in Dataset
  • Fix Datafile.deserialise
  • Adjust usages of tempfile.NamedTemporaryFile to also work on Windows
  • Add timeout and retry to Service.answer
  • Add retry to Service.wait_for_answer
  • Add 60 second timeout for answering question in Cloud Run deployment
  • Use correct environment variable for service ID in Cloud Run Dockerfile
  • Set _last_modified, size_bytes, and _hash_value to null values if a Datafile representing a cloud file is instantiated for a hypothetical cloud location (i.e. not synced to a cloud file at that point in time)
  • Allow Dataset.get_file_sequence use with no filter

Dependencies

  • Use new twined version that supports validation of credentials strand
  • Use newest version of gcp-storage-emulator

Minor improvements

  • Make path a positional argument of Datafile
  • Move gunicorn requirement into octue requirements
  • Raise warning instead of error if Google Cloud credentials environment variable is not found and return None as credentials
  • Move cloud code into new cloud subpackage
  • Raise TimeoutError in Service.wait_for_answer if no response is received by end of retries
  • Only look for deployment_configuration.json file in docker container /app directory
  • Ensure deployment_configuration.json file is always loaded correctly in docker container
  • Pass credentials strand into Runner instance in Cloud Run deployment
  • Add name attribute to Identifiable mixin
  • Add Google Cloud metadata to Datafile serialisation
  • Add deserialise method to Datafile
  • Add ability to add metadata to a Datafile instantiated from a regular cloud file
  • Use CRC32C hash value from Google Cloud when instantiating a Datafile from the cloud
  • Add ability to name Datafiles
  • Add ability to check whether a Datafile, all Datafiles in a Dataset, or all Datasets in a Manifest are located in Google Cloud
  • Use Datafile.deserialise when instantiating a Dataset from a dictionary
  • Add representation to GCPPubSubBackend
  • Load credentials strand JSON in Runner initialisation
  • Add location searched to message of error raised when app module can't be found in Runner.run
  • Ignore E203 flake8 warning

Testing

  • Remove subjective Service test test_serve_with_timeout
  • Use temporary file rather than temporary directory for tests where possible
  • Test Dataset.deserialise
