Paul Haesler edited this page May 2, 2023 · 7 revisions

ODC v2 Roadmap

Initial Publication: 2022-04-28

Last Status Update: 2022-09-26

Author: Paul Haesler (@SpacemanPaul)

Suggested supplementary reading:

Background

EP03 identifies several technical shortcomings in the current ODC index and datamodel APIs and offers suggestions for improvements. It is a little out of date, in that it was written before version 1.8. Some of the most pressing issues raised in EP03 were addressed with the introduction of the EO3 metadata type; however, backwards-compatible support for non-EO3 metadata types hinders further architectural change.

Recent unreleased commits to datacube-core have validated the ability of the ODC to support multiple index drivers, with a trivial "null" index that is always empty, and a non-persistent "in-memory" index as test implementations.

A significant issue raised in EP03 is the poor performance of the current postgres driver, particularly with respect to spatial queries. The datacube-ows project has run successful experiments with postgis spatial indexes over large ODC databases, with OWS database searches an order of magnitude faster than the current ODC native queries. However, the OWS approach is a high-maintenance overlay, and can only be taken further within datacube-core.

Introduction

This roadmap is intended as a somewhat aspirational/optimistic list of things to work on, and the rough order in which they would need to be tackled, superseding the Overhaul of index driver layer plan. This document will continue to evolve as progress is made.

Some items are aspirational "stretch goals" at best, and the most appropriate course of action on some items will no doubt turn out to be "do nothing, just live with what we have". Everything is subject to change and the rate of progress will be subject to the support of the ODC community.

Therefore this roadmap has been developed with the intention that a steady flow of tangible benefits is delivered throughout. If progress stalls at any particular point, the ODC community should still be better off than if no progress had been made at all.

Summary of Roadmap Phases

  1. A postgis index driver is developed, initially cloned from the postgres driver. Index driver API remains fully backward compatible. A detailed design for new index driver and datamodel APIs is drafted and discussed.
  2. Develop branch forks into separate v1 and v2 branches:
    1. 1.x releases providing transitional APIs, deprecation warnings for upcoming breaking changes, and support for migrating databases to the new postgis driver.
    2. 2.0-preX implementing the new APIs discussed above.
  3. Release datacube-core 2.0.0. The postgres driver is available as a read-only stub for migration, and the postgis driver is now the default. Continue maintenance support for 1.x releases for 6-12 months.

Phase One: Postgis driver and v2 API Design

Although many of the changes proposed in EP03 require backwards-incompatible API changes, significant progress can be made on a postgis-based index driver with stronger spatio-temporal modeling without breaking backwards compatibility.

Some of the features/actions listed below may end up requiring breaking API changes and need to be pushed back to Phase 2, but at least some progress should be possible within the confines of the existing API.

Releases 1.8.7+

  • No backwards incompatible changes to the postgres/default index driver.
  • No backwards incompatible changes to index driver API or model layer API.
  • Progressively remove support for behaviour flagged as "deprecated" in 1.8.6, cleaning up code as permitted. (underway)
  • Proposed new backwards-incompatible APIs at index driver and model layers for odc v2 drafted and discussed.
    • As much as possible, the new index API should be: simple, flexible, internally consistent and modular.
    • Query methods should be streamable.
    • bulk add/get/update/remove methods (underway)
    • Lightweight context-manager based transaction API. ODC-EP07 Database Transaction API
    • Methods that are syntactic sugar for lower-level methods should be moved to the high-level API and/or the top-level object.
    • Models should know their index. This would be either the index the model was read from or the index the model is added to. Dynamically created or unsaved models may have a None index, or it may be passed in when created/constructed.
    • Changes to lineage handling. Done (ODC-EP08 New Lineage API, PR#1401, PR#1429)
      • Lineage API methods deal with UUIDs only. Drop existing "source_filter" query options (or expose as syntactic sugar in the high-level API).
      • Lineage as UUID-only, metadata-only (no loading info), or full-fledged loadable datasets, all supported.
    • Transaction handling. Done (ODC-EP07 Database Transaction API PR #1318)
    • Change to product matching rules, as discussed in EP03.
    • Overhaul of model layer API from both the index layer and read/load layer perspectives. Some example ideas in EP03.
    • Product-level summary data.
    • Standardise time dimension conventions, particularly with regard to products that have a single acquisition time vs statistical products that cover multiple days' worth of acquisitions.
  • New postgis index driver, flagged in this phase as "experimental", meaning no guarantee of a non-destructive migration path from databases created by previous releases.
    • Create postgis index driver, initially as clone of postgres driver. Flag as "experimental". (Done)
    • Drop support for non-EO3-compatible geospatial metadata types. Non-geospatial metadata types (e.g. telemetry) still supported for metadata and lineage only. (Done)
    • Cleanup non-EO3 code, and optimise for EO3. (Underway)
    • Switch to a proper database migrations framework.
    • Smarter temporal index.
    • Geospatial indexes: (Done)
      • New index API method to create a geospatial index for a CRS. (EPSG:4326 created on init by default) (Done)
      • All datasets with extent polygons that can be validly projected to the CRS are automatically added to the index. (Done)
      • Metadata/lineage-only datasets are excluded from geospatial indexes. (Done)
      • Datasets whose extents cannot be safely reprojected into a geospatial index's CRS are excluded from that index. (Done)
      • New API search parameter to search by geometry:
      • Existing lat/lon searches redirected to new geometric search parameter. (Done)
      • If no geospatial index exists for the search geometry CRS, project to 4326. (Done)
      • The postgres driver obviously won’t support geospatial indexes – and will convert search geometries to a 4326 bounding box. (Done)
    • Smarter metadata search indexes (underway)
    • Further cleanup and optimisation of database schema. (Underway)
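
The geospatial index rules listed above can be sketched in a few lines of Python. This is only an illustration of the fallback behaviour described in this roadmap, not the datacube-core API: `SpatialIndexRegistry`, `create_spatial_index`, `crs_for_search` and `bounding_box` are hypothetical names, and reprojection is omitted (the bounding-box helper assumes its input points are already in EPSG:4326).

```python
from dataclasses import dataclass, field

DEFAULT_CRS = "EPSG:4326"  # the roadmap says this index is created on init


@dataclass
class SpatialIndexRegistry:
    """Hypothetical record of which CRSs have a geospatial index."""

    crses: set = field(default_factory=lambda: {DEFAULT_CRS})

    def create_spatial_index(self, crs: str) -> None:
        """Register a geospatial index for the given CRS."""
        self.crses.add(crs)

    def crs_for_search(self, geometry_crs: str) -> str:
        """Pick the index CRS that should serve a search geometry.

        If no index exists for the geometry's own CRS, fall back to
        EPSG:4326 (the caller would reproject the geometry first).
        """
        if geometry_crs in self.crses:
            return geometry_crs
        return DEFAULT_CRS


def bounding_box(points):
    """Reduce polygon vertices (x, y) to (min_x, min_y, max_x, max_y).

    Stands in for the postgres driver's fallback of searching by
    bounding box; assumes the points are already in EPSG:4326.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

In this sketch a search geometry in a CRS without its own index is routed to the EPSG:4326 index, and starts using a native index as soon as one is created for that CRS.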

When the "experimental" flag comes off the postgis index driver and a draft v2 API has the support of the Steering Council and the broader ODC community, we can progress to the next phase.

Phase Two: Active v2 development

Switch to odc-geo. Split develop branch into develop-1.x and develop-2.x branches.

A more detailed plan can be prepared once a draft v2 API has been agreed.

Releases 1.9.x: Transitional releases.

  • New API methods/behaviour in index driver and model API
  • Minor breaking changes allowed, but avoid where possible
  • Deprecate but continue to support old methods/behaviour
  • Implementation of new methods/behaviour may be poorly-optimised, minimal or non-existent in the postgres driver
  • Postgres index driver remains the default
  • Migration tools to copy/upgrade data from postgres driver to postgis driver.
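
One common way to "deprecate but continue to support" an old method in Python is a small decorator that emits a `DeprecationWarning` and then delegates to the new implementation. The sketch below uses only the standard library; `ExampleIndex`, `add` and `add_many` are hypothetical names chosen for illustration and do not describe the actual transitional API.

```python
import functools
import warnings


def deprecated(replacement: str):
    """Mark a callable as deprecated while keeping it functional."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


class ExampleIndex:
    """Hypothetical index exposing an old and a new method."""

    def __init__(self):
        self._datasets = {}

    def add_many(self, datasets):
        """New-style bulk method: store (id, document) pairs."""
        for ds_id, doc in datasets:
            self._datasets[ds_id] = doc
        return len(self._datasets)

    @deprecated("add_many")
    def add(self, ds_id, doc):
        """Old-style method kept as a thin wrapper over the new one."""
        return self.add_many([(ds_id, doc)])
```

Existing callers of `add` keep working during the 1.9.x series but see a warning that names the replacement, which gives downstream projects time to migrate before the old method is dropped.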

Pre-Releases 2.0.0-preX: Backwards-incompatible, unstable

  • API unstable and actively evolving.
  • Support for old API methods/behaviours progressively dropped
  • Postgis index driver is now the default.
  • Bulk read/write operations.
  • Transaction API.
  • Postgres driver no longer supported, except as a read-only stub suitable for migrating from.
  • Cleanup of reader driver API - including reviving asynchronous reader API?
  • Major code cleanups, including removal of no-longer needed code
  • Changes to lineage handling, data model layer. EP03 has many useful thoughts in this area.
  • STAC + in-memory cache index driver
  • Update documentation for new APIs.
  • Parallel development updating satellite repositories (Statistician, Explorer, OWS, etc.), including replacing or simplifying auxiliary indexes used by e.g. Explorer and OWS.
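
To make the "transaction API" and "bulk read/write" items more concrete, here is a minimal sketch of a context-manager based transaction wrapped around a bulk write, using a toy in-memory index. It is an assumption-laden illustration rather than the design in ODC-EP07: `InMemoryIndex`, `transaction` and `bulk_add` are hypothetical names, and rollback is implemented by restoring a shallow snapshot.

```python
from contextlib import contextmanager


class InMemoryIndex:
    """Hypothetical non-persistent index, for illustration only."""

    def __init__(self):
        self.datasets = {}

    @contextmanager
    def transaction(self):
        """Keep changes on clean exit; roll back if the block raises."""
        snapshot = dict(self.datasets)  # shallow copy of current state
        try:
            yield self
        except Exception:
            self.datasets = snapshot  # restore the pre-transaction state
            raise

    def bulk_add(self, docs):
        """Add many (id, document) pairs in a single call."""
        for ds_id, doc in docs:
            if ds_id in self.datasets:
                raise ValueError(f"duplicate dataset id: {ds_id}")
            self.datasets[ds_id] = doc


index = InMemoryIndex()
with index.transaction():
    index.bulk_add([("a", {}), ("b", {})])
```

If a bulk write fails part-way through inside the `with` block, the datasets added earlier in the same block are discarded, so callers never observe a half-applied batch.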

When the develop-2.x branch stabilises, we can commit to a 2.0.0 release.

Phase Three: Post-v2.0.0 release

The new normal.

Releases 2.x.y: Stable

APIs are now stable. Semantic versioning with respect to API changes, as per current status quo. Regular releases and occasional minor API breakages expected as the dust settles.

Releases 1.9.x: Legacy support

Continue 1.9.x legacy releases for 6-12 months after 2.0.0 release, with migration from 1.x to 2.x strongly encouraged and supported.
