For 1.9.0 rc1 (#1569)
* Update README and Migration notes.

* Clarify/correct CLI help text for clone command.

* Update whats_new.rst - phase 1.

* Whats_new updates phase 2 and minor updates to migration notes.

* Update PR number.

* Add missing PR.

* Update docs/MIGRATION-1.8-to-1.9.rst

Co-authored-by: Robbi Bishop-Taylor <Robbi.BishopTaylor@ga.gov.au>

* Updates to migration document, based on Robbi's feedback.

---------

Co-authored-by: Robbi Bishop-Taylor <Robbi.BishopTaylor@ga.gov.au>
SpacemanPaul and robbibt committed Mar 27, 2024
1 parent 874afd7 commit 411f6c5
Showing 4 changed files with 230 additions and 5 deletions.
4 changes: 4 additions & 0 deletions README.rst
@@ -33,6 +33,10 @@ setting up or using the Open Data Cube.
Please help us to keep the Open Data Cube community open and inclusive by
reading and following our `Code of Conduct <code-of-conduct.md>`__.

This is a ``1.9.x`` series release of the Open Data Cube. If you are migrating from a ``1.8.x``
series release, please refer to the
`1.8.x to 1.9.x Migration Notes <https://datacube-core.readthedocs.io/en/latest/MIGRATION-1.8-to-1.9.html>`_.

Requirements
============

2 changes: 1 addition & 1 deletion datacube/scripts/system.py
@@ -102,7 +102,7 @@ def echo_field(name, value):
default=1000)
@click.option(
'--skip-lineage/--no-skip-lineage', is_flag=True, default=False,
-    help="Clone lineage data where possible. (default: true)"
+    help="Do not load lineage data where possible. (default: false - i.e. do not skip lineage)"
)
@click.option(
'--lineage-only/--no-lineage-only', is_flag=True, default=False,
211 changes: 211 additions & 0 deletions docs/MIGRATION-1.8-to-1.9.rst
@@ -0,0 +1,211 @@

Migrating from ODC 1.8.x to 1.9.x
=================================

The last new major release of the Open Data Cube was v1.8.0 in May 2020, nearly 4 years ago.

ODC developers and the Steering Council have been working hard behind the scenes over the last couple of years
to address some of the accumulated technical debt in datacube-core and prepare for new major releases.

The long-term plan includes a number of significant backwards-incompatible changes. Wherever possible, we have
tried to provide a smooth migration pathway: behaviour from 1.8.x is deprecated in 1.9.x with an alternative
provided, and the deprecated behaviour is then removed in 2.0.x, where the alternative becomes the standard.
Some minor backwards-incompatible changes in 1.9.x were nevertheless unavoidable.

This document describes the changes between 1.8.x and 1.9.x, with a particular focus on backwards incompatible
changes and new features.

After the release of 1.9.0, focus will shift to updating secondary ODC libraries to work with ODC-1.9. (Explorer
and OWS in particular will require major changes.) We will continue to support and release 1.8.x versions after
the release of 1.9.0, until the 1.9.x releases have stabilised and all secondary libraries are up to date.

Smaller ODC installations will probably prefer to stick with the 1.8.x releases for the time being. If you can
spare the resources, however, we encourage you to set up a 1.9.x installation to test your existing code and
systems against the new release, and to open issues on GitHub for any problems you encounter, especially any
that are not documented here.

Major Changes between 1.8.x and 1.9.x
-------------------------------------

1. Integration with ``odc-geo``.

The old ``datacube.utils.geometry`` library has been replaced by ``odc-geo``.

If you have already used ``odc-geo`` you will appreciate the additional power and flexibility that this brings to
core. If you have not, please take the time to have a read through the
`odc-geo documentation <https://odc-geo.readthedocs.io/en/latest/>`_ and especially the
`migration notes <https://odc-geo.readthedocs.io/en/latest/migration.html>`_. In particular, you should familiarise
yourself with the ``.odc`` accessor that ``odc-geo`` dynamically adds to all xarray ``DataArray`` and ``Dataset``
objects.

Note that ``dc.load()`` now preferentially accepts ``odc-geo`` data types for passing ``GeoBox`` via the ``like``
parameter, as well as ``resolution`` and ``align`` values, although backwards compatible behaviour with the old
types is available with a deprecation warning.

The classes and methods in ``datacube.utils.geometry`` are still available, but raise a deprecation warning when
used. Please migrate all code to use the equivalent methods and classes in ``odc-geo``.

2. A new configuration engine has replaced the configuration engine used previously.

There are some backwards-incompatible changes as noted below, but most existing configuration files should
continue to work as previously with minimal changes.

The behaviour of the new configuration engine (and the reasoning behind the changes) is fully documented in
`ODC Enhancement Proposal 10 <https://github.com/opendatacube/datacube-core/wiki/ODC-EP-010---Replace-Configuration-Layer>`_.

a. Previously multiple config files could be read and merged to generate the final effective configuration.
From 1.9.0 only a single config file is ever read at a time. Managed instances that previously let users
customise their setup with a minimal config file merged on top of a default system configuration will have
to migrate to a system whereby users take a copy of the default system configuration file and edit that copy
for their needs.

b. The "user" section no longer has a special meaning, as the old special meaning is irrelevant now that config
files are not merged.

c. Previously only the INI file format was supported for configuration files. The JSON and YAML formats are now also
supported.

d. Previously configuration by Environment Variables was implemented in an inconsistent and ad hoc way that resulted
in complex interactions that were impossible to predict without intimate knowledge of the source code that
implemented it. There is now a consistent and systematic approach taken to the interaction between the
active configuration file and environment variables. Partial backwards compatibility is attempted, but
full backwards compatibility is not possible due to the ad hoc nature of the previous implementation.

The new (preferred) environment variable names are of the form ``$ODC_<env_name>_<item_name>``.

e. Tighter restrictions are applied to environment names. This is required to ensure consistent interaction
between config files and environment variables. Environment names can now only contain alphanumeric characters.
(Dashes and underscores must be removed.)

f. The preferred default environment name is now ``default``. It is suggested that every config file should
start with a "default" section that is an alias to an environment defined in full elsewhere in the file.
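
As an illustration of items (d) and (e) above, the following sketch validates an environment name and builds the matching environment variable name. The helper names are hypothetical and are not part of datacube. Treating "alphanumeric" as ASCII letters and digits, and upper-casing the variable name, are both assumptions made here; ODC-EP-010 is the authority on the exact rules.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_ENV_NAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_env_name(name: str) -> bool:
    """Return True if `name` contains only alphanumeric characters."""
    return _ENV_NAME.fullmatch(name) is not None


def migrate_env_name(old_name: str) -> str:
    """Strip the dashes and underscores that 1.9 no longer allows."""
    new_name = old_name.replace("-", "").replace("_", "")
    if not is_valid_env_name(new_name):
        raise ValueError(f"cannot derive a valid environment name from {old_name!r}")
    return new_name


def odc_env_var(env_name: str, item_name: str) -> str:
    """Build ODC_<env_name>_<item_name>; upper-casing is an assumption of this sketch."""
    if not is_valid_env_name(env_name):
        raise ValueError(f"invalid environment name: {env_name!r}")
    return f"ODC_{env_name}_{item_name}".upper()


print(migrate_env_name("my-prod_db"))     # myproddb
print(odc_env_var("new", "db_password"))  # ODC_NEW_DB_PASSWORD
```

Renaming an environment this way also changes the names of any environment variables derived from it, so deployment scripts that export those variables need the same update.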

3. The index driver API has been cleaned up and simplified, facilitating easier development of new index backends.
This should be largely invisible to most users, although some more rarely used methods and/or arguments are now
deprecated. The deprecation warnings provide specific migration advice for each case.

4. A new PostGIS-based index backend is now available.

The legacy Postgres index driver will continue to be supported in 1.9, but will be dropped in ODC-2.0.

The Postgis index driver only supports EO3-compatible metadata types. Older EO-style metadata types should
be migrated to EO3 before indexing into a Postgis driver index. We will try to provide tools to assist with
this migration, but they are not yet available in 1.9.0 and, due to the arbitrary generality of pre-EO3 ODC
metadata, such tools may not be possible in all cases. (The legacy postgres driver will continue to support
non-EO3 metadata types until it is dropped in 2.0.)

The postgis driver will support the creation of PostGIS spatial indexes for arbitrary CRSs. This will improve
efficiency and accuracy of database searches, particularly when working with data covering regions where
conversions to/from EPSG:4326 lat/long coordinates are highly non-linear (e.g. the Pacific around the
anti-meridian and the north and south polar regions).

The postgis driver uses Alembic for managing schema migrations, so future changes to the postgis database
schema will be much easier to roll out than in the past.

See below for more information about migrating to the Postgis index driver.

Note that many other libraries in the ODC ecosystem may not work well with the Postgis driver at first. As noted
above, Explorer and Datacube-OWS in particular will need extensive changes before they can be used with the new
index driver.
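
To see why a lat/long bounding box is a poor search key near the anti-meridian, consider a small footprint that straddles it. The sketch below is plain Python with made-up coordinates; it is not datacube code and only illustrates the arithmetic.

```python
# A hypothetical dataset footprint straddling the anti-meridian, as (longitude, latitude).
footprint = [(179.5, -17.0), (-179.5, -17.0), (-179.5, -18.0), (179.5, -18.0)]

lons = [lon for lon, _ in footprint]

# Naive EPSG:4326 bounding box: take the minimum and maximum longitude as-is.
naive_width = max(lons) - min(lons)

# Actual extent: shift longitudes into the 0..360 range so the region is contiguous.
shifted = [lon % 360 for lon in lons]
true_width = max(shifted) - min(shifted)

print(naive_width)  # 359.0
print(true_width)   # 1.0
```

A one-degree-wide footprint ends up with a bounding box that wraps almost the entire globe, which is the kind of inefficiency a spatial index in a better-suited CRS avoids.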

5. New Lineage API (Postgis driver only)

The postgis driver handles lineage very differently to the postgres driver. Lineage data is only loosely coupled
to dataset metadata, and a completely new API is introduced for working with lineages. It is now possible to
store external lineage information, i.e. the source and derived datasets do not both need to exist in the index
for the lineage relationship between them to be recorded in the database. Powerful new data structures allow
working with arbitrarily nested lineage trees in both the "source-wards" and "derived-wards" directions.

A full description of the new lineage API can be found in
`ODC Enhancement Proposal 8 <https://github.com/opendatacube/datacube-core/wiki/ODC-EP-008>`_.

The handling of lineage in the legacy postgres index driver has not changed - the postgres driver does NOT support
the new lineage API.
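
The idea of loosely coupled, external lineage can be pictured as a table of edges between dataset IDs that exists independently of the dataset table. The toy model below only illustrates that idea: the class, its methods and the dataset IDs are invented here and are not the API described in EP-8.

```python
from collections import defaultdict


class ToyLineageStore:
    """Lineage edges keyed by dataset ID; the datasets themselves need not be indexed."""

    def __init__(self):
        self._sources = defaultdict(set)  # derived ID -> IDs it was made from
        self._derived = defaultdict(set)  # source ID  -> IDs made from it

    def add(self, derived_id, source_id):
        """Record that `derived_id` was produced from `source_id`."""
        self._sources[derived_id].add(source_id)
        self._derived[source_id].add(derived_id)

    def walk(self, dataset_id, direction="sources"):
        """Return every ID reachable from `dataset_id` in the given direction."""
        edges = self._sources if direction == "sources" else self._derived
        seen, stack = set(), [dataset_id]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


store = ToyLineageStore()
store.add("summary", "ard")   # "summary" was derived from "ard"
store.add("ard", "level1")    # "level1" may never have been indexed at all

print(sorted(store.walk("summary", "sources")))  # ['ard', 'level1']
print(sorted(store.walk("level1", "derived")))   # ['ard', 'summary']
```

Because the edges are stored separately, the relationship to ``level1`` survives even though no dataset record for it exists in this toy store.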

6. Support for multi-dimensional loading of hyperspectral datasets (Coming Soon)

This is a work in progress and will not be available in 1.9.0. It will appear in a later 1.9.x release.

7. The long-deprecated "ingestion" workflow and "executor" API have both been removed.

8. Support for multiple locations per dataset is now deprecated.

The New Postgis Index Driver
----------------------------

Configuration
+++++++++++++

The configuration for a postgis index looks the same as the configuration for a legacy postgres index; you simply
set the ``index_driver`` setting to ``postgis``::

    [default]
    alias: prod

    [old]
    index_driver: postgres
    db_hostname: production.dbs.internal
    db_database: odc_prod
    db_username: odc
    db_password: secret_and_secure

    [new]
    index_driver: postgis
    db_hostname: dev.dbs.example.net
    db_database: odc_dev
    db_username: odc
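
Since 1.9 also accepts JSON and YAML configuration files, an existing INI file can be converted mechanically. The sketch below uses only the Python standard library. It assumes that the JSON form is a top-level mapping from environment names to option mappings, mirroring the INI sections; that layout, the helper name and the sample values are assumptions of this example, so check ODC-EP-010 before relying on it.

```python
import configparser
import json

# Sample INI text for illustration; the values are placeholders.
INI_TEXT = """
[default]
alias: new

[new]
index_driver: postgis
db_hostname: dev.dbs.example.net
db_database: odc_dev
db_username: odc
"""


def ini_to_mapping(text: str) -> dict:
    """Parse INI text into {environment name: {option: value}}."""
    # interpolation=None leaves any '%' characters (e.g. in passwords) untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = ini_to_mapping(INI_TEXT)
print(json.dumps(config, indent=2))
```

``configparser`` accepts both ``key: value`` and ``key = value`` lines, so either INI style can be read this way.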

Initialisation
++++++++++++++

You then initialise the database as previously, using the ``system init`` command (``-E new`` selects the ``new``
environment from the configuration file)::

    datacube -E new system init

You should also create Postgis spatial indexes for any CRS you want to be able to search on (note that an EPSG:4326
spatial index is created by default). Where possible, create spatial indexes before indexing any data, because
adding a new spatial index to a populated index is very slow. For example, to create a spatial index
for EPSG:3577::

    datacube -E new spindex create 3577

Migrating (Cloning) Data From a Postgres Index
++++++++++++++++++++++++++++++++++++++++++++++

To clone data from an old index to a new one (the two indexes may use different index drivers)::

    datacube -E new system clone old

Note that the target index is specified with the ``-E`` flag and the source index is provided as an argument to the
``system clone`` command.

Data that cannot be directly copied is skipped, e.g.:

* Non-EO3 compatible data cannot be copied from a ``postgres`` index into a ``postgis`` index.
* External lineage information cannot be copied from a ``postgis`` index to a ``postgres`` index.

The clone command supports the following options:

* ``--skip-lineage`` If set, lineage data is not copied. The default is NOT to skip lineage (i.e. to attempt to copy
  lineage data).
* ``--lineage-only`` If set, ONLY lineage data is copied.
* ``--batch-size N`` Index cloning is batched for performance. This option specifies the number of records to write to
  the target database at a time. Default is 1000.
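
When cloning is scripted, the options above can be assembled into a command line programmatically. The helper below is a hypothetical convenience wrapper, not part of datacube; it only builds the argument list and does not run anything. Rejecting ``--skip-lineage`` together with ``--lineage-only`` is a design choice of this sketch, not documented behaviour of the CLI.

```python
import shlex


def clone_command(target_env, source_env, batch_size=1000,
                  skip_lineage=False, lineage_only=False):
    """Build the argv list for `datacube -E <target> system clone <source>`."""
    if skip_lineage and lineage_only:
        # Sketch-level guard: the two flags ask for contradictory things.
        raise ValueError("skip_lineage and lineage_only are contradictory")
    argv = ["datacube", "-E", target_env, "system", "clone",
            "--batch-size", str(batch_size)]
    if skip_lineage:
        argv.append("--skip-lineage")
    if lineage_only:
        argv.append("--lineage-only")
    argv.append(source_env)  # the source environment is the positional argument
    return argv


argv = clone_command("new", "old", batch_size=500, skip_lineage=True)
print(shlex.join(argv))
# datacube -E new system clone --batch-size 500 --skip-lineage old
```

The resulting list can be passed to ``subprocess.run(argv, check=True)`` to execute the clone.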

Geospatial search
+++++++++++++++++

Geopolygons for spatial search can be passed to ``dc.load()``, as before::

    dc.load(..., geopolygon=poly, ...)

In the postgres driver, the search is done against a bounding box around the polygon projected into EPSG:4326,
then the extents of datasets returned by the bounding box search are checked for overlap with the original
geopolygon. In the postgis driver, the polygon is passed directly to Postgis for an indexed spatial search.

* Only datasets whose extents overlap the geopolygon will be loaded.
* Geopolygons whose CRS does NOT have a native spatial index available will be projected to EPSG:4326 for search
purposes.
* Datasets whose projected extents are not contained within a given CRS's "valid area" will NOT be included in that
CRS's spatial index.
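
The difference between the two search strategies can be sketched in plain Python. The example below uses planar coordinates, convex shapes and invented dataset extents, and a separating-axis test stands in for the real overlap check; it does not reproduce how either driver is implemented, only the bounding-box-then-refine idea described above.

```python
def bbox(poly):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a polygon."""
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def convex_overlap(p, q):
    """Separating-axis test for two convex polygons given as lists of (x, y)."""
    for poly in (p, q):
        for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
            nx, ny = y2 - y1, x1 - x2  # normal to this edge
            proj_p = [nx * x + ny * y for x, y in p]
            proj_q = [nx * x + ny * y for x, y in q]
            if max(proj_p) < min(proj_q) or max(proj_q) < min(proj_p):
                return False  # found a separating axis, so no overlap
    return True


query = [(0, 0), (10, 0), (0, 10)]                  # triangular search polygon
extents = {
    "A": [(1, 1), (3, 1), (3, 3), (1, 3)],          # inside the triangle
    "B": [(8, 8), (9, 8), (9, 9), (8, 9)],          # inside the bounding box only
    "C": [(20, 20), (21, 20), (21, 21), (20, 21)],  # outside both
}

# Stage 1: coarse bounding-box search. Stage 2: exact overlap check.
candidates = [k for k, e in extents.items() if bboxes_overlap(bbox(query), bbox(e))]
matches = [k for k in candidates if convex_overlap(query, extents[k])]

print(candidates)  # ['A', 'B']
print(matches)     # ['A']
```

Both strategies end with the same matches; the bounding-box stage simply has to fetch and discard extra candidates such as ``B``, which an indexed polygon search avoids.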
18 changes: 14 additions & 4 deletions docs/about/whats_new.rst
@@ -8,16 +8,19 @@ What's New
v1.9.next
=========

v1.9.0-rc1 (27th March 2024)
============================

- Merge in 1.8.x branch changes. (:pull:`1459`, :pull:`1473`, :pull:`1532`, :pull:`1548`, :pull:`1565`)
- External Lineage API (:pull:`1401`)
- Add lineage support to index clone operation (:pull:`1429`)
- Migrate to SQLAlchemy 2.0 (:pull:`1432`)
- Clean up deprecated code and add deprecation warnings to legacy methods, simplify DocReader logic (:pull:`1406`)
- Mark geometry module as deprecated and replace all usage with odc-geo (:pull:`1424`)
- Mark GridSpec as deprecated, replace math and cog functions with odc-geo equivalents, enforce new odc-geo conventions (:pull:`1441`)
- Rename `gbox` to `geobox` in parameter names (:pull:`1441`)
- Rename ``gbox`` to ``geobox`` in parameter names (:pull:`1441`)
- Remove executor API (:pull:`1462`)
- Remove ingestion methods, `GridWorkflow` and `Tile` classes (:pull:`1465`)
- Remove ingestion methods, ``GridWorkflow`` and ``Tile`` classes (:pull:`1465`)
- Fix postgis queries for numeric custom search fields (:pull:`1475`)
- Document best practice for pulling in changes from develop and update constraints.txt (:pull:`1478`)
- Postgis index driver performance tuning (:pull:`1480`)
@@ -35,14 +38,17 @@ v1.9.next
- Deprecate multiple locations. (:pull:`1546`)
- Deprecate search_eager and search_summaries and add ``archived`` arg to all dataset search/count methods. (:pull:`1550`)
- Migrate away from deprecated Python pkg_resources module (:pull:`1558`)
- Add `custom_offsets` and `order_by` arguments to search_returning() - order_by still unimplemented. (:pull:`1557`)
- Add ``custom_offsets`` and ``order_by`` arguments to search_returning() - order_by still unimplemented. (:pull:`1557`)
- Fix and enhance typehints, automated static type checking with mypy. (:pull:`1562`)
- Improve SQLAlchemy join hints, addressing a recurring but intermittent bug. (:pull:`1564`)
- Improve typehints and update docstrings in datacube/api/core.py (:pull:`1567`)

- Add migration notes, update documentation and whats_new.rst for 1.9.0-rc1 release (:pull:`1569`)

v1.8.next
=========

v1.8.18 (27th March 2024)
=========================
- Add dataset cli tool ``find-duplicates`` to identify duplicate indexed datasets (:pull:`1517`)
- Make solar_day() timezone aware (:pull:`1521`)
- Warn if non-eo3 dataset has eo3 metadata type (:pull:`1523`)
@@ -51,10 +57,14 @@ v1.8.next
- Update github-Dockerhub credential-passing mechanism. (:pull:`1528`)
- Tweak ``list_products`` logic for getting crs and resolution values (:pull:`1535`)
- Add new ODC Cheatsheet reference doc to Data Access & Analysis documentation page (:pull:`1543`)
- Compatibility fix to allow users to supply ``odc.geo``-style GeoBoxes to ``dc.load(like=...)`` (:pull:`1551`)
- Fix broken codecov github action. (:pull:`1554`)
- Update documentation links to DEA Knowledge Hub (:pull:`1559`)
- Throw error if ``time`` dimension is provided as an int or float to Query construction
instead of assuming it to be seconds since epoch (:pull:`1561`)
- Add generic NOT operator for ODC queries and ``Not`` type wrapper (:pull:`1563`)
- Update whats_new.rst for release (:pull:`1568`)


v1.8.17 (8th November 2023)
===========================
