- Updated the Postgres product views to include the whole dataset metadata document.
- `datacube system init` now recreates the product views by default every time it is run, and now supports Postgres 9.6.
- URI searches are now better supported from the CLI: `datacube dataset search uri = file:///some/uri/here`
- `datacube user` now supports a user description (via `--description`) when creating a user, and `delete` accepts multiple user arguments.
- Platform-specific (Landsat) fields have been removed from the default `eo` metadata type in order to keep it minimal. Users and products can still add their own metadata types to use additional fields.
- Dataset locations can now be archived, not just deleted. An archived location is still accessible but is marked as deprecated.
- We are now part of Open Data Cube, and have a new home at https://github.com/opendatacube/datacube-core
This release enforces the URI index changes: it will prompt you to rerun `init` as an administrator to update your existing cubes: `datacube -v system init` (this command can be run without affecting read-only users, but will briefly pause writes).
- Added an `--allow-exclusive-lock` flag to product add/update commands, allowing faster index updates when system usage can be halted.
- `{version}` can now be used in ingester filename patterns
- Improved the dataset search and info CLI outputs
- The ingest CLI can now process a range of years (e.g. `2000-2005`)
- Fixed the `metadata_type update` CLI not creating indexes (running `system init` will create the missing ones)
- Enabled indexing of datacube-generated NetCDF files, making it much easier to pull selected data into a private datacube index. Use it by running `datacube dataset add selected_netcdf.nc`.
- Switch versioning system to increment the second digit instead of the third.
- Added sources-policy options to the `dataset add` CLI
- Multiple dataset search improvements related to locations
- Keep hours/minutes when grouping data by `solar_day`
- Code changes: `datacube.model.CRS`, `BoundingBox`, `Coordinate` and `GeoBox` have moved into `datacube.utils.geometry`. Any code using these should update its imports.
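The move amounts to a one-line change in user code, e.g.:

```diff
-from datacube.model import CRS, BoundingBox, Coordinate, GeoBox
+from datacube.utils.geometry import CRS, BoundingBox, Coordinate, GeoBox
```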
- Fixed several issues with the geometry utils
- Added more operations to the geometry utils
- Updated recipes to use the geometry utils
- Enabled Windows CI (Python 3 only)
- Added an update command to the `datacube dataset` CLI
- Added a show command to the `datacube product` CLI
- Added list and show commands to the `datacube metadata_type` CLI
- Added 'storage unit' stacker application
- Replaced `model.GeoPolygon` with the `utils.geometry` library
- Fixed a data loading issue when reading HDF4_EOS datasets.
- Added support for buffering/padding of GridWorkflow tile searches
- Improved the `Query` class to make filtering by a source or parent dataset easier. For example, this can be used to filter datasets by Geometric Quality Assessment (GQA). Use `source_filter` when requesting data.
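  As a sketch of source-based filtering (the product names and the GQA field below are placeholders, and a populated datacube index is assumed; only `source_filter` itself comes from this release):

  ```
  # illustrative sketch -- requires a populated datacube index
  dc = datacube.Datacube()
  data = dc.load(product='ls5_nbar_albers',               # placeholder product
                 x=(148.0, 148.5), y=(-35.5, -35.0),
                 source_filter=dict(product='ls5_level1_scene',     # parent product (placeholder)
                                    gqa_iterative_mean_xy=(0, 1)))  # placeholder GQA field
  ```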
- Additional data preparation and configuration scripts
- Various fixes for single point values for lat, lon & time searches
- Grouping by solar day now overlays scenes in a consistent manner, with the more northern scene taking precedence. Previously it was non-deterministic which scene/tile would be placed on top.
- Added support for accessing data through http and s3 protocols
- Added dataset search command for filtering datasets (lists id, product, location)
- `ingestion_bounds` can again be specified in the ingester config
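  A minimal sketch of where the restored option sits in an ingester config; the CRS and bounds values are placeholders, not recommendations:

  ```yaml
  # hypothetical excerpt of an ingester config
  storage:
    crs: EPSG:3577           # placeholder CRS
  ingestion_bounds:          # restrict ingestion to this extent (storage CRS units)
    left: 1500000
    bottom: -4000000
    right: 1600000
    top: -3900000
  ```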
- Can now do range searches on non-range fields (e.g. `dc.load(orbit=(20, 30))`)
- Merged several bug-fixes from CEOS-SEO branch
- Added Polygon Drill recipe to the recipes
- Fixed the affine deprecation warning
- Added a `datacube metadata_type` CLI tool which supports add and update
- Improved `datacube product` CLI tool logging
- Improved ingester task throughput when using distributed executor
- Fixed an issue where loading tasks from disk would use too much memory
- `model.GeoPolygon.to_crs` now adds additional points (~every 100 km) to improve reprojection accuracy
- Ingester can now be configured to have WELD/MODIS style tile indexes (thanks Chris Holden)
- Added a `--queue-size` option to `datacube ingest` to control the number of tasks queued up for execution
- Product name is now used as primary key when adding datasets. This allows easy migration of datasets from one database to another
- Metadata type name is now used as primary key when adding products. This allows easy migration of products from one database to another
- `DatasetResource.has` now takes a dataset id instead of a `model.Dataset`
- Fixed an issue where database connections weren't recycled fast enough in some cases
- Fixed an issue where `DatasetTypeResource.get` and `DatasetTypeResource.get_by_name` would cache `None` if a product didn't exist
- Added origin, alignment and GeoBox-based methods to `model.GridSpec`
- Fixed satellite path/row references in the prepare scripts (Thanks to Chris Holden!)
- Added links to external datasets in the indexing documentation
- Improved archive and restore command-line features: `datacube dataset archive` and `datacube dataset restore`
- Improved application support features
- Improved system configuration documentation
- `GridWorkflow.list_tiles` and `GridWorkflow.list_cells` now return a `Tile` object
- Added a resampling parameter to `Datacube.load` and `GridWorkflow.load`. It will only be used if the requested data requires resampling.
- Improved the `Datacube.load` like parameter behaviour. This allows passing in an `xarray.Dataset` to retrieve data for the same region.
- Fixed an issue with passing tuples to functions in the Analytics Expression Language
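A sketch of the two load additions together, with `like` reusing a previous result's grid and `resampling` applied only when needed (product names are placeholders and a populated datacube index is assumed):

```
# illustrative sketch -- requires a populated datacube index
reference = dc.load(product='ls5_nbar_albers', x=(148.0, 148.5), y=(-35.5, -35.0))
matching = dc.load(product='ls7_nbar_albers',
                   like=reference,        # same extent, resolution and CRS
                   resampling='cubic')    # used only if reprojection/resampling is needed
```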
- Added a user guide section to the documentation containing useful code snippets
- Reorganized project dependencies into required packages and optional 'extras'
- Added performance dependency extras for improving run-time performance
- Added analytics dependency extras for analytics features
- Added interactive dependency extras for interactivity features
- Added bit shift and power operators to Analytics Expression Language
- Added `datacube product update`, which can be used to update product definitions
- Fixed an issue where dataset geo-registration would be ignored in some cases
- Fixed an issue where Execution Engine was using dask arrays by default
- Fixed an issue where int8 data could not sometimes be retrieved
- Improved search and data retrieval performance
- Improved spatio-temporal search performance. `datacube system init` must be run to benefit
- Added info, archive and restore commands to `datacube dataset`
- Added a product-counts command to the datacube-search tool
- Made Index object thread-safe
- Multiple masking API improvements
- Improved database Index API documentation
- Improved system configuration documentation
- Updated the way database indexes are partitioned. Use `datacube system init --rebuild` to rebuild indexes
- Added a fuse_data ingester configuration parameter to control overlapping data fusion
- Added a `--log-file` option to the `datacube dataset add` command for saving logs to a file
- Added the `index.datasets.count` method, returning the number of datasets matching a query
- Improved dataset search performance
- Restored ability to index telemetry data
- Fixed an issue with data access API returning uninitialized memory in some cases
- Fixed an issue where dataset center_time would be calculated incorrectly
- General improvements to documentation and usability
- Added a framework for developing distributed, task-based applications
- Several additional Ingester performance improvements
This release brings major performance and usability improvements
- Major performance improvements to GridWorkflow and Ingester
- Ingestion can be limited to one year at a time to limit memory usage
- Ingestion can be done in two stages (serial followed by highly parallel) by using the `--save-tasks`/`--load-tasks` options. This should help reduce idle time in the distributed processing case.
- General improvements to documentation.
This release contains lots of fixes in preparation for the first large ingestion of Geoscience Australia data into a production version of AGDCv2.
- General improvements to documentation and user friendliness.
- Updated metadata in configuration files for ingested products.
- Full provenance history is saved into ingested files.
- Added software versions, machine info and other details of the ingestion run into the provenance.
- Added valid data region information into metadata for ingested data.
- Fixed bugs relating to changes in Rasterio and GDAL versions.
- Refactored `GridWorkflow` to be easier to use, and included preliminary code for saving created products.
- Improvements and fixes for bit mask generation.
- Lots of other minor but important fixes throughout the codebase.
This release includes restructuring of code, APIs, tools, configurations and concepts. The result of this churn is cleaner code, faster performance and the ability to handle provenance tracking of Datasets created within the Data Cube.
The major changes include:
- The `datacube-config` and `datacube-ingest` tools have been combined into `datacube`.
- Added a dependency on `pandas` for nicer search-results listing and handling.
- Indexing and ingestion have been split into separate steps.
- Data that has been indexed can be accessed without going through the ingestion process.
- Data can be requested in any projection and will be dynamically reprojected if required.
- Dataset Type has been replaced by Product.
- Storage Type has been removed, and an Ingestion Configuration has taken its place.
- A new `Datacube` class for querying and accessing data.
Pre-Unification release.
Many API improvements.
This release is to support generation of GA Landsat reference data.
First working Data Cube v2 code.