Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TileDB: add support for building overviews on rasters stored as TileDB groups #9837

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

rouault
Copy link
Member

@rouault rouault commented May 2, 2024

Supports the CREATE_GROUP=YES creation option to create TileDB raster datasets as a TileDB group, to make it cleaner to store overview within it.

See https://github.com/rouault/gdal/blob/tiledb_overview/frmts/tiledb/overview_model.rst for a description on how TileDB overviews are stored.

@normanb / @sgillies : this is mostly based on https://docs.google.com/document/d/1B-NizsoJn79W_RV3ujg6K1D5xJmWEja29Bv3323uYCU/edit#heading=h.p8z7kqfrb9st (with the removal of creating overviews on non-group dataset)

@rouault rouault added this to the 3.10.0 milestone May 2, 2024
Copy link
Contributor

@sgillies sgillies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rouault @normanb I'm new to this work, made a quick scan this afternoon. I've got a few suggestions for dataset type names and group/array layout. Norman and I will also sync on this so that we're of one mind on it 😄

as a TileDB array whose name is the base name of the group, suffixed with ``_0``

The first overview level created is a TileDB array, placed within that group,
and whose name is the base name of the group, suffixed with ``_1``. And so on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rouault @normanb using the stem of the group name as a part of the array names makes the entire structure fragile. If a group is renamed or moved, the implicit links between group and arrays are broken.

What would you think about a generic prefix such as "pyr" or no prefix at all?

In other words, this:

    $ ls -al byte_group.tiledb
    drwxr-xr-x  6 even even  4096 mai    2 02:13 ./
    drwxrwxr-x 57 even even 86016 mai    2 01:42 ../
    drwxr-xr-x  8 even even  4096 mai    1 17:17 pyr_0/      <== Full resolution dataset
    drwxr-xr-x  8 even even  4096 mai    2 02:13 pyr_1/      <== First overview level
    drwxr-xr-x  8 even even  4096 mai    2 02:13 pyr_2/      <== Second overview level
    drwxr-xr-x  2 even even  4096 mai    2 02:13 __group/
    drwxr-xr-x  2 even even  4096 mai    2 02:13 __meta/
    -rw-r--r--  1 even even     0 mai    1 17:17 __tiledb_group.tdb

or

    $ ls -al byte_group.tiledb
    drwxr-xr-x  6 even even  4096 mai    2 02:13 ./
    drwxrwxr-x 57 even even 86016 mai    2 01:42 ../
    drwxr-xr-x  8 even even  4096 mai    1 17:17 0/      <== Full resolution dataset
    drwxr-xr-x  8 even even  4096 mai    2 02:13 1/      <== First overview level
    drwxr-xr-x  8 even even  4096 mai    2 02:13 2/      <== Second overview level
    drwxr-xr-x  2 even even  4096 mai    2 02:13 __group/
    drwxr-xr-x  2 even even  4096 mai    2 02:13 __meta/
    -rw-r--r--  1 even even     0 mai    1 17:17 __tiledb_group.tdb

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with that proposed change

as a TileDB array. When creating overviews, they will be stored in an
auxiliary TileDB group, located at the same level as the full resolution
array, with an URI which is the one of the full resolution array with an
additional ``.ovr`` suffix.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it would be handy to be able to add sidecar overviews to an existing array, I'm not sure we want people to create these just because it's possible. What do you think, @normanb ?


The group has the following TileDB metadata items:

* ``dataset_type``: set to ``raster``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rouault @normanb I would like to propose raster_pyr_v1 ("Raster Pyramid Version 1") for the dataset type value. "v1" because I think we'll want to change the group/array layout down the road, and "pyr" to indicate that this is a raster pyramid. raster could remain for single-array rasters with or without a sidecar group.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that change, and the below one too


The group has the following TileDB metadata item:

* ``dataset_type``: set to ``raster_overviews``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe dataset_type: raster_ovr_v1 with the same naming change as I proposed for raster_pyr_v1?

@rouault
Copy link
Member Author

rouault commented May 15, 2024

@sgillies @normanb Did you sync together regarding Sean's above suggestions?

@sgillies
Copy link
Contributor

@rouault sorry for the delay! We did discuss and here are the conclusions:

  1. Overview arrays should be named l_0, l_1, etc.
  2. Dataset_type metadata should remain raster.
  3. We do not desire a sidecar .ovr group and the TileDB platform will not support it, so it would be best to delete that format variant.
  4. We'd like to see CREATE_GROUP=YES be the default. Groups will be favored by the TileDB platform. We understand that this means a group with a single array for users who don't request overviews.

rouault added 18 commits May 23, 2024 14:43
…B_ATTRIBUTE creation option, when the block size is not a multiple of the raster size
…r+b ; GTiff: turn VSIError into CPLError on creation
…removed per 130560b

Even if no longer use deprecated features, the mere fact of including
TileDB headers that declare deprecated methods cause warnings to be
emitted without the suppression.
@rouault
Copy link
Member Author

rouault commented May 23, 2024

We did discuss and here are the conclusions:

I've updated the pull request with those changes, starting at commit "Revert "TileDB raster: implement BuildOverviews() for base dataset not in group mode").
As the CREATE_GROUP creation option is common to the raster and vector sides of the driver, I found it to be perhaps simpler to change its default to YES for both (at least to be able to present a default value in the XML metadata of GDAL_DMD_CREATIONOPTIONLIST), so I also changed it for vectors. I could revert that change if needed.

@rouault rouault changed the title TileDB: add support for building overviews stored as TileDB arrays TileDB: add support for building overviews stored as TileDB groups May 23, 2024
@rouault rouault changed the title TileDB: add support for building overviews stored as TileDB groups TileDB: add support for building overviews on rasters stored as TileDB groups May 23, 2024
@coveralls
Copy link
Collaborator

coveralls commented May 23, 2024

Coverage Status

coverage: 69.133%. remained the same
when pulling 8c64193 on rouault:tiledb_overview
into 12dab86 on OSGeo:master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants