Minimal or duplicative metadata #2

Open
smit1678 opened this issue May 13, 2015 · 5 comments

Comments

@smit1678
Member

Pulling out the main question from #1 regarding duplicative metadata. The question at hand seems to come down to two options:

  1. Have minimal information in a metadata file. Generate additional information on the fly when indexed by a catalog -- like UUID, organization or dataset metadata, or other information that can be read by GDAL.
  2. Have duplicative information exist in the metadata file -- URI of the image is captured in the metadata file, provider and contact information, as well as bounding box and footprint.
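
To make the two options concrete, here is a rough sketch of what each file might hold. Every field name and value below is a hypothetical placeholder chosen for illustration, not an agreed OIN schema, and `fields_to_derive` is an assumed helper showing what a catalog would still have to generate at index time.

```python
# Hypothetical list of fields a catalog is assumed to need for indexing.
CATALOG_FIELDS = ["uuid", "uri", "provider", "contact", "bbox", "footprint"]

# Option 1: minimal file; the catalog derives everything else when indexing.
minimal_meta = {
    "title": "Example scene",
    "acquisition_start": "2015-05-01T00:00:00Z",
}

# Option 2: duplicative file; the same facts are written out explicitly.
duplicative_meta = {
    **minimal_meta,
    "uri": "http://example.com/scene.tif",
    "provider": "Example Provider",
    "contact": "someone@example.com",
    "bbox": [-77.1, 38.8, -76.9, 39.0],
    "footprint": "POLYGON((-77.1 38.8, -76.9 38.8, -76.9 39.0, -77.1 39.0, -77.1 38.8))",
}


def fields_to_derive(meta):
    """Return the catalog fields that are absent from a metadata dict."""
    return [name for name in CATALOG_FIELDS if name not in meta]


print(fields_to_derive(minimal_meta))
print(fields_to_derive(duplicative_meta))
```

In this sketch the UUID is still left to the catalog under Option 2; that is an assumption made for the example, not something decided in this thread.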

There are pros and cons to each option. In terms of goals, it seems that the main goals for metadata should be to:

  • Ensure that an image and its information exist in a valid format for indexing
  • Provide an easy method for catalogs and services to process the JSON
  • Be agnostic to use cases, simple, and human readable
  • Easily maintainable

To keep the metadata easily maintainable, one approach we've taken during initial testing and development has been to work on scripts that automate metadata file generation: https://github.com/openimagerynetwork/oin-meta-generator, to be packaged later into a command line tool. Using this utility, it would be easy for a provider to create or update thousands of metadata JSON files.

@kamicut, @scisco, and I recommend going with Option 2 for the first version of OIN. Some of the conversation was captured in Gitter with @lossyrob.

@lossyrob @warmerdam @cholmes @wonderchook @cgiovando Want to open it up to the group to make sure we're thinking through all the options and get additional input.

cc @scisco @kamicut

@wonderchook

I agree on going with option 2 for the reasons outlined above.

@smathermather
Contributor

Agree with @wonderchook / @smit1678. Being explicit and human readable ensures that, without special tooling beyond cat/more/less/text-editor, basic QC work can be performed.

One additional thought is re: GeoTIFF tagging, etc. -- the JSON metadata should be considered the authoritative metadata, so that there's effectively only one storage location to consider. e.g. If the GeoTIFF says my projection is EPSG:3753 but my JSON specifies EPSG:3734, the JSON wins.
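
A minimal sketch of the "JSON wins" rule, assuming the sidecar stores the projection under a hypothetical `projection` key and that the GeoTIFF's EPSG code has already been read by some other tool. The fallback to the GeoTIFF value when the JSON has no projection is an added assumption, not something proposed above.

```python
def resolve_projection(json_meta, geotiff_epsg):
    """Return (epsg, conflict), treating the JSON sidecar as authoritative.

    `conflict` is True when both sources give a value and they disagree.
    """
    json_epsg = json_meta.get("projection")  # hypothetical field name
    if json_epsg is None:
        # Assumption: with no JSON value, fall back to the GeoTIFF header.
        return geotiff_epsg, False
    return json_epsg, json_epsg != geotiff_epsg


epsg, conflict = resolve_projection({"projection": "EPSG:3734"}, "EPSG:3753")
print(epsg, conflict)
```

Returning the conflict flag alongside the value lets tooling warn about a mismatch while still honoring the JSON.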

@lossyrob
Member

I've made the case for minimizing duplication, but I see the case for JSON metadata being the authoritative metadata for the various points people raise. If that's the consensus then I'm happy to go along with it.

@cholmes
Contributor

cholmes commented May 29, 2015

Sorry for the slow response on this, I've been on vacation and on the road. My gut is to minimize duplication, so there is less potential for weird errors. I agree with @smathermather that if we do go duplicative it makes sense for the JSON to win. But it feels weird to me that you'd get a GeoTIFF that says the wrong thing. The GeoTIFF is what you'd actually use, and you'd tend to trust what it says over some JSON sidecar.

It'd be weird on implementation, too -- if we made a GDAL driver (which I think is key to OIN success), it would have to specifically override the GeoTIFF in favor of the JSON.

I think I am leaning a bit towards having metadata that is naturally in the GeoTIFF just stay in the GeoTIFF, instead of having weird overrides.

But need to digest more, there is a good case for having more. And in either case good tooling will be essential.

@smathermather
Contributor

Tooling is critical, and I follow and agree with @cholmes, but I have some concerns about having some metadata in the GeoTIFF and some in the JSON. I feel it should either all be embedded in the header, or the header should be ignored and it's all in the JSON, or the tooling should update from one to the other.

TBH, this is an aesthetic thing for me -- if one part is human readable, the whole thing should be (and vice versa).
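
For the "tooling should update from one to the other" route, a first step could be a check that reports where the header and the JSON disagree. This is only a sketch under assumptions: both sources are taken to be already loaded into plain dicts, and the keys shown are hypothetical.

```python
def diff_metadata(header, sidecar):
    """Map each shared key whose values differ to (header_value, sidecar_value)."""
    shared = header.keys() & sidecar.keys()
    return {
        key: (header[key], sidecar[key])
        for key in sorted(shared)
        if header[key] != sidecar[key]
    }


# Hypothetical values, as if read from a GeoTIFF header and a JSON sidecar.
header = {"projection": "EPSG:3753", "width": 1024}
sidecar = {"projection": "EPSG:3734", "width": 1024, "provider": "Example Provider"}

mismatches = diff_metadata(header, sidecar)
print(mismatches)
```

Keys present in only one source are ignored here; whether those should be copied across is a separate decision for the tooling.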
