Support of querying based on meta data #78

Open
aighes opened this issue Dec 24, 2022 · 1 comment
Labels
enhancement New feature or request

Comments


aighes commented Dec 24, 2022

I couldn't find anything on whether metadata such as the user name or creation date of nodes/ways/relations can be queried, or whether this is planned for a future version.

It could be quite useful, for example to filter for one's own data.

@clarisma clarisma added the enhancement New feature or request label Jan 6, 2023
Owner

clarisma commented Jan 6, 2023

Thanks for opening this issue!

GOLs currently do not store metadata of features. Metadata support is not on our road map as of now, but we may consider adding it if there is sufficient demand.

Below are some thoughts on use cases and technical aspects:

Metadata in OSM

What is stored in OSM-PBF:

  • changeset ID
  • user name/ID (name can change; ID is fixed)
  • object version
  • timestamp

(Separate changeset files track other metadata related to the entire changeset: sources, tags, QA flags, etc.)
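For illustration, the per-feature metadata listed above could be modeled as a small record. This is only a sketch: the class name, field names and sample values are hypothetical and not part of GeoDesk or any OSM library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeatureMetadata:
    """Per-feature metadata as carried in OSM-PBF (hypothetical model)."""
    changeset_id: int
    user_id: int         # fixed identifier of the account
    user_name: str       # display name; may change over time
    version: int         # object version
    timestamp: datetime  # time of this version of the feature (UTC)


# Made-up sample values, for illustration only
meta = FeatureMetadata(
    changeset_id=1001,
    user_id=42,
    user_name="example_user",
    version=3,
    timestamp=datetime(2022, 12, 24, tzinfo=timezone.utc),
)
```

Since the user name can change while the ID cannot, queries for "my own data" would probably be more robust if they matched on `user_id`.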

Potential use cases

  • Contributor statistics
  • Quality assurance
    • For low-quality edits or vandalism, find edits by the same user
    • How recent is the data? (A feature that has not been touched in many years may no longer exist on the ground)
  • Editing

Typical queries

In addition to the current GOQL syntax, we would likely need to support these queries:

  • by user name/ID
  • by date/time range
    • last edited prior to a certain date
    • edited within a particular year
  • by changeset ID?
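The query types above can be sketched as plain predicates over metadata records. This is not GOQL and not a GeoDesk API; the function names, the record layout and the sample data are all assumptions made up for this example.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def utc(year: int, month: int = 1, day: int = 1) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


def by_user_id(user_id: int) -> Predicate:
    return lambda r: r["user_id"] == user_id


def by_changeset(changeset_id: int) -> Predicate:
    return lambda r: r["changeset_id"] == changeset_id


def edited_before(cutoff: datetime) -> Predicate:
    return lambda r: r["timestamp"] < cutoff


def edited_in_year(year: int) -> Predicate:
    # Half-open range [Jan 1 of year, Jan 1 of the following year)
    start, end = utc(year), utc(year + 1)
    return lambda r: start <= r["timestamp"] < end


def select(records: List[Record], *predicates: Predicate) -> List[Record]:
    """Return the records that satisfy all predicates."""
    return [r for r in records if all(p(r) for p in predicates)]


# Made-up sample records
records = [
    {"id": 1, "user_id": 42, "changeset_id": 1001, "timestamp": utc(2015, 3, 10)},
    {"id": 2, "user_id": 7, "changeset_id": 1002, "timestamp": utc(2020, 6, 1)},
    {"id": 3, "user_id": 42, "changeset_id": 1003, "timestamp": utc(2020, 11, 20)},
]

own_edits_2020 = select(records, by_user_id(42), edited_in_year(2020))
stale = select(records, edited_before(utc(2016)))
```

Combining predicates this way mirrors how a metadata clause might be combined with an ordinary tag filter.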

Possible ways to store metadata in a GOL

  • Full support: Metadata becomes a "first class" data element within a GOL

  • Partial support via synthetic tags

  • Not stored within GOL itself, but in an auxiliary file

Metadata as "first class" data

Advantages:

  • More concise data storage, especially since we can simply store a reference to a changeset for each feature (rather than storing changeset ID, timestamp and user name/ID individually)

Drawbacks:

  • Requires changes to GOL file-format
  • Requires separate query logic

Notes:

  • If changesets are atomic, we would only need to track the changeset of a feature
  • ChangeSet could then be a separate data type within a GOL
  • The version number is independent of the changeset and would need to be tracked as a separate property
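A rough sketch of this layout: each feature keeps only a changeset reference plus its own version, and user and timestamp are looked up through the changeset. It assumes that changesets are atomic and carry a single timestamp, both of which are still open questions in this proposal. All names and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class ChangeSet:
    id: int
    user_id: int
    timestamp: datetime  # assumption: one timestamp per changeset


@dataclass(frozen=True)
class FeatureMeta:
    changeset_id: int  # reference into the changeset table
    version: int       # tracked per feature, independent of the changeset


# Made-up sample data: two features share one changeset
changesets: Dict[int, ChangeSet] = {
    1001: ChangeSet(1001, 42, datetime(2022, 12, 24, tzinfo=timezone.utc)),
}
feature_meta: Dict[int, FeatureMeta] = {
    10: FeatureMeta(changeset_id=1001, version=1),
    11: FeatureMeta(changeset_id=1001, version=4),
}


def last_editor(feature_id: int) -> int:
    """Resolve the user ID of a feature via its changeset."""
    return changesets[feature_meta[feature_id].changeset_id].user_id


def last_edited(feature_id: int) -> datetime:
    """Resolve the edit time of a feature via its changeset."""
    return changesets[feature_meta[feature_id].changeset_id].timestamp
```

User and timestamp are stored once per changeset here, no matter how many features the changeset touched.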

Generate synthetic tags based on metadata

We could turn metadata into tags when a GOL is built.

Advantages:

  • No need to modify the GOL file-format
  • Easier implementation

Drawbacks:

  • Increased file size: Storing metadata as tags is significantly more verbose than using dedicated data structures. It also reduces the opportunity to deduplicate tag-tables, since per-feature metadata would make otherwise identical tag-tables differ.
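A small sketch of the synthetic-tag idea and its effect on deduplication. The `meta:` key prefix, the function names and the sample features are assumptions for illustration; no such tags exist in GeoDesk today.

```python
from typing import Dict, List

Tags = Dict[str, str]


def with_synthetic_tags(tags: Tags, meta: Dict[str, str], prefix: str = "meta:") -> Tags:
    """Return a copy of the tags with metadata added under a hypothetical prefix."""
    out = dict(tags)
    for key, value in meta.items():
        out[prefix + key] = value
    return out


def distinct_tag_tables(tables: List[Tags]) -> int:
    """Count tag-tables that could not be shared between features."""
    return len({frozenset(t.items()) for t in tables})


# Two made-up features with identical tags but different metadata
features = [
    ({"highway": "residential"},
     {"user": "alice", "timestamp": "2020-05-01T10:00:00Z", "changeset": "1001"}),
    ({"highway": "residential"},
     {"user": "bob", "timestamp": "2021-02-14T08:30:00Z", "changeset": "1002"}),
]

plain = [tags for tags, _ in features]
tagged = [with_synthetic_tags(tags, meta) for tags, meta in features]

shared_before = distinct_tag_tables(plain)
shared_after = distinct_tag_tables(tagged)
```

In this toy example the two features can share a single tag-table before the metadata is added, but need two separate tables afterwards.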

Related: Proposed Tag Transformations (clarisma/gol-tool#85)

Changes to Query Engine

  • The GeoDesk Query Engine currently recognizes only two types of data: text and numbers. To allow meaningful queries of timestamp ranges, we would need to add support for a time/date data type.
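As a sketch of what a date-aware comparison might involve, the snippet below expands a date literal of varying precision into a half-open UTC range and tests a timestamp against it. This is not GeoDesk code; the function names and the literal syntax are assumptions, and timestamps are assumed to use the `YYYY-MM-DDThh:mm:ssZ` form.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp such as '2022-12-24T10:15:00Z' as UTC."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def date_range(literal: str) -> Tuple[datetime, datetime]:
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into [start, end) in UTC."""
    parts = [int(p) for p in literal.split("-")]
    if len(parts) == 1:
        (year,) = parts
        start, end = datetime(year, 1, 1), datetime(year + 1, 1, 1)
    elif len(parts) == 2:
        year, month = parts
        start = datetime(year, month, 1)
        # First day of the following month (rolls over into the next year)
        end = datetime(year + month // 12, month % 12 + 1, 1)
    elif len(parts) == 3:
        start = datetime(*parts)
        end = start + timedelta(days=1)
    else:
        raise ValueError(f"unsupported date literal: {literal!r}")
    return start.replace(tzinfo=timezone.utc), end.replace(tzinfo=timezone.utc)


def matches(timestamp: str, literal: str) -> bool:
    """True if the timestamp falls within the period named by the literal."""
    start, end = date_range(literal)
    return start <= parse_timestamp(timestamp) < end
```

With such a type, "edited within a particular year" becomes a single range test, for example `matches("2022-12-24T10:15:00Z", "2022")`.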

Open issues

  • Are changesets atomic?
  • Can a changeset have more than one timestamp?
