Tagging tag changes over time #488

Closed
Dimitar5555 opened this issue Nov 28, 2022 · 5 comments

Comments

@Dimitar5555

Currently, adding different names to a simple polygon requires multiple relations with overlapping tags, which looks ugly in the editor and is hard to edit and maintain. For one such example, see: https://openhistoricalmap.org/way/198851511. Adding different tags to linear features requires creating a dozen duplicate ways, which is even harder to maintain, or using relations, which can get messy quite quickly.

I've suggested before to use a format like this:
key=start_date_1;end_date_1;value_1^start_date_2;end_date_2;value_2

In cases where one of the dates is unknown, it can be left blank; the semicolon is still required so that it is clear which date is missing.
key=start_date_1;;value
key=;end_date_1;value

The idea is to make it as easy as possible for data consumers (and editors) to get the required data. Splitting the value on "^" gives all values with their start/end dates (in Java this is value.split("\\^"), because split takes a regular expression and a bare ^ is an anchor). Running .split(";", 3) on each of those returns an array which looks like this: [start_date, end_date, value]. If the start or the end date is missing, the array has an empty string in its place. *
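
The two splits described above could be sketched roughly as follows. The class names `DatedValue` and `TemporalTagParser` are hypothetical, and the sketch assumes that "^" never occurs inside a value and that every entry contains both semicolons. Because `String.split` takes a regular expression in Java, the caret is escaped.

```java
import java.util.ArrayList;
import java.util.List;

/** One dated value parsed from "start;end;value" (hypothetical helper type). */
final class DatedValue {
    final String start; // empty string when the start date is unknown
    final String end;   // empty string when the end date is unknown
    final String value;

    DatedValue(String start, String end, String value) {
        this.start = start;
        this.end = end;
        this.value = value;
    }
}

final class TemporalTagParser {
    /** Splits "s1;e1;v1^s2;e2;v2" into its dated values. */
    static List<DatedValue> parse(String raw) {
        List<DatedValue> result = new ArrayList<>();
        // split() takes a regex, so the caret has to be escaped.
        for (String entry : raw.split("\\^")) {
            // Limit 3 means at most two splits, so semicolons inside the value survive.
            String[] parts = entry.split(";", 3);
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected start;end;value but got: " + entry);
            }
            result.add(new DatedValue(parts[0], parts[1], parts[2]));
        }
        return result;
    }
}
```

For instance, `TemporalTagParser.parse("1785;1790;Old Name^1790;;New Name")` would yield two entries, the second with an empty end date.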

Note: there must be at least one "special" symbol which can't be used in any value.

Positive sides:

  • easy to use, edit and maintain (compared to using a dozen relations for a single object or duplicating its geometry)
  • easy to parse
  • one method works almost everywhere (except for route relations which change their routes over time)
  • allows using edtf without a separate tag

Negative sides:

  • places with a lot of name (or any other tag) changes might run into the max length of values (which is 255 characters iirc)
  • Nominatim may need some rework

I would love to hear what you think about such schema and any possible issues which you might have noticed that I've missed.

*The code examples assume that the parser is using Java. That method/function may have other names in other languages and may behave differently. It is possible to write a simple method/function that does it in almost all (if not all) programming languages.

@1ec5
Member

1ec5 commented Nov 28, 2022

> I've suggested before to use a format like this:

Previously in #284 (comment), for those who weren’t following that discussion.

Off the top of my head, here are some other downsides with embedding temporal changes inline inside a tag value:

  • A feature needs to be duplicated anyways if the geometry ever changes at all. Inexperienced mappers may find it counterintuitive that a feature needs to be duplicated due to some changes but not due to other changes.
  • We’d need to rewrite much of iD’s UI to accommodate changes over time in any field that isn’t a freeform text field. This is even before considering preset-level changes, like place=village becoming place=town and eventually place=city.
  • At some stage when generating vector tiles, we’d need to create a separate feature for each change.
    • Alternatively, we could push the parsing down to the client side. The MVT specification lacks arrays or objects within feature properties, so only a flat string representation (same as in OHM) would be possible. Any savings in vector tile sizes could be offset by the fact that everything has to be a string; no integers would be possible. Every layer of the style would need to account for possible temporal changes in filters and style properties.
  • The = filter in OverpassQL would become much less useful. It would only remain useful for checking whether an attribute has remained constant throughout a feature’s lifetime. (With multiple features, a chronology relation can help answer this question.)
  • The proposed syntax makes it much easier for a tag value to exceed 255 characters.
  • The proposed parsing rule needs to account for individual values that contain semicolons. There’s an obscure ;; syntax for escaping semicolons in multivalue lists.

There may be other issues that wouldn’t be apparent unless we start implementing an approach along these lines. But I think the prospect of having to rewrite large parts of iD undermines the justification for this proposed change, which seems to be focused on the difficulty of selecting overlapping objects. I think it would be much more straightforward to improve the usability of overlapping objects within iD and JOSM. That would benefit OSM as well as OpenHistoricalMap.

@Dimitar5555
Author

> A feature needs to be duplicated anyways if the geometry ever changes at all. Inexperienced mappers may find it counterintuitive that a feature needs to be duplicated due to some changes but not due to other changes.

By duplication I meant using the same nodes (i.e. having two or more lines which share the same nodes).

> We’d need to rewrite much of iD’s UI to accommodate changes over time in any field that isn’t a freeform text field. This is even before considering preset-level changes, like place=village becoming place=town and eventually place=city.

In theory it could be done by creating a new field type (although it may be harder to code than expected).

> The = filter in OverpassQL would become much less useful. It would only remain useful for checking whether an attribute has remained constant throughout a feature’s lifetime. (With multiple features, a chronology relation can help answer this question.)

There is ~ for that purpose. Further filtering will be required if the data user wants data from a specific period.
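
The "further filtering" step might look something like this on the consumer side. This is only a sketch under assumptions that are not part of the proposal: dates are plain years, a blank date means the period is open-ended, and the end year is treated as exclusive. The class and method names are hypothetical.

```java
final class PeriodFilter {
    /** Returns the value in effect in the given year, or null if no entry matches. */
    static String valueInYear(String raw, int year) {
        for (String entry : raw.split("\\^")) {
            String[] parts = entry.split(";", 3);
            if (parts.length != 3) {
                continue; // skip malformed entries in this sketch
            }
            // Assumption: dates are plain years; a blank date leaves that side open.
            boolean started = parts[0].isEmpty() || Integer.parseInt(parts[0]) <= year;
            // Assumption: the end year is exclusive, so adjacent periods do not overlap.
            boolean notEnded = parts[1].isEmpty() || year < Integer.parseInt(parts[1]);
            if (started && notEnded) {
                return parts[2];
            }
        }
        return null;
    }
}
```

Real data would likely need a proper date (or EDTF) comparison instead of `Integer.parseInt`.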

> The proposed syntax makes it much easier for a tag value to exceed 255 characters.

It's already noted in the first comment. A possible workaround would be to use key_1, key_2, etc. If the value of key reaches 255 characters, the software should look for such continuation keys. It's not the cleanest solution and it has a few problems (such as deciding when to start using a new key), but it should work unless you are looking for a specific value which is stored in key_1.
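
A reader for such continuation keys could be sketched as below. It assumes, beyond what is stated here, that each continuation key holds whole "start;end;value" entries (no entry is cut in half across keys) and that the numbering has no gaps. `ContinuationKeys` and `joinedValue` are hypothetical names.

```java
import java.util.Map;

final class ContinuationKeys {
    /** Joins key, key_1, key_2, ... into one "^"-separated string, or returns null if key is absent. */
    static String joinedValue(Map<String, String> tags, String key) {
        String first = tags.get(key);
        if (first == null) {
            return null;
        }
        StringBuilder joined = new StringBuilder(first);
        // Assumption: numbering starts at 1 and stops at the first missing key.
        for (int i = 1; ; i++) {
            String next = tags.get(key + "_" + i);
            if (next == null) {
                break;
            }
            joined.append('^').append(next);
        }
        return joined.toString();
    }
}
```

The joined string can then be parsed the same way as a single tag value.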

Another workaround is to have all values in the main tag and have a separate tag for the dates (this also partly solves the previous issue). For example:
name=name1;name2;name3;name4
name:start_dates=1785;1790;1850;1944
name:end_dates=1790;1850;1944;
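
Reading these three parallel tags back into dated values might be sketched as follows. The class name `ParallelDateTags` is hypothetical, and the sketch assumes that the three lists always have the same number of items and that no individual value contains a semicolon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ParallelDateTags {
    /** Returns rows of {start, end, value} built from key, key:start_dates and key:end_dates. */
    static List<String[]> zip(Map<String, String> tags, String key) {
        // Limit -1 keeps trailing empty strings, such as the open end date in "1790;1850;1944;".
        String[] values = tags.getOrDefault(key, "").split(";", -1);
        String[] starts = tags.getOrDefault(key + ":start_dates", "").split(";", -1);
        String[] ends = tags.getOrDefault(key + ":end_dates", "").split(";", -1);
        if (values.length != starts.length || values.length != ends.length) {
            throw new IllegalArgumentException("Value and date lists differ in length for key: " + key);
        }
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            rows.add(new String[] { starts[i], ends[i], values[i] });
        }
        return rows;
    }
}
```

With the three tags above, the fourth row would come out as start 1944, an empty end date and the value name4. Unlike the inline format, this layout cannot protect semicolons inside a value by limiting the split.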

> The proposed parsing rule needs to account for individual values that contain semicolons. There’s an obscure ;; syntax for escaping semicolons in multivalue lists.

For context, it is possible to split a string by a specified character and limit the number of resulting strings (or the number of splits; the specific implementation depends on the language). That way a string is split only twice, and everything after the second semicolon (the value of the key) remains as it is regardless of how many semicolons it contains, which makes escaping semicolons redundant.
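
Since the behavior of a split limit differs between languages, the same idea can also be written without relying on it. The sketch below (hypothetical `EntrySplitter` class) simply finds the first two semicolons and keeps everything after the second one as the value. It still assumes the "^" separator never appears inside a value.

```java
final class EntrySplitter {
    /** Splits "start;end;value" at the first two semicolons only. */
    static String[] splitEntry(String entry) {
        int first = entry.indexOf(';');
        int second = first < 0 ? -1 : entry.indexOf(';', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("Expected two semicolons in: " + entry);
        }
        return new String[] {
            entry.substring(0, first),          // start date, possibly empty
            entry.substring(first + 1, second), // end date, possibly empty
            entry.substring(second + 1)         // value, may itself contain semicolons
        };
    }
}
```

A value such as "Smith; Sons;; Co" passes through unchanged, so no escaping is needed for semicolons under this scheme.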

> There may be other issues that wouldn’t be apparent unless we start implementing an approach along these lines. But I think the prospect of having to rewrite large parts of iD undermines the justification for this proposed change, which seems to be focused on the difficulty of selecting overlapping objects. I think it would be much more straightforward to improve the usability of overlapping objects within iD and JOSM. That would benefit OSM as well as OpenHistoricalMap.

There are always underwater stones which you can't see but you will definitely hit while swimming. The goal of this issue is to create a reasonable solution before the database gets a few million dates, a few hundred duplicated ways and becomes hard to migrate to a new format.

@1ec5
Member

1ec5 commented Nov 28, 2022

> > We’d need to rewrite much of iD’s UI to accommodate changes over time in any field that isn’t a freeform text field. This is even before considering preset-level changes, like place=village becoming place=town and eventually place=city.
>
> In theory it could be done by creating a new field type (although it may be harder to code than expected).

Realistically speaking, iD isn’t going to be able to support this format for the foreseeable future, unless someone steps up to implement it. I’ve implemented several complex field types myself, but looking at the history of issues like openstreetmap/iD#974 and openstreetmap/iD#6168, I’m not optimistic about being able to write a time-qualified, multivalue field variation of every existing field type. I think there would be a similar level of effort even with the older, simpler proposal for putting date ranges in subkeys.

@batpad

batpad commented Dec 19, 2022

cc @rwelty1889 since he has done a LOT of thinking about this problem. I think the best issue for this discussion is still likely #284 - if there are no objections, I would like to close this issue in favour of that to keep discussions around this topic in one place.

Broadly, I do agree that we should find a better solution than "redraw the feature" for every tag change. But there are many complexities to deal with, from iD to the vector tile renderer to the frontend data filtering logic. I think we've made good progress in thinking about mapping these with relations over in #284 and we should continue the discussion there.

@Dimitar5555
Author

Closing this issue in favour of #284.

4 participants