`autoSuggestions` option seems to pull in low-use values as well #9227

jdhoek · 2022-08-01T06:57:14Z

In openstreetmap/id-tagging-schema#553 I am seeing that autoSuggestions set to true (the default if missing) is pulling in tag values from TagInfo with as little as 16 uses. Is this by design?

Because anyone can create and use new tag values (and tags), the values used are not necessarily useful or correct. Misspellings and synonyms for more common tag-values are a common occurrence. To prevent propagating such tag values, editors that implement an auto suggestion feature should probably set a lower limit to the number of uses for a tag. A very safe limit is 1000, which matches the level at which TagInfo starts keeping usage graphs etc. for that tag-value as well.

Possibly, such a lower limit could be lowered for specific fields eventually, if needed.

The text was updated successfully, but these errors were encountered:

1ec5 · 2022-08-01T07:39:07Z

The specific behavior of the autoSuggestions flag is up to the client applications that implement schema-builder, such as iD. iD currently requires at least 10 occurrences of a tag coming from taginfo before it’ll show up in the list. However, there are a bunch of other heuristics in the same method, such as omitting deprecated values.

I suspect taginfo’s 1,000-occurrence threshold is for performance reasons; it would be expensive to generate usage graphs and compute most frequent combinations for the long tail of tags, regardless of validity. Taginfo has a different threshold of 100 occurrences for considering an alphanumeric key “good”.

#7174 was a similar case where some very rare values were showing up unexpectedly. There are two existing solutions to that problem: either mark the problematic values as deprecated or turn off autoSuggestions, as you’ve done in openstreetmap/id-tagging-schema#554. The advantage of this approach is that we know which values of parking:orientation are valid and which are invalid (all the rest); on the other hand, we don’t have as much visibility into which keys will be adversely affected by a much higher threshold.

jdhoek · 2022-08-01T07:58:17Z

on the other hand, we don’t have as much visibility into which keys will be adversely affected by a much higher threshold.

Conversely, it may be hard to see if many bad tags get propagated with the limit set to 10 as well. 10 is awfully low for most tags; in my experience any tag value below 100 (excepting tags that document things like opening hours or monetary amounts which shouldn't suggest values at all anyway) should probably be ignored by tools. These are not bad values per se, but they tend to include lots of synonyms, misspellings, and unconventional notations. Could that limit perhaps be raised from 10 to 100?

1ec5 · 2022-08-01T08:19:50Z

A minimum of 100 occurrences doesn’t seem like a painfully high bar to clear for keyword-like values, so I think that would be a decent thing to try. Maybe we could also exempt documented values from the minimum, like taginfo does. That would accommodate tags like marker=paddle that aren’t mere typos sneaking in under the radar. I think the filter has access to taginfo’s wiki-based description strings (though not the description strings that come from data items).

You’re right that some keys with non-keyword values, like opening_hours and maxweight, may be impacted by this change. Of course the best solution would be to have dedicated support for special syntaxes (e.g., #974 ideditor/schema-builder#15), but in the meantime, maybe a per-field property could indicate the minimum number of occurrences required for inclusion in the taginfo-powered list for that key.

jdhoek · 2022-08-01T08:27:44Z

A minimum of 100 occurrences doesn’t seem like a painfully high bar to clear for keyword-like values, so I think that would be a decent thing to try. Maybe we could also exempt documented values from the minimum, like taginfo does.

Yeah, that seems like it would cover most desirable values, although usually documented values exceed 100 uses anyway. Exceptions are perhaps the long-tail tags like shop=*.

tyrasd · 2022-08-01T17:15:30Z

100 occurrences

Yeah, 10 seems indeed to be too low for the cut-off point. I tried to look for some valid tags which might get hidden when increasing it to 100, and didn't find many. There are some examples:

traffic_calming=choked_island – 62 objects, approved status
seamark:buoy_lateral:colour=red;green;red;green 66 objects, de facto status
amenity=dog_parking – 75 objects, unspecified status

I think it is also important to keep in mind that the autocomplete/suggestion functionality is meant to be a way for users to discover the more niche, less often used tags. In a2cacaa I went with a similar metric which is already used for filtering out uncommon tag keys: by only returning values which are either used more than 100 times or have a wiki page. This still lets a few "bad" tags slip through, i.e. ones where the wiki page describes them as spelling mistakes or similar error cases of another tag value, but those cases should probably be best modeled in the deprecated tags list of the id-tagging-schema.

jdhoek added the bug A bug - let's fix this! label Aug 1, 2022

tyrasd added question Not Actionable - just a question about something moveto-iD and removed bug A bug - let's fix this! labels Aug 1, 2022

tyrasd transferred this issue from openstreetmap/id-tagging-schema Aug 1, 2022

tyrasd added enhancement An enhancement to make an existing feature even better and removed moveto-iD question Not Actionable - just a question about something labels Aug 1, 2022

tyrasd closed this as completed in a2cacaa Aug 1, 2022

tyrasd mentioned this issue Sep 27, 2022

Update to iD v2.22.0 openstreetmap/openstreetmap-website#3724

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`autoSuggestions` option seems to pull in low-use values as well #9227

`autoSuggestions` option seems to pull in low-use values as well #9227

jdhoek commented Aug 1, 2022

1ec5 commented Aug 1, 2022

jdhoek commented Aug 1, 2022

1ec5 commented Aug 1, 2022 •

edited

jdhoek commented Aug 1, 2022

tyrasd commented Aug 1, 2022 •

edited

autoSuggestions option seems to pull in low-use values as well #9227

autoSuggestions option seems to pull in low-use values as well #9227

Comments

jdhoek commented Aug 1, 2022

1ec5 commented Aug 1, 2022

jdhoek commented Aug 1, 2022

1ec5 commented Aug 1, 2022 • edited

jdhoek commented Aug 1, 2022

tyrasd commented Aug 1, 2022 • edited

`autoSuggestions` option seems to pull in low-use values as well #9227

`autoSuggestions` option seems to pull in low-use values as well #9227

1ec5 commented Aug 1, 2022 •

edited

tyrasd commented Aug 1, 2022 •

edited