autoSuggestions option seems to pull in low-use values as well #9227
Comments
The specific behavior of the I suspect taginfo’s 1,000-occurrence threshold is for performance reasons; it would be expensive to generate usage graphs and compute most frequent combinations for the long tail of tags, regardless of validity. Taginfo has a different threshold of 100 occurrences for considering an alphanumeric key “good”. #7174 was a similar case where some very rare values were showing up unexpectedly. There are two existing solutions to that problem: either mark the problematic values as deprecated or turn off
Conversely, with the limit set to 10 it may also be hard to notice when many bad tags get propagated. 10 is awfully low for most tags; in my experience, any tag value with fewer than 100 uses (except for tags that document things like opening hours or monetary amounts, which shouldn't suggest values at all anyway) should probably be ignored by tools. These are not bad values per se, but they tend to include lots of synonyms, misspellings, and unconventional notations. Could that limit perhaps be raised from 10 to 100?
A minimum of 100 occurrences doesn’t seem like a painfully high bar to clear for keyword-like values, so I think that would be a decent thing to try. Maybe we could also exempt documented values from the minimum, like taginfo does. That would accommodate tags like You’re right that some keys with non-keyword values, like
Yeah, that seems like it would cover most desirable values, although usually documented values exceed 100 uses anyway. Exceptions are perhaps the long-tail tags like
Yeah, 10 seems indeed to be too low for the cut-off point. I tried to look for some valid tags which might get hidden when increasing it to 100, and didn't find many. There are some examples:
I think it is also important to keep in mind that the autocomplete/suggestion functionality is meant to be a way for users to discover the more niche, less often used tags. In a2cacaa I went with a metric similar to the one already used for filtering out uncommon tag keys: only values that are either used more than 100 times or have a wiki page are returned. This still lets a few "bad" tags slip through, i.e. ones where the wiki page describes them as spelling mistakes or similar error cases of another tag value, but those cases should probably be best modeled in the
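For readers following along, the rule described in the comment above (keep a value if it is used more than 100 times or has a wiki page) could be sketched roughly as follows. This is only an illustration in Python, not the code from a2cacaa: the `TagValue` record, its field names `count` and `in_wiki`, and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Threshold discussed in this thread; a value needs MORE than this many uses.
MIN_USES = 100


@dataclass
class TagValue:
    """Hypothetical record for one candidate value of a tag key."""
    value: str
    count: int      # number of uses reported for this value
    in_wiki: bool   # whether the value has its own wiki page


def filter_suggestions(values, min_uses=MIN_USES):
    """Keep values that are used more than min_uses times or are documented."""
    return [v for v in values if v.count > min_uses or v.in_wiki]


# Made-up sample data, not real taginfo numbers.
candidates = [
    TagValue("common_value", 25000, True),
    TagValue("documented_rare_value", 16, True),
    TagValue("comon_value", 16, False),       # undocumented misspelling
    TagValue("borderline_value", 100, False),  # exactly 100 is not "more than 100"
]

suggested = [v.value for v in filter_suggestions(candidates)]
```

With this sample data, the rare but documented value survives, while the undocumented misspelling and the borderline value are dropped. As noted in the comment, a value whose wiki page only documents it as a mistake would still pass this check.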
In openstreetmap/id-tagging-schema#553 I am seeing that autoSuggestions set to true (the default if missing) is pulling in tag values from TagInfo with as few as 16 uses. Is this by design?
Because anyone can create and use new tag values (and tags), the values in use are not necessarily useful or correct. Misspellings and synonyms of more common tag values are a common occurrence. To avoid propagating such tag values, editors that implement an auto-suggestion feature should probably set a lower limit on the number of uses for a tag value. A very safe limit is 1000, which also matches the level at which TagInfo starts keeping usage graphs etc. for that tag value.
Possibly, such a lower limit could eventually be reduced for specific fields, if needed.
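As a rough illustration of the proposal above, a global lower limit with optional per-field overrides might look like the sketch below. The names `DEFAULT_MIN_USES`, `FIELD_MIN_USES`, `min_uses_for`, and `is_suggestable`, as well as the field key `example_field`, are hypothetical and are not part of iD or the tagging schema.

```python
# Suggested "very safe" default from the issue text.
DEFAULT_MIN_USES = 1000

# Hypothetical per-field overrides; the key below is a placeholder.
FIELD_MIN_USES = {
    "example_field": 100,
}


def min_uses_for(field_key):
    """Return the lower limit for a field, falling back to the default."""
    return FIELD_MIN_USES.get(field_key, DEFAULT_MIN_USES)


def is_suggestable(field_key, count):
    """A value is suggested only if its use count reaches the field's limit."""
    return count >= min_uses_for(field_key)
```

Under these assumptions, a value with 150 uses would be suggested for `example_field` but not for a field that uses the default limit.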