[Check Improvement] MixedCaseNameCheck to support MixedCase Brands #540

Open
vladlemberg opened this issue Mar 30, 2021 · 13 comments

@vladlemberg
Collaborator

vladlemberg commented Mar 30, 2021

Describe the bug

  1. Mixed-case brand names are not supported and are being flagged.
    Example:
    FedEx, RadioShack, JCPenney, etc.

  2. Spanish and Italian "prepositions" are not supported.
    Example:
    del, de la.

To Reproduce
Run the check against US data. The following names are flagged:
https://www.openstreetmap.org/node/5269224921
https://www.openstreetmap.org/node/4354550152
https://www.openstreetmap.org/way/97720765
etc.

Expected behavior
Do not flag legitimate mixed-case brand names, and handle Spanish and Italian prepositions.

@vladlemberg
Collaborator Author

vladlemberg commented Mar 30, 2021

@Bentleysb, @sayas01, @atiannicelli, please take a look.
I have already added Spanish "preposition" handling to the check. My question is how to handle mixed-case brand names.
Option 1: check only for title case instead of checking every character.
Option 2: create a side file with mixed-case brand names.
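
To make the two options easier to compare, here is a minimal sketch. It is not the check's actual code: the class name, method names, and the hard-coded allowlist are hypothetical, and a real side file would be loaded from disk. Option 1 is read here as "inspect only the first character of each word", which is an assumption about what was meant.

```java
import java.util.Set;

final class BrandNameOptionsSketch
{
    // Hypothetical allowlist; a real one would be loaded from the side file.
    private static final Set<String> KNOWN_BRANDS = Set.of("FedEx", "RadioShack", "JCPenney");

    // Option 1 (as assumed here): only the first character of each word is inspected.
    static boolean passesFirstLetterCheck(final String name)
    {
        for (final String word : name.trim().split("\\s+"))
        {
            if (!word.isEmpty() && Character.isLowerCase(word.codePointAt(0)))
            {
                return false;
            }
        }
        return true;
    }

    // Option 2: every character is inspected, but allowlisted words are skipped.
    static boolean passesStrictCheckWithAllowlist(final String name)
    {
        for (final String word : name.trim().split("\\s+"))
        {
            if (!KNOWN_BRANDS.contains(word) && !isTitleCased(word))
            {
                return false;
            }
        }
        return true;
    }

    // True when the word does not start with a lowercase letter and has no
    // uppercase letter after its first character.
    private static boolean isTitleCased(final String word)
    {
        if (word.isEmpty())
        {
            return true;
        }
        return !Character.isLowerCase(word.codePointAt(0))
                && word.codePoints().skip(1).noneMatch(Character::isUpperCase);
    }
}
```

Under these assumptions, option 1 accepts any brand that starts with a capital letter but no longer catches names like "WalMart", while option 2 keeps the strict check and depends on how complete the side file is.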

@vladlemberg
Collaborator Author

vladlemberg commented Mar 30, 2021

Here are other examples where the lowercase part of the name does not belong to the actual name:
Manhattan Bridge lower level
Metromover entry
Aviation Circle - departures
Main Trail (red)
201 Spring Street building loading dock
Queen Anne Avenue N crossing at Lee Street

Making them uppercase might confuse users.

@vladlemberg
Collaborator Author

@Bentleysb, do we have a false positive rate for this check?

@Bentleysb
Collaborator

@vladlemberg, this check has always had problems with brand names. It has one of the higher FP rates at 17%, but much of that is skewed by a few countries.

I don't think we want to totally drop checking for mixed case in the middle of words, but we may want to do so in certain cases. Perhaps we could just check for title casing on features tagged as shops or amenities?

As for the non-brand name cases you mention ('Manhattan Bridge lower level', etc.), I would suggest that is exactly the kind of thing this check is looking for. If those are truly the names of those features, then they should be title cased, but I suspect most should just be moved to the description or another tag.

@vladlemberg
Collaborator Author

vladlemberg commented Mar 31, 2021

@Bentleysb, thanks for the response. A few more questions:

  1. Do we want to tighten the check logic first and then implement AutoFix, or vice versa?
  2. Should "-n-" cases be flagged? Kiss-n-Ride, Kabob-n-Curry, Comics-n-Stuff, Check-n-Go
  3. Should cases with a lowercase "i" followed by a capital letter be flagged? iPhone, iSomething

@vladlemberg
Collaborator Author

vladlemberg commented Mar 31, 2021

"Perhaps we could just check for title casing on features tagged as shops or amenities?"
Good idea. I'm creating mapping rules.

@Bentleysb
Collaborator

@vladlemberg, it sounds like the check logic will need to be done first, or simultaneously with the AutoFix implementation.

The -n- and i cases are all brand names that are capitalized like that, so those should not be flagged.
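
One way to express "do not flag the -n- and i cases" is a pair of word-level patterns. The sketch below is illustrative only: the class, method, and both regular expressions are hypothetical and are not taken from the check.

```java
import java.util.regex.Pattern;

final class BrandPatternSketch
{
    // Lowercase "i" followed by an uppercase letter, e.g. iPhone (assumed shape).
    private static final Pattern I_PREFIX = Pattern.compile("^i\\p{Lu}\\p{L}*$");

    // Two capitalized parts joined by "-n-", e.g. Kiss-n-Ride (assumed shape).
    private static final Pattern N_JOINED = Pattern.compile("^\\p{Lu}\\p{L}*-n-\\p{Lu}\\p{L}*$");

    // Returns true when a single word should be exempt from the mixed-case check.
    static boolean isExemptWord(final String word)
    {
        return I_PREFIX.matcher(word).matches() || N_JOINED.matcher(word).matches();
    }
}
```

With these patterns, "iPhone" and "Check-n-Go" are exempt, while "iphone" and "Test-test" are still handed to the regular check.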

@vladlemberg
Collaborator Author

@Bentleysb, could you assign this task to me?

@vladlemberg
Collaborator Author

@Bentleysb, to summarize:

  1. We don't need AutoFix for this check due to the complexity of the input data.
  2. Adding Spanish prepositions is done. I'm not sure about French and Dutch. What do you think?
    Example: du, van der, etc.
    https://www.openstreetmap.org/way/5995653
    https://www.openstreetmap.org/way/4476760
  3. CamelCase brand names. This is the most complicated part. I excluded all objects with the following tags from the check logic:
      && !Validators.hasValuesFor(object, BrandTag.class)
      && !Validators.hasValuesFor(object, ShopTag.class)
      && !Validators.hasValuesFor(object, AmenityTag.class)
      && !Validators.hasValuesFor(object, LeisureTag.class)

This handles the most common brand names. However, many legitimate brand names are still being flagged. I don't have a concrete solution here.
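
For readers without the codebase at hand, the tag-based exclusion in item 3 can be sketched without the Validators helper. This sketch models an object's tags as a plain Map, and the key strings ("brand", "shop", "amenity", "leisure", "name") are assumptions about what the tag classes map to. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class TagFilterSketch
{
    // Assumed OSM keys behind BrandTag, ShopTag, AmenityTag and LeisureTag.
    private static final Set<String> SKIPPED_KEYS = Set.of("brand", "shop", "amenity", "leisure");

    // Returns true when the object has a name and carries none of the skipped tags.
    static boolean shouldCheckName(final Map<String, String> tags)
    {
        if (!tags.containsKey("name"))
        {
            return false;
        }
        return SKIPPED_KEYS.stream().noneMatch(key ->
        {
            final String value = tags.get(key);
            return value != null && !value.isEmpty();
        });
    }
}
```

A shop named "FedEx" is skipped under this filter, but a brand name on an object that has none of these tags would still reach the check, which matches the remaining false positives described above.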

@Bentleysb
Collaborator

@vladlemberg, that all looks good to me. As for adding French and Dutch prepositions, you can add that now if you like, but it is not essential for this update. Eventually we will need all languages if we want this to work everywhere, but that is expected to happen slowly.
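
As an illustration of how language-specific words could be added incrementally, here is a small sketch that allows a set of lowercase particles inside a name. The set contents come from the examples in this thread (del, de la, du, van der). The class and method names are hypothetical, as is the rule that the first word must still be capitalized.

```java
import java.util.Set;

final class ParticleSketch
{
    // Particles taken from the examples in this thread; more languages could be added later.
    private static final Set<String> LOWERCASE_PARTICLES = Set.of("del", "de", "la", "du", "van", "der");

    // Returns true when every word is title cased, ignoring particles after the first word.
    static boolean isValidName(final String name)
    {
        final String[] words = name.trim().split("\\s+");
        for (int index = 0; index < words.length; index++)
        {
            // Assumption: a particle may stay lowercase unless it starts the name.
            if (index > 0 && LOWERCASE_PARTICLES.contains(words[index]))
            {
                continue;
            }
            if (!isTitleCased(words[index]))
            {
                return false;
            }
        }
        return true;
    }

    // True when the word does not start with a lowercase letter and has no
    // uppercase letter after its first character.
    private static boolean isTitleCased(final String word)
    {
        if (word.isEmpty())
        {
            return true;
        }
        return !Character.isLowerCase(word.codePointAt(0))
                && word.codePoints().skip(1).noneMatch(Character::isUpperCase);
    }
}
```

A single shared set is the simplest form; keying the sets by country or language would avoid accepting, for example, a Dutch particle in a Spanish-speaking region.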

@vladlemberg
Collaborator Author

@Bentleysb, thanks for the clarification. One more question: I added "Rock-n-Roll" and "Rock 'n Roll" handling; however, it breaks the "Test-TesT" unit test case because I removed the hyphen as a delimiter. What's your opinion?

@vladlemberg
Collaborator Author

@Bentleysb, sorry, it breaks validNamePointHyphenTest: "Test-Test"

@vladlemberg
Collaborator Author

Or I can change the pattern: ("^\\p{Lu}.*-n-\\p{Lu}.*$") -> ("^n$|^'n$")
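
The difference between the two patterns can be sketched as follows. Both regular expressions are quoted from the comment above; everything else (the class, the delimiter set, the title-case helper) is hypothetical. The sketch assumes the second pattern is meant to be applied to individual tokens after splitting on whitespace and hyphens, so the hyphen can remain a delimiter.

```java
import java.util.regex.Pattern;

final class ConnectorPatternSketch
{
    // First pattern from the thread: matches a whole hyphenated word such as Rock-n-Roll.
    static final Pattern WHOLE_WORD = Pattern.compile("^\\p{Lu}.*-n-\\p{Lu}.*$");

    // Second pattern from the thread: matches only the connector token itself.
    static final Pattern CONNECTOR = Pattern.compile("^n$|^'n$");

    // Splits on whitespace and hyphens (assumed delimiters), skips connector
    // tokens, and requires every remaining token to be title cased.
    static boolean isValidName(final String name)
    {
        for (final String token : name.split("[-\\s]+"))
        {
            if (token.isEmpty() || CONNECTOR.matcher(token).matches())
            {
                continue;
            }
            if (!isTitleCased(token))
            {
                return false;
            }
        }
        return true;
    }

    // True when the token does not start with a lowercase letter and has no
    // uppercase letter after its first character.
    private static boolean isTitleCased(final String token)
    {
        return !Character.isLowerCase(token.codePointAt(0))
                && token.codePoints().skip(1).noneMatch(Character::isUpperCase);
    }
}
```

Under these assumptions, "Rock-n-Roll" and "Rock 'n Roll" both pass, "Test-Test" stays valid, and "Test-TesT" is still flagged because each side of the hyphen is checked on its own. The first pattern alone would cover "Rock-n-Roll" but not "Rock 'n Roll".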
