c/o (care of) in addresses are identified as road #607
Comments
Initials might also be causing this problem; I also saw it happening with:
Though many people seem to use this on company/mailing addresses, it's not really trained on recipient information. It may handle venue/business/POI names, but not mailing-specific details like individual recipients, divisions/departments, directions, etc. In particular, the training addresses we have come from OpenStreetMap, which are usually not attached to individual people, just the address and maybe the venue/business name.

I considered generating "c/o" information for the training set, but that would mean either using a data set that attaches people to addresses (lots of privacy concerns with that) or generating names. Generating names is a pretty major task: most libraries that do it (e.g. testing libraries) tend to be heavily biased toward American names, so I would have to find some sort of wide-coverage Census data to sample names from.

If the input is mostly well-structured/comma-separated and in the same country, splitting out the "c/o" component with a simple regex could work.

Another, more generic way to do this without regex would be to split by comma and move backward through the string: parse the last phrase first, then from the second-to-last phrase to the end, then from the one before that to the end, and so on. Track the labels and phrases until something changes, then throw out the phrase that created the inconsistency and keep moving. For instance:
Here, once you add "C.A. Patrick", the parse stops being consistent with what it returned previously. That could be because it's actually part of the road name, but if you're sure that each comma-separated phrase should be a distinct component (or maybe commas are fine within "house" but not other places), that might be a place to throw it out and continue through the rest of the phrases.
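The backward, comma-by-comma approach described above could be sketched roughly as follows. This is only an illustration under assumptions, not something libpostal ships: `parse_backward` is a hypothetical helper, and its `parse` argument is assumed to behave like `parse_address` from the Python bindings (`postal.parser`), i.e. to return a list of `(value, label)` tuples in input order.

```python
def parse_backward(address, parse):
    """Parse growing suffixes of a comma-separated address, dropping any
    phrase whose addition changes the parse of the phrases already kept.

    `parse` is assumed to return a list of (value, label) tuples.
    Returns (final_parse, rejected_phrases).
    """
    phrases = [p.strip() for p in address.split(",") if p.strip()]
    kept = []      # accepted phrases, in original order
    rejected = []  # phrases that made the parse inconsistent
    previous = []  # parse of the accepted suffix so far

    for phrase in reversed(phrases):
        candidate = [phrase] + kept
        result = parse(", ".join(candidate))
        # Consistent means the earlier parse survives unchanged as the tail
        # of the new parse; otherwise this phrase disturbed it.
        if previous and result[-len(previous):] != previous:
            rejected.append(phrase)
            continue
        kept = candidate
        previous = result

    return previous, list(reversed(rejected))
```

With the Python bindings installed, it might be called as `parse_backward(address, parse_address)` after `from postal.parser import parse_address`. Whether the intermediate parses of short suffixes are stable enough for this check depends on the address, so rejected phrases are returned for inspection rather than silently discarded.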
Hi!
I was checking out libpostal, and saw something that could be improved.
My country is
Austria
Here's how I'm using libpostal
REST-API
Here's what I did
Futureweb GmbH, c/o Patrick Neuner, Innsbruckerstraße 7, 6380 St. Johann in Tirol, Österreich
Here's what I got
[{"label":"house","value":"futureweb gmbh"},{"label":"road","value":"c/o patrick neuner innsbruckerstraße"},{"label":"house_number","value":"7"},{"label":"postcode","value":"6380"},{"label":"city","value":"st. johann in tirol"},{"label":"country","value":"österreich"}]
Here's what I was expecting
c/o Patrick Neuner should be part of house (or a dedicated care-of field), not part of road.
For parsing issues, please answer "yes" or "no" to all that apply.
yes, but without c/o.
yes
no, https://de.wikipedia.org/wiki/Zustellanweisung
no, we tried removing/adding; as soon as c/o is used, it is labeled as road.
yes, removing c/o completely makes it work.
Here's what I think could be improved
Adding c/o detection.
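Until something like this exists in the parser itself, c/o detection can be approximated as a pre-processing step with a simple regex, as suggested in the comments. The sketch below is an assumption-laden illustration, not libpostal functionality: the `CARE_OF` pattern and the `split_care_of` helper are hypothetical, only handle the spellings "c/o" and "care of", and assume the care-of phrase is its own comma-separated component.

```python
import re

# Hypothetical pattern: one comma-delimited phrase that starts with
# "c/o" or "care of" (case-insensitive), followed by the recipient name.
CARE_OF = re.compile(
    r"^\s*(?:c/o|care of)\b[\s.:]*(?P<name>.+?)\s*$",
    re.IGNORECASE,
)


def split_care_of(address):
    """Return (care_of_name_or_None, address_without_that_phrase)."""
    care_of = None
    kept = []
    for phrase in address.split(","):
        match = CARE_OF.match(phrase)
        if match and care_of is None:
            # Keep the first care-of phrase aside instead of parsing it.
            care_of = match.group("name")
        else:
            kept.append(phrase.strip())
    return care_of, ", ".join(kept)
```

The remaining string can then be sent to the parser as usual, with the extracted name stored separately.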