c/o (care of) in addresses are identified as road #607

futurewebpn · 2022-11-10T17:50:42Z

Hi!

I was checking out libpostal, and saw something that could be improved.

My country is

Austria

Here's how I'm using libpostal

REST-API

Here's what I did

Futureweb GmbH, c/o Patrick Neuner, Innsbruckerstraße 7, 6380 St. Johann in Tirol, Österreich

Here's what I got

[{"label":"house","value":"futureweb gmbh"},{"label":"road","value":"c/o patrick neuner innsbruckerstraße"},{"label":"house_number","value":"7"},{"label":"postcode","value":"6380"},{"label":"city","value":"st. johann in tirol"},{"label":"country","value":"österreich"}]

Here's what I was expecting

c/o Patrick Neuner should be part of house and not part of road (or go in a dedicated care-of field).

For parsing issues, please answer "yes" or "no" to all that apply.

Does the input address exist in OpenStreetMap?

yes, but without c/o.

Do all the toponyms exist in OSM (city, state, region names, etc.)?

yes

If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result?

no, https://de.wikipedia.org/wiki/Zustellanweisung

If the address does not contain city, region, etc., does adding those fields to the input improve the result?

no, we tried removing and adding them; as soon as c/o is used, it is labeled as road.

If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse?

yes, removing c/o completely makes it work.

Here's what I think could be improved

Adding c/o detection.

futurewebpn · 2022-11-10T17:56:41Z

It might also be initials that cause this problem; I also saw it happening with:
Futureweb GmbH, C.A. Patrick, Innsbruckerstraße 7, 6380 St. Johann in Tirol, Österreich
Setting the language doesn't change anything.

albarrentine · 2024-02-15T02:20:20Z

Though many people seem to use libpostal on company and mailing addresses, the parser isn't really trained on recipient information (it has seen venue/business/POI names, but far fewer mailing-specific details such as individual recipients, divisions/departments, or directions). In particular, the training addresses come from OpenStreetMap, where an address is usually not attached to an individual person, just to the address itself and maybe a venue/business name. I considered generating "c/o" information for the training set, but that would mean either using a data set that attaches people to addresses (which raises a lot of privacy concerns) or generating names. Generating names is a pretty major task: most libraries that do it, e.g. for testing, tend to be heavily biased toward American names, so I would have to find some sort of wide-coverage Census data to sample names from.

If the input is mostly well-structured/comma-separated and from the same country, splitting out the "c/o" component with a simple regex could work. A more generic way to do this without regex would be to split by comma and move backward through the string: parse the last phrase first, then from the second-to-last phrase to the end, then from the one before that to the end, and so on. Track the labels and phrases until something changes, then throw out the phrase that created the inconsistency and keep moving.

For instance:

> Österreich

Result:

{
  "country": "österreich"
}

> 6380 St. Johann in Tirol, Österreich

Result:

{
  "postcode": "6380",
  "city": "st. johann in tirol",
  "country": "österreich"
}

> Innsbruckerstraße 7, 6380 St. Johann in Tirol, Österreich

Result:

{
  "road": "innsbruckerstraße",
  "house_number": "7",
  "postcode": "6380",
  "city": "st. johann in tirol",
  "country": "österreich"
}

> C.A. Patrick, Innsbruckerstraße 7, 6380 St. Johann in Tirol, Österreich

Result:

{
  "road": "c.a. patrick innsbruckerstraße",
  "house_number": "7",
  "postcode": "6380",
  "city": "st. johann in tirol",
  "country": "österreich"
}

> FutureWeb GmbH, Innsbruckerstraße 7, 6380 St. Johann in Tirol, Österreich

Result:

{
  "house": "futureweb gmbh",
  "road": "innsbruckerstraße",
  "house_number": "7",
  "postcode": "6380",
  "city": "st. johann in tirol",
  "country": "österreich"
}

Here, once you add "C.A. Patrick", the parse stops being consistent with what it returned previously. That could be because it's actually part of the road name, but if you're sure that each comma-separated phrase should be a distinct component (or maybe commas are fine within "house" but not other places), that might be a place to throw it out and continue through the rest of the phrases.
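The backward walk described above could be sketched roughly as follows. This is only an illustration, not part of libpostal: `backward_parse` is a hypothetical helper, and `parse` is assumed to be any callable that returns `(value, label)` pairs, which is the shape the Python bindings' `postal.parser.parse_address` returns. With the REST API you would wrap the HTTP call in a function of the same shape. "Consistent" is taken here to mean that every label accepted so far keeps its exact value, which is a stricter rule than you may want if commas are acceptable inside `house`.

```python
from typing import Callable, Dict, List, Tuple

# (value, label) pairs; assumed to match what the parser callable returns
Parse = List[Tuple[str, str]]


def backward_parse(
    address: str, parse: Callable[[str], Parse]
) -> Tuple[Dict[str, str], List[str]]:
    """Grow the input from the last comma-separated phrase to the first,
    dropping any phrase whose addition changes a label accepted earlier."""
    phrases = [p.strip() for p in address.split(",") if p.strip()]
    kept: List[str] = []            # accepted phrases, in original order
    rejected: List[str] = []        # phrases that broke consistency
    accepted: Dict[str, str] = {}   # label -> value from the last good parse

    for phrase in reversed(phrases):
        candidate = [phrase] + kept
        # Simplification: a label that appears twice keeps only its last value.
        result = {label: value for value, label in parse(", ".join(candidate))}
        # Consistent: every previously accepted label still has the same value.
        if all(result.get(label) == value for label, value in accepted.items()):
            kept, accepted = candidate, result
        else:
            rejected.append(phrase)
    return accepted, rejected
```

Called with the Python bindings, this would look like `backward_parse(text, parse_address)`. The function returns the last consistent parse together with the phrases it threw out, so a rejected "c/o ..." phrase can still be stored separately as recipient information.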
