Missing elements in address line while importing #609

Open
rocainunwired opened this issue Oct 25, 2021 · 5 comments
@rocainunwired

The place_addressline table in the Nominatim DB does not contain the entire address line for a few addresses. This causes address elements to go missing when importing data into Photon. (I am unsure whether this is a bug on Nominatim's side.)

Could we instead use the get_addressdata DB function, which Nominatim itself uses, to fetch the address line?

Ex. 17.42067, 78.40256
https://photon.komoot.io/reverse?lat=17.42067&lon=78.40256
vs
https://nominatim.openstreetmap.org/reverse.php?lat=17.42067&lon=78.40256&zoom=18&format=json

Note that address elements such as "Hyderabad" (city) and "Jubilee Hills" (suburb) are missing from the Photon result.

@lonvia
Collaborator

lonvia commented Oct 25, 2021

You can't compare photon.komoot.io and nominatim.openstreetmap.org. They are based on different database states (and right now also completely different versions of the software).

If you think Photon is missing something, we'd need a reproducible example where the Photon database is derived directly from the Nominatim database you compare to.

@rocainunwired
Author

For the road https://www.openstreetmap.org/way/114104570,
I ran the following query against my self-hosted Nominatim DB to obtain the place_id (122309717):

SELECT place_id FROM placex WHERE osm_id=114104570 AND osm_type='W';

I believe that, while importing, Photon uses the following SQL query to obtain the address hierarchy for this place_id:

SELECT p.name, p.class, p.type, p.rank_address
  FROM placex p, place_addressline pa
 WHERE p.place_id = pa.address_place_id
   AND pa.place_id = 122309717
   AND pa.cached_rank_address > 4
   AND pa.address_place_id != 122309717
   AND pa.isaddress
 ORDER BY rank_address DESC, fromarea DESC, distance ASC, rank_search DESC;

which results in

                  name                        |  class   |      type      | rank_address 
-----------------------------------------------------------------------------------------
 "name"=>"Prashasan Nagar"                    | boundary | administrative |           22
 "name"=>"Serilingampalle mandal"             | boundary | administrative |           12
 "name"=>"Rangareddy", (redacted)             | boundary | administrative |           10
 "ref"=>"TG", "name"=>"Telangana", (redacted) | boundary | administrative |            8
(4 rows)

Information such as the city and the suburb is missing from this result set.
Because these elements are missing from the place_addressline query result in the Nominatim DB, Photon misses a few elements of the hierarchy while importing the data into its own DB, and hence returns less information.

A possible suggestion that I tried is to replace the above SQL query with

SELECT p.name, p.class, p.type, p.rank_address FROM get_addressdata(122309717, -1) p WHERE rank_address > 4 ORDER BY rank_address DESC, isaddress DESC;

which results in

                  name                                          |  class   |      type      | rank_address 
-----------------------------------------------------------------------------------------------------------
 "name"=>"Road No 8"                                            | highway  | residential    |           26
 "name"=>"Prashasan Nagar"                                      | boundary | administrative |           22
 "name"=>"Ward 95 Jubilee Hills"                                | boundary | administrative |           20
 "name"=>"Greater Hyderabad Municipal Corporation Central Zone" | boundary | administrative |           18
 "name"=>"Hyderabad", (redacted)                                | boundary | administrative |           16
 "name"=>"Serilingampalle mandal"                               | boundary | administrative |           12
 "name"=>"Rangareddy",  (redacted)                              | boundary | administrative |           10
 "ref"=>"TG", "name"=>"Telangana", (redacted)                   | boundary | administrative |            8
 "ref"=>"500110"                                                | place    | postcode       |            5
(9 rows)

This result set has the full hierarchy.

I would be happy to send a PR with these changes, if this is considered a possible fix for the missing elements in hierarchy

@lonvia
Collaborator

lonvia commented Oct 25, 2021

There is a mapping error here that needs to be fixed in OSM. Serilingampalle mandal has admin_level=6 even though it is clearly a part of Hyderabad. Nominatim detects that the hierarchy is wrong and marks Hyderabad, the Greater Hyderabad Municipal Corporation Central Zone and Ward 95 Jubilee Hills as "is not an address". Fix the admin hierarchy of Hyderabad and the problem will simply go away.

The interesting thing is that Nominatim itself started to ignore that 'is not address' mark as an unintended side-effect of taking addr: tags into account. I need to look into that more, but it is most likely a bug that needs to be fixed.

Photon could be a bit more clever in ordering the address parts these days, similar to what Nominatim's query does here, but we certainly wouldn't want to call get_addressdata. This is a very expensive function that runs a lot of extra queries that are never needed with Photon.
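
As an illustration of the "is not an address" mark described above, here is a minimal sketch in Python using an in-memory SQLite database. It is not the real Nominatim schema: the two tables are reduced to a few columns, names are plain text rather than hstore, and every place_id except 122309717 is made up. It also assumes that the three affected places are present in place_addressline with isaddress set to false, as the explanation above suggests. The names and rank_address values are taken from the result sets earlier in this issue.

```python
import sqlite3

# Toy stand-in for two Nominatim tables (assumption: heavily simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE placex (place_id INTEGER PRIMARY KEY, name TEXT, rank_address INTEGER);
CREATE TABLE place_addressline (
    place_id INTEGER, address_place_id INTEGER,
    cached_rank_address INTEGER, isaddress INTEGER);
""")

ROAD_ID = 122309717  # place_id of the road from this issue

# (name, rank_address, isaddress); the isaddress values are an assumption
# based on the three places named as "is not an address" above.
parents = [
    ("Prashasan Nagar", 22, 1),
    ("Ward 95 Jubilee Hills", 20, 0),
    ("Greater Hyderabad Municipal Corporation Central Zone", 18, 0),
    ("Hyderabad", 16, 0),
    ("Serilingampalle mandal", 12, 1),
    ("Rangareddy", 10, 1),
    ("Telangana", 8, 1),
]
for fake_id, (name, rank, isaddress) in enumerate(parents, start=1):
    conn.execute("INSERT INTO placex VALUES (?, ?, ?)", (fake_id, name, rank))
    conn.execute("INSERT INTO place_addressline VALUES (?, ?, ?, ?)",
                 (ROAD_ID, fake_id, rank, isaddress))

def address_names(only_isaddress):
    """Return parent names for the road, optionally keeping only isaddress rows."""
    sql = """SELECT p.name FROM placex p, place_addressline pa
             WHERE p.place_id = pa.address_place_id AND pa.place_id = ?
               AND pa.cached_rank_address > 4 AND pa.address_place_id != ?"""
    if only_isaddress:
        sql += " AND pa.isaddress"
    sql += " ORDER BY p.rank_address DESC"
    return [row[0] for row in conn.execute(sql, (ROAD_ID, ROAD_ID))]

filtered = address_names(True)     # mimics the import query with the isaddress filter
unfiltered = address_names(False)  # same query without the filter
print(filtered)
print(unfiltered)
```

With the filter, the sketch returns the same four names as the import query shown earlier; without it, Hyderabad, the Central Zone and Ward 95 Jubilee Hills reappear.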

@rocainunwired
Author

Thanks for pointing out the data issue. I have made changes to the relation.

> The interesting thing is that Nominatim itself started to ignore that 'is not address' mark as an unintended side-effect of taking addr: tags into account.

Could you suggest how to identify other data points like this one? That would help in investigating how prevalent such issues are around the world.

> it is most likely a bug that needs to be fixed

I remember that Photon used to give the full information for these addresses on older imports, so I checked the place_addressline table in an older Nominatim server (commit ID acda4344). It does have all the information that is missing in the newer Nominatim server (commit ID b51efd87).

I also ran an import for a smaller country using get_addressdata() versus using the place_addressline table, and the import took about 2x as long with get_addressdata().

@lonvia
Collaborator

lonvia commented Oct 27, 2021

Let's be very clear: we are not going to double the time to create a Photon database to work around a data issue in India. This is just wrong.

> Could you suggest how to identify other data points like this one? That would help in investigating how prevalent such issues are around the world.

The simple assumption for all areas of class=boundary and type=administrative is: if boundaryA is completely contained in boundaryB, then the admin_level of boundaryA >= the admin_level of boundaryB. You should be able to find all data that violates this assumption with a simple PostGIS query.
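
The rule above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: axis-aligned boxes stand in for real boundary polygons, the names and coordinates are invented, and only the admin_level=6 value comes from this issue. Against a real database the containment test would be a PostGIS predicate such as ST_Contains or ST_CoveredBy on the boundary geometries.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Boundary:
    name: str
    admin_level: int
    # (min_x, min_y, max_x, max_y): a box as a stand-in for a polygon.
    box: tuple

def contains(outer, inner):
    """True if inner's box lies completely inside outer's box."""
    ox1, oy1, ox2, oy2 = outer.box
    ix1, iy1, ix2, iy2 = inner.box
    return ox1 <= ix1 and oy1 <= iy1 and ix2 <= ox2 and iy2 <= oy2

def find_violations(boundaries):
    """Return (inner, outer) name pairs that break the rule:
    inner is contained in outer but has a smaller admin_level."""
    return [
        (a.name, b.name)
        for a, b in permutations(boundaries, 2)
        if contains(b, a) and a.admin_level < b.admin_level
    ]

# Hypothetical data: a level-6 area sitting inside a level-8 city.
boundaries = [
    Boundary("state", 4, (0, 0, 100, 100)),
    Boundary("city", 8, (10, 10, 50, 50)),
    Boundary("mandal_inside_city", 6, (20, 20, 30, 30)),
]
violations = find_violations(boundaries)
print(violations)
```

The pairwise loop is quadratic, which is fine for a sketch; a real query would rely on a spatial index instead.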
