ID in building tags overwriting way ID #233

Open
AnBowell opened this issue Apr 9, 2024 · 1 comment

AnBowell commented Apr 9, 2024

When using get_buildings(), the resulting id column (which normally holds the OSM element ID, e.g. the way ID) is overwritten by the value of an id tag whenever the element has one.

For example, in Dorset, England, there are a few cases where buildings have been tagged with id: 123.

When you then load this data with get_buildings(), the way ID is overwritten with the tag value:

from pyrosm import OSM
FILEPATH = "../data/raw/dorset-latest.osm.pbf"
osm = OSM(FILEPATH)
buildings = osm.get_buildings()

print(buildings[buildings["id"] == 123].head())

Output

       start_date wikipedia   id   timestamp version  \
181257       None      None  123  1703193031       1   
187693       None      None  123  1704833732       1   
193309       None      None  123  1708889862       1   

                                                 geometry  tags osm_type  \
181257  POLYGON ((-2.47921 50.62591, -2.47916 50.62579...  None      way   
187693  POLYGON ((-2.46822 50.66240, -2.46839 50.66240...  None      way   
193309  POLYGON ((-2.47700 50.62089, -2.47700 50.62078...  None      way   

As you can see, people have tagged the buildings with duplicate IDs and these have made their way into the dataframe.

I can see that keeping an id tag was an intentional choice made in the get_osm_ways_and_relations function of data_manager.pyx: https://github.com/HTenkanen/pyrosm/blob/66de74bd0496d1148618842cac58923bf22d97ea/pyrosm/data_manager.pyx#L104C1-L107C63.

I was wondering whether this is the expected behaviour, as it makes it challenging to guarantee that the ID is unique and comes from the correct OSM source.
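A quick way to see how widespread the collision is would be to count repeated values in the id column. The snippet below is only a sketch: duplicated_ids is a hypothetical helper (not part of pyrosm), and the sample list is made-up data standing in for buildings["id"].

```python
from collections import Counter


def duplicated_ids(ids):
    """Return {id: count} for every id that occurs more than once.

    Hypothetical helper for illustration; accepts any iterable of
    hashable values, so a plain list works here.
    """
    counts = Counter(ids)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up values standing in for buildings["id"].
# Genuine way IDs are unique, so any repeat points to an overwritten ID.
sample_ids = [1001, 123, 1002, 123, 123]
print(duplicated_ids(sample_ids))
```

Passing the real column (for example duplicated_ids(buildings["id"])) should work the same way, since iterating over the column yields its values, but that has not been checked against the Dorset extract.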

Environment:

  • OS: Windows 10
  • Python package source: PyPI, pyrosm==0.6.2
  • Python v3.11.0

AnBowell commented Apr 9, 2024

After taking a deeper dive, I realise that the section of code I referred to in the original issue is executed well after the ID problem arises.

I've gotten around it by adapting explode_way_tags in tagparser.pyx to prepend tagged_ to the id key from the tags, so that it is stored as tagged_id. This prevents overwriting the original way ID with an ID from the tags.

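cdef explode_way_tags(ways):
    exploded = []
    cdef int i, n=len(ways)
    way_keys = {}
    for i in range(0, n):
        way = ways[i]
        for k, v in way['tags'].items():
            if k == "id":
                # Store a tag literally named "id" under a separate key
                # so it cannot overwrite the OSM way ID.
                way["tagged_id"] = v
            else:
                way[k] = v
            # Record each tag key the first time it is seen.
            try:
                dummy = way_keys[k]
            except KeyError:
                way_keys[k] = None
        # The tags are now top-level keys, so drop the nested dict.
        del way['tags']
        exploded.append(way)
    return exploded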
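To make the effect of the change easier to see outside Cython, here is a minimal plain-Python sketch of the same idea on a single toy record. The function names explode_tags_naive and explode_tags_safe, the reserved and prefix parameters, and the way ID 987654321 are all hypothetical and chosen for illustration; this is not pyrosm's actual code path.

```python
def explode_tags_naive(way):
    """Flatten tags into the record; a tag named 'id' clobbers the way ID."""
    flat = {k: v for k, v in way.items() if k != "tags"}
    flat.update(way["tags"])
    return flat


def explode_tags_safe(way, reserved=("id",), prefix="tagged_"):
    """Flatten tags, renaming any tag whose key is reserved."""
    flat = {k: v for k, v in way.items() if k != "tags"}
    for key, value in way["tags"].items():
        # Reserved keys get a prefix so they land in their own column.
        flat[prefix + key if key in reserved else key] = value
    return flat


# Toy record: the way ID is invented, the tag mimics the Dorset case.
way = {"id": 987654321, "tags": {"building": "yes", "id": "123"}}

naive = explode_tags_naive(way)
safe = explode_tags_safe(way)

print(naive["id"])                      # tag value has replaced the way ID
print(safe["id"], safe["tagged_id"])    # way ID kept, tag stored separately
```

Unlike the workaround above, the sketch builds a new dict rather than mutating the input, which keeps the two variants comparable on the same record.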

If there are no other immediate issues with this solution, I'll open a pull request.
