Reading PBF from osmfilter -> osmconvert fails #176

Open
arredond opened this issue Mar 28, 2022 · 1 comment

Comments

arredond commented Mar 28, 2022

I'm working with very large PBFs (whole Planet.osm dumps) and want to fetch all objects with certain tags, such as all airports (aeroway=aerodrome), ports (harbour=yes), etc.

I've seen that it's much faster to preprocess the file with a combination of popular OSM tools than to feed the whole file into pyrosm. For instance, extracting all airports from the latest Geofabrik extract for Belgium takes about 4 seconds with preprocessing versus about 4 minutes with pyrosm alone.

Directly using pyrosm

%%time
from pyrosm import OSM

osm = OSM('belgium-latest.osm.pbf')

osm.get_pois({'aeroway': ['aerodrome']})

# CPU times: user 1min 34s, sys: 1min 12s, total: 2min 46s
# Wall time: 3min 58s

Preprocessing with osmconvert and osmfilter. extract_airports is simply a wrapper function for filtering an .o5m file with osmfilter and then converting back to PBF with osmconvert:

%%time
extract_airports(o5m_file, 'belgium_airports.gpkg')

# Filtering tags...
# CompletedProcess(args='osmfilter belgium-latest.o5m --keep="aeroway=aerodrome" --drop-version -o=belgium-latest_filtered.o5m', returncode=0, stdout=b'', stderr=b'')
# Converting back to PBF...
# CompletedProcess(args=['osmconvert', 'belgium-latest_filtered.o5m', '-o=belgium-latest_filtered.osm.pbf'], returncode=0, stdout=b'', stderr=b'')
# Reading into GeoDataFrame with pyrosm...
# CPU times: user 37.5 ms, sys: 24.2 ms, total: 61.7 ms
# Wall time: 4.47 s
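
The wrapper itself is not shown in this issue. As a rough idea of what it might look like, here is a minimal sketch reconstructed from the logged commands above. The helper names (`derived_paths`, `build_filter_cmd`, `build_convert_cmd`) and the GeoPackage export step are assumptions, and the commands are passed as argument lists rather than the shell string seen in the log:

```python
import subprocess
from pathlib import Path


def derived_paths(o5m_file):
    """Return (filtered .o5m, filtered .osm.pbf) paths next to the input file."""
    o5m_file = Path(o5m_file)
    stem = o5m_file.stem  # 'belgium-latest.o5m' -> 'belgium-latest'
    return (
        o5m_file.with_name(stem + "_filtered.o5m"),
        o5m_file.with_name(stem + "_filtered.osm.pbf"),
    )


def build_filter_cmd(o5m_file, tag_filter, filtered_o5m):
    # Same osmfilter flags as in the logged command; no shell quoting is
    # needed because the arguments are passed as a list.
    return [
        "osmfilter",
        str(o5m_file),
        f"--keep={tag_filter}",
        "--drop-version",
        f"-o={filtered_o5m}",
    ]


def build_convert_cmd(filtered_o5m, filtered_pbf):
    # osmconvert call as logged above.
    return ["osmconvert", str(filtered_o5m), f"-o={filtered_pbf}"]


def extract_airports(o5m_file, out_gpkg):
    """Hypothetical reconstruction of the wrapper: filter, convert, read, export."""
    from pyrosm import OSM  # imported lazily so the helpers work without pyrosm

    filtered_o5m, filtered_pbf = derived_paths(o5m_file)

    print("Filtering tags...")
    print(subprocess.run(
        build_filter_cmd(o5m_file, "aeroway=aerodrome", filtered_o5m),
        capture_output=True, check=True,
    ))

    print("Converting back to PBF...")
    print(subprocess.run(
        build_convert_cmd(filtered_o5m, filtered_pbf),
        capture_output=True, check=True,
    ))

    print("Reading into GeoDataFrame with pyrosm...")
    gdf = OSM(str(filtered_pbf)).get_pois({"aeroway": ["aerodrome"]})
    if gdf is not None:
        # Assumed export step: write the result as a GeoPackage.
        gdf.to_file(out_gpkg, driver="GPKG")
    return gdf
```

Using `check=True` makes a failing osmfilter or osmconvert call raise immediately instead of passing a broken file on to pyrosm.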

However, when filtering less common tags like harbour=yes, reading the preprocessed PBF fails. Here belgium-latest_filtered.osm.pbf is the result of the above filtering and conversion:

%%time
from pyrosm import OSM

osm = OSM('belgium-latest_filtered.osm.pbf')

osm.get_pois({'harbour': ['yes']})

# Raises a KeyError for the missing `tags` key

Full Traceback

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [8], in ()
----> 1 osm.get_data_by_custom_criteria({'harbour': True})

File /opt/homebrew/lib/python3.9/site-packages/pyrosm/pyrosm.py:689, in OSM.get_data_by_custom_criteria(self, custom_filter, osm_keys_to_keep, filter_type, tags_as_columns, keep_nodes, keep_ways, keep_relations, extra_attributes)
686 if isinstance(self._nodes, list):
687 self._nodes = concatenate_dicts_of_arrays(self._nodes)
--> 689 gdf = get_user_defined_data(
690 self._nodes,
691 self._node_coordinates,
692 self._way_records,
693 self._relations,
694 tags_as_columns,
695 custom_filter,
696 osm_keys_to_keep,
697 filter_type,
698 keep_nodes,
699 keep_ways,
700 keep_relations,
701 self.bounding_box,
702 )
704 # Do not keep node information unless specifically asked for
705 # (they are in a list, and can cause issues when saving the files)
706 if not self.keep_node_info and gdf is not None:

File /opt/homebrew/lib/python3.9/site-packages/pyrosm/user_defined.py:37, in get_user_defined_data(nodes, node_coordinates, way_records, relations, tags_as_columns, custom_filter, osm_keys, filter_type, keep_nodes, keep_ways, keep_relations, bounding_box)
34 relations = None
36 # Call signature for fetching POIs
---> 37 nodes, ways, relation_ways, relations = get_osm_data(
38 node_arrays=nodes,
39 way_records=way_records,
40 relations=relations,
41 tags_as_columns=tags_as_columns,
42 data_filter=custom_filter,
43 filter_type=filter_type,
44 osm_keys=osm_keys,
45 )
47 # If there weren't any data, return empty GeoDataFrame
48 if nodes is None and ways is None and relations is None:

File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_manager.pyx:177, in pyrosm.data_manager.get_osm_data()

File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_manager.pyx:178, in pyrosm.data_manager.get_osm_data()

File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_manager.pyx:171, in pyrosm.data_manager._get_osm_data()

File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_manager.pyx:151, in pyrosm.data_manager.get_osm_nodes()

File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_filter.pyx:282, in pyrosm.data_filter.filter_node_indices()

KeyError: 'tags'

It seems the error comes from the get_osm_nodes Cython function (the traceback ends in filter_node_indices, which it calls) because there is no tags key.

However, I can't seem to find anything wrong with the PBF file generated from osmconvert, and the contents seem alright:

osmconvert belgium-latest_filtered.o5m --csv="@id @lon @lat amenity shop name" --csv-headline -o=belgium_harbours.csv

head belgium_harbours.csv
@id     @lon    @lat    amenity shop    name
22433531        4.3943736       51.2296887
22433539        4.3944307       51.2298171
60261479        4.4085424       51.2287714                      Jachthaven Willemdok
96946445        2.9311625       51.2241274
96946447        2.9330668       51.2229879
96946463        2.9384831       51.2191516
96946465        2.9386072       51.2193169
96946467        2.9387759       51.2195599
96946470        2.9388064       51.2196844

Any ideas on what might be going wrong? Attached is a problematic PBF, filtered to include only objects with the harbour=yes tag.
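
One way to narrow this down might be to check how many nodes, ways and relations the filtered file actually contains. The sketch below relies on osmconvert's `--out-statistics` option and assumes its output is a series of `key: value` lines (such as `nodes: 123`). The helper names are hypothetical and the exact output format should be verified against your osmconvert version:

```python
import subprocess


def build_stats_cmd(pbf_file):
    # --out-statistics makes osmconvert print statistics instead of data.
    return ["osmconvert", str(pbf_file), "--out-statistics"]


def parse_stats(text):
    """Parse 'key: value' lines into a dict (assumed output format)."""
    stats = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def element_counts(pbf_file):
    """Return the node/way/relation counts reported by osmconvert."""
    result = subprocess.run(
        build_stats_cmd(pbf_file), capture_output=True, text=True, check=True
    )
    stats = parse_stats(result.stdout)
    return {key: int(stats.get(key, 0)) for key in ("nodes", "ways", "relations")}
```

Calling `element_counts('belgium-latest_filtered.osm.pbf')` would then show whether any element type is missing entirely, which could help explain why pyrosm finds no `tags` key.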

jaguardo commented Sep 3, 2022

I have a similar issue: after converting (or rather, downsizing) PBFs with osmconvert, I am unable to use them within pyrosm.

conversion:

c:\>osmconvert64-0.8.8p.exe "us-midwest-latest.osm.pbf" -b=41,-85,42,-84 --complete-ways --out-pbf -o=41_85_42_84.osm.pbf

and then I try to use it within pyrosm:

osm = OSM("41_85_42_84.osm.pbf")
drive_net = osm.get_network(network_type="driving+service") 
drive_net.plot(figsize=(20,20))

I get the following error. Obviously I need to dig into the converted file more...

ValueError                                Traceback (most recent call last)
Input In [27], in <cell line: 2>()
----> 1 drive_net = osm.get_network(network_type="driving+service") 
      2 drive_net.plot(figsize=(20,20))

File ~\pyrosm\pyrosm.py:202, in OSM.get_network(self, network_type, extra_attributes, nodes)
    199     tags_as_columns += extra_attributes
    201 if self._nodes is None or self._way_records is None:
--> 202     self._read_pbf()
    204 # Filter network data with given filter
    205 edges, node_gdf = get_network_data(
    206     self._node_coordinates,
    207     self._way_records,
   (...)
    211     slice_to_segments=nodes,
    212 )

File ~\pyrosm\pyrosm.py:121, in OSM._read_pbf(self)
    118 self._all_way_tags = way_tags
    120 # Prepare node coordinates lookup table
--> 121 self._node_coordinates = create_node_coordinates_lookup(self._nodes)

File ~\pyrosm\geometry.pyx:285, in pyrosm.geometry.create_node_coordinates_lookup()

File ~\pyrosm\geometry.pyx:286, in pyrosm.geometry.create_node_coordinates_lookup()

File ~\pyrosm\geometry.pyx:64, in pyrosm.geometry._create_node_coordinates_lookup()

File <__array_function__ internals>:180, in concatenate(*args, **kwargs)
ValueError: need at least one array to concatenate

I also tried it with nodes=True:

pyrosm\pyrosm.py:205: UserWarning: Could not find any edges for given area.
  edges, node_gdf = get_network_data(

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [28], in <cell line: 4>()
      1 drive_net = osm.get_network(network_type="driving+service", nodes=True) 
----> 2 drive_net.plot(figsize=(20,20))

AttributeError: 'tuple' object has no attribute 'plot'
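
This last AttributeError looks unrelated to the file itself: with nodes=True, get_network returns a tuple (as far as I can tell from the pyrosm docs, in the order nodes, edges) rather than a single GeoDataFrame, so the result has to be unpacked before plotting. A minimal sketch with hypothetical helper names that handles both return shapes, and the case where no edges were found:

```python
def split_network_result(result):
    """Return (nodes, edges) whether or not nodes=True was used.

    Assumes the tuple order is (nodes, edges); without nodes=True the
    result is treated as the edges alone.
    """
    if isinstance(result, tuple):
        nodes, edges = result
        return nodes, edges
    return None, result


def plot_edges(result, **plot_kwargs):
    """Plot the edges of a get_network result, failing clearly if empty."""
    _, edges = split_network_result(result)
    if edges is None:
        # Matches the "Could not find any edges" warning case.
        raise ValueError("no edges were found for this area")
    return edges.plot(**plot_kwargs)
```

With this, `plot_edges(osm.get_network(network_type="driving+service", nodes=True), figsize=(20, 20))` would raise a clear error when the converted file yields no edges, instead of failing on the tuple.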

