Improve runtime for small areas #478

Open
msbarry opened this issue Feb 6, 2023 · 8 comments
Comments

Contributor

msbarry commented Feb 6, 2023

Planetiler takes ~30 seconds to run even for the smallest areas (like Andorra from Geofabrik). Let's see if there is any way to improve that. Here's a summary of the runtime for Andorra:

0:00:33 INF -   overall          33s cpu:1m12s gc:3s avg:2.2
0:00:33 INF -   lake_centerlines 3s cpu:12s gc:1s avg:4.4
0:00:33 INF -     read     1x(35% 0.9s done:2s)
0:00:33 INF -     process  9x(1% 0s wait:1s done:2s)
0:00:33 INF -     write    1x(0% 0s wait:1s done:2s)
0:00:33 INF -   water_polygons   12s cpu:17s avg:1.4
0:00:33 INF -     read     1x(94% 12s)
0:00:33 INF -     process  9x(0% 0s wait:12s)
0:00:33 INF -     write    1x(0% 0s wait:12s)
0:00:33 INF -   natural_earth    11s cpu:14s avg:1.3
0:00:33 INF -     read     1x(66% 7s done:4s)
0:00:33 INF -     process  9x(2% 0.2s wait:7s done:4s)
0:00:33 INF -     write    1x(0% 0s wait:8s done:4s)
0:00:33 INF -   osm_pass1        0.4s cpu:2s avg:3.7
0:00:33 INF -   osm_pass2        1s cpu:5s avg:5
0:00:33 INF -     read     1x(0% 0s)
0:00:33 INF -     process  9x(31% 0.3s)
0:00:33 INF -     write    1x(3% 0s wait:1s)
0:00:33 INF -   boundaries       0s cpu:0.1s avg:2.9
0:00:33 INF -   sort             0.1s cpu:0.7s avg:7.2
0:00:33 INF -   archive          0.5s cpu:3s avg:5.6
Contributor Author

msbarry commented Feb 6, 2023

The biggest costs are Natural Earth (11s) and water polygons (12s), together about 23 of the 33 seconds, since Planetiler has to deserialize every feature for the whole planet no matter how small the output area is.

One idea would be to switch Natural Earth to read the GeoPackage source, and use its built-in spatial index to limit what we read to only the features inside the bounding box.
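
To make the bounding-box idea concrete, here is a minimal sketch in Python (not Planetiler code) of the query shape it implies. It assumes the source carries the GeoPackage R-tree extension, whose index is a virtual table named `rtree_<table>_<geometry column>` with `id, minx, maxx, miny, maxy` columns. The function names, the `demo_layer` table, and the envelopes are hypothetical, and the demo falls back to a plain table if the local SQLite build lacks the R-tree module.

```python
import sqlite3


def make_demo_index(conn, table, geom_col, envelopes):
    """Create a stand-in for a GeoPackage R-tree index (demo data only)."""
    index = f"rtree_{table}_{geom_col}"
    cols = "id, minx, maxx, miny, maxy"
    try:
        conn.execute(f'CREATE VIRTUAL TABLE "{index}" USING rtree({cols})')
    except sqlite3.OperationalError:
        # SQLite built without the R-tree module: a plain table keeps the
        # sketch runnable, the query below has the same shape either way.
        conn.execute(f'CREATE TABLE "{index}" ({cols})')
    conn.executemany(f'INSERT INTO "{index}" VALUES (?, ?, ?, ?, ?)', envelopes)


def features_in_bbox(conn, table, geom_col, bbox):
    """Return ids of features whose envelope intersects bbox.

    bbox is (west, south, east, north) in the same CRS as the layer.
    """
    west, south, east, north = bbox
    index = f"rtree_{table}_{geom_col}"
    sql = (
        f'SELECT id FROM "{index}" '
        "WHERE maxx >= ? AND minx <= ? AND maxy >= ? AND miny <= ? "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (west, east, south, north))]


conn = sqlite3.connect(":memory:")
make_demo_index(conn, "demo_layer", "geom", [
    (1, 1.5, 1.6, 42.5, 42.6),      # small feature inside the query box
    (2, -80.0, -70.0, 30.0, 40.0),  # far away, should be skipped
    (3, 0.0, 3.0, 42.0, 43.0),      # large feature overlapping the query box
])
# Rough Andorra-sized box; only the returned ids would need deserializing.
hits = features_in_bbox(conn, "demo_layer", "geom", (1.4, 42.4, 1.8, 42.7))
```

In a real GeoPackage the returned ids would then be joined back to the feature table to load geometry and attributes for just those rows.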

I'm not sure we could do something similar with water polygons, since they ship as a zipped shapefile with .shp and .shx files but no .sbn or .sbx spatial index. If we converted it to a different format we could add an index, but that complicates things quite a bit since we could no longer just download directly from the source.

cc/ @bdon

Contributor

bdon commented Feb 6, 2023

At the most extreme we could define a ReadableTileArchive as another input type that is passed directly as tiled features, without touching the FeatureCollector API; the OSM or NE-derived ocean is going to be exactly the same for every planetiler output modulo tags/buffer sizes. That would make water polygons cost effectively nothing.

Otherwise we might be able to read the shapefile index if one is included for water polygons, or migrate to another indexed format for it (GeoPackage, FGB?)

Contributor Author

msbarry commented Feb 6, 2023

Another hybrid option might be to compute a spatial index the first time we read a file and use it to speed up subsequent reads?
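
A rough sketch of what that hybrid could look like, in Python rather than Planetiler's own code: the first run pays for a full scan and writes the per-record envelopes to a sidecar file, and later runs reuse it as long as the source file is unchanged. The sidecar name, the `scan_envelopes` callback, and the size/mtime fingerprint are all assumptions made for illustration.

```python
import json
import os


def fingerprint(path):
    """Cheap change detector for the source file: size plus mtime."""
    st = os.stat(path)
    return [st.st_size, st.st_mtime_ns]


def load_or_build_index(source_path, scan_envelopes):
    """Return one [minx, miny, maxx, maxy] envelope per record.

    scan_envelopes(path) is the slow full read; it only runs when no
    valid sidecar exists for the current version of the source file.
    """
    index_path = source_path + ".bbox-index.json"  # hypothetical sidecar name
    fp = fingerprint(source_path)
    try:
        with open(index_path) as f:
            cached = json.load(f)
        if cached["fingerprint"] == fp:
            return cached["envelopes"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or unreadable sidecar: rebuild below
    envelopes = [list(e) for e in scan_envelopes(source_path)]
    with open(index_path, "w") as f:
        json.dump({"fingerprint": fp, "envelopes": envelopes}, f)
    return envelopes


def records_in_bbox(envelopes, bbox):
    """Record numbers whose envelope intersects bbox (west, south, east, north)."""
    west, south, east, north = bbox
    return [
        i
        for i, (minx, miny, maxx, maxy) in enumerate(envelopes)
        if maxx >= west and minx <= east and maxy >= south and miny <= north
    ]
```

For a shapefile, the record numbers could then be turned into byte offsets through the .shx file, so only the matching records are read and decoded. A linear scan over cached envelopes is the simplest possible index; a real R-tree would only matter for much larger record counts.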

Contributor Author

msbarry commented Feb 6, 2023

Or ask the maintainers of the water polygons source if they'd be up for distributing it in GeoPackage format or adding a spatial index.

Contributor Author

msbarry commented Feb 6, 2023

Another piece of low-hanging fruit would be to keep the unzipped file contents around between runs.
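
As an illustration only (Python, not how Planetiler handles archives), keeping the unzipped contents could be as simple as extracting into a cache directory keyed on the archive's name, size, and modification time. The function name, the cache layout, and the `.complete` marker file are hypothetical.

```python
import os
import zipfile


def extract_cached(zip_path, cache_root):
    """Extract zip_path once and reuse the extracted directory on later runs."""
    st = os.stat(zip_path)
    # A new download changes size or mtime, which yields a fresh cache key.
    key = f"{os.path.basename(zip_path)}-{st.st_size}-{st.st_mtime_ns}"
    target = os.path.join(cache_root, key)
    marker = os.path.join(target, ".complete")
    if not os.path.exists(marker):
        os.makedirs(target, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
        # Written last, so an interrupted extraction is retried next time.
        open(marker, "w").close()
    return target
```

The trade-off is disk space: the extracted files stay on disk next to the downloaded archive, and stale cache directories need cleaning up when the source is re-downloaded.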

Contributor

bdon commented Feb 6, 2023

Context on Geopackage etc output from osmcoastline: osmcode/osmcoastline#35 (comment)

Contributor

erik commented Feb 17, 2023

For the Natural Earth / geopackage case, we can also have profiles declare the limited set of tables that they're interested in and skip reading features that won't be processed.
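
A small Python sketch of that filtering step, assuming the profile exposes the table names it wants as a plain set (a hypothetical interface, not an existing Planetiler API). It relies on the standard `gpkg_contents` table, which lists every layer in a GeoPackage along with its `data_type`; the table names in the usage note are illustrative.

```python
import sqlite3


def tables_to_read(conn, wanted):
    """Feature tables present in the GeoPackage that the profile asked for."""
    rows = conn.execute(
        "SELECT table_name FROM gpkg_contents "
        "WHERE data_type = 'features' ORDER BY table_name"
    )
    return [name for (name,) in rows if name in wanted]
```

With a declared set such as `{"ne_10m_lakes", "ne_110m_ocean"}`, every other table in the file would be skipped without reading a single feature from it.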

Contributor

bdon commented Jul 27, 2023

Proposal here: #635

This addresses Natural Earth and other bring-your-own geopackages.

For water and land polygons I'd be happy to mirror those via a cloud storage bucket in indexed GPKG format, or we can try to get that added to the upstream data source.
