Avoid downloading neighboring geometries #110

Open
shishkin opened this issue May 13, 2024 · 3 comments

Comments

@shishkin

When I geocode Monaco to get a geometry and then use that geometry to download and convert OSM data into Parquet, QuackOSM downloads the 346 MB file files/Geofabrik_provence-alpes-cote-d-azur.osm.pbf instead of the 527 KB Monaco PBF.

I also tried the same with Regierungsbezirk Düsseldorf. QuackOSM downloads the neighboring Münster and Köln extracts as well. That is almost 325 MB on top of the 190 MB I actually asked for.

When downloading Germany, Quackosm also downloads Denmark, Austria, and Czechia.

Is there a way to avoid downloading unneeded OSM files?

@RaczeQ
Collaborator

RaczeQ commented May 16, 2024

Hello @shishkin
For now, this is the expected behaviour: QuackOSM tries to cover the given geometry fully, and extract geometries do not always line up perfectly with geocoded ones.

Monaco

I've plotted the geocoded geometry for the query Monaco (https://www.openstreetmap.org/relation/1124039) in yellow, and the Geofabrik extract geometry for Monaco (http://download.geofabrik.de/europe/monaco.html) in red.
image

As you can see, Nominatim returns a huge chunk of sea area that isn't covered by the Geofabrik extract.

However, changing the PBF source from geofabrik to osmfr will solve the issue for your use case:
image

import osmnx as ox
import quackosm as qosm

# Geocode "Monaco" and read from OpenStreetMap.fr extracts instead of Geofabrik.
qosm.convert_geometry_to_geodataframe(
    geometry_filter=ox.geocode_to_gdf("Monaco").unary_union,
    osm_extract_source="osmfr",
)

From the command line:

quackosm --geom-filter-geocode Monaco --osm-extract-source osmfr

Düsseldorf

image
Here, switching to the osmfr source can also help.

Germany

image
Again, the osmfr source can help.

Summary

By default, QuackOSM uses only Geofabrik extracts, because scraping the BBBike and OSMfr indexes takes a long time, but these services may contain better-matching geometries for particular use cases. Geofabrik also has better coverage of the whole world than OpenStreetMap.fr, but its extracts don't have enough buffer around them to fully cover Nominatim-based geometries.

Looking at those examples, I think I can fix the Germany and Düsseldorf cases for the default Geofabrik source by discarding new extracts if their contribution to the overall geometry is insignificant (for example, less than 1% of the queried geometry).
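To make the idea concrete, here is a rough sketch of how such a filter could work. It is not QuackOSM's implementation: the function name `select_extracts`, the helper `rect`, the `min_contribution` parameter and the toy extract shapes are all made up for illustration, and geometries are approximated as sets of grid cells so the example runs without any geospatial library.

```python
def rect(x0, y0, x1, y1):
    """Hypothetical helper: grid cells of the rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def select_extracts(query_cells, extracts, min_contribution=0.01):
    """Greedily pick extracts that cover the query geometry.

    An extract is skipped when the part of the query it would newly cover
    is smaller than `min_contribution` (a fraction of the whole query).
    Returns the selected names and the fraction of the query left uncovered.
    """
    remaining = set(query_cells)
    total = len(remaining)
    selected = []
    while remaining:
        # Find the unused extract that covers most of the still-uncovered area.
        best_name, best_gain = None, 0
        for name, cells in extracts.items():
            if name in selected:
                continue
            gain = len(remaining & cells)
            if gain > best_gain:
                best_name, best_gain = name, gain
        # Stop when nothing helps, or when the best candidate adds too little.
        if best_name is None or best_gain / total < min_contribution:
            break
        selected.append(best_name)
        remaining -= extracts[best_name]
    return selected, len(remaining) / total


# Toy data: the query is a 20 x 20 grid (400 cells); neighbours touch it only at the edges.
query = rect(0, 0, 20, 20)
extracts = {
    "duesseldorf": query - {(19, 0), (19, 1), (0, 19)},  # covers 397 of 400 cells
    "muenster": rect(19, -10, 30, 2),                    # overlaps 2 query cells (0.5%)
    "koeln": rect(-10, 19, 1, 30),                       # overlaps 1 query cell (0.25%)
}

selected, uncovered = select_extracts(query, extracts)
print(selected, uncovered)
```

With the 1% threshold only the main extract is kept, at the cost of leaving a sliver of the queried geometry uncovered; setting `min_contribution=0` reproduces the current cover-everything behaviour and pulls in both neighbours.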

OSM_fr index - better precision in particular areas, but some gaps outside Europe
image

Geofabrik index - more uniform coverage
image

@shishkin
Author

I see. I'm actually confused by what you call "Nominatim-based geometries". Don't all geometries come from OSM unchanged, with Nominatim being a search index and Geofabrik, osmfr and others just repackaging the same OSM world.pbf in smaller pieces? I get that the nature of boundaries is very complicated, but so far Geofabrik's slicing seems quite practical. I would actually even prefer to specify the names of Geofabrik extracts directly (like duesseldorf-regbez) in order to reuse QuackOSM's caching.

@RaczeQ
Collaborator

RaczeQ commented May 17, 2024

Nominatim can be a source of truth, but each of those services can define its own geometries and names.
BBBike, for example, serves rectangular extracts around cities that are detached from administrative geometries.

I've added two issues to tackle the problems mentioned here:
