Multiple PBF input discouraged for the time being #3925
Comments
Based on my experience, sometimes:
1. Gaps. It should be checked that the two areas (PBFs) are accurately connected and that there is no gap between them. For Geofabrik extracts, you can check this by using the extract example Route finding through the ferry network is unlikely to be perfect here. And the Geofabrik .poly files may also change as they are updated. So it doesn't hurt to check regularly!
2. Care should be taken when merging extracts of the
Right, you can get unlucky with Can the gaps in Geofabrik's geometries be fixed? Like is it in a repo one can PR to? It'd be good if the community could help maintain that I think. AFAIK they use the default
the probability is small .. but not impossible. The bigger danger is not looking at the map. For example, France + Great Britain merge
maybe .. I'm used to not merging in the first place and then I don't have to compare the fit and gaps between the map data in detail. And just checking the country polygons is not enough, you also need to check the continent polygon.
Excellent explanations with pictures here, much appreciated! I feel like these are probably the best reasons to stop supporting it, just to save others from having to discover all the possible pitfalls; basically, Geofabrik is where people typically get extracts, and they aren't usually considering what could go wrong when combining them. Apologies for the stubbornness on my part @nilsnolde @dnesbitt61
Hm tbh @kevinkreiser, I think you were right, we should at least try to investigate. Actually, while I agree it's a great summary of pitfalls, I don't think any of that is relevant to allowing single or multiple files. If we only allow one file, we'd force people to merge. That doesn't eliminate the pitfalls, it rather masks them even more. I think the only thing that's important for us (or me anyway) is the de-duplication of data. I don't know the code there, but it seems troublesome, so my first thought was "why not remove it". And deduplication is the only thing osmium would do for us. All the other problems arising from merging separate regions (via osmium or Valhalla) are still there. So I guess my conclusion is that we have to become more sensitive to the issues @ImreSamu points out, and possibly have a paragraph in some doc about this. But it seems to me, no matter what we do, users have to be just as sensitive to what they're doing.
@ImreSamu what would be great to have is a detailed tutorial about merging/working with OSM extracts. It's not very specific to routing engines, it could be general. Are you aware of something like that? Would love to link it.
Agree; on the other hand, it is difficult to write a good tutorial. Maybe we have to add this:
My "paranoid" tutorial
Similar problem: for perfect routing in the USA, you need Canada ..
When people use multiple PBFs we should log a warning with a link to this issue 😄 That might be a pretty good way to raise awareness of the hell that is splicing OSM together.
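A minimal sketch of what such a warning could look like, written in Python only for illustration (the real check would live in Valhalla's C++ build tooling). The function names, the logger name, and the exact issue URL are assumptions, not existing Valhalla code.

```python
import logging

# Assumed URL for this issue; adjust if the repository path differs.
ISSUE_URL = "https://github.com/valhalla/valhalla/issues/3925"


def multiple_pbf_warning(pbf_paths):
    """Return a warning message if more than one PBF is given, else None."""
    if len(pbf_paths) > 1:
        return (
            f"{len(pbf_paths)} PBF inputs given; combining extracts has "
            f"pitfalls (gaps, duplicated data), see {ISSUE_URL}"
        )
    return None


def check_inputs(pbf_paths):
    """Log the warning when it applies; return True if a warning was logged."""
    message = multiple_pbf_warning(pbf_paths)
    if message is not None:
        # Hypothetical logger name, used only in this sketch.
        logging.getLogger("pbf_input_sketch").warning(message)
    return message is not None
```

Returning a flag alongside logging keeps the check easy to test without capturing log output.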
True that. A bit more formatting/cleanup and it's kind of a tutorial :) thanks @ImreSamu
So apart from the general issues with combining extracts mentioned above, there are also the technical issues involving duplicated data from multiple PBFs. As mentioned in #3908, the code is currently not robust to duplication, which normally just ends up adding extra edges to the graph. That isn't in itself the end of the world (nice pun), but we do limit the total number of edges that can originate from a node, and some of the more extreme cases error out once they are duplicated by overlapping data.

The trick to deduplicating the edges is being able to recognize when two sequences of waynodes in the waynodes file are the same edge. The problem is that we only know which way they belong to by their way_index, which is simply an index into the way attribute data, and that is not sorted (I don't think) by way id, so duplicates happen over there. Further annoying is that waynodes which are duplicates aren't near each other in their container. So we need to tie duplicates to each other in a simple way. We can't use a map or whatever because it's too huge, so we need to somehow recognize duplicates based on sorting. So how can we do that?

After this bit of code: valhalla/src/mjolnir/pbfgraphparser.cc Line 3231 in 138719d
we have the waynodes sorted by OSM node id, which means duplicates of the same node are next to each other in the list. The other thing we have is the way_index, so we can tell that they are duplicates because they don't have the same way index. If they did have the same way index then they are still valid even if duplicated, because ways can come back on themselves. Anyway, if we see adjacent waynodes with a matching node id but a different way_index, we can simply remove them from the waynodes (probably just mark them as ignored rather than remove them, because that is expensive). Then when we run over the full set to build the graph we just skip them and no dups show up.
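The idea above can be sketched roughly as follows, in Python for brevity rather than Valhalla's C++. This is a sketch under assumptions, not the actual implementation: the `WayNode` fields, the `shape_index` position field, and the `osm_way_ids` lookup table are hypothetical. One extra assumption goes beyond the comment: different ways legitimately share a node at intersections, so the sketch only treats two waynodes as duplicates when their way_index differs but the OSM way id behind it (and the position in the way) is the same.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class WayNode:
    osm_node_id: int
    way_index: int        # index into the way attribute data
    shape_index: int      # hypothetical: position of the node within its way
    ignored: bool = False


def mark_duplicate_way_nodes(way_nodes, osm_way_ids):
    """Mark duplicated waynodes as ignored.

    way_nodes must already be sorted by osm_node_id.
    osm_way_ids[way_index] gives the OSM way id (assumed to be available).
    """
    for _, group in groupby(way_nodes, key=lambda wn: wn.osm_node_id):
        # Small per-node table only, never a map over the whole data set.
        first_owner = {}
        for wn in group:
            key = (osm_way_ids[wn.way_index], wn.shape_index)
            owner = first_owner.setdefault(key, wn.way_index)
            if owner != wn.way_index:
                # Same OSM way seen again under another way_index: duplicate.
                wn.ignored = True
    return way_nodes


def demo():
    # Way 100 appears twice (way_index 0 and 2), as if read from two
    # overlapping PBFs. Way 300 loops back on itself.
    osm_way_ids = [100, 200, 100, 300]
    ways = {0: [1, 2, 3], 1: [2, 4], 2: [1, 2, 3], 3: [5, 6, 5]}
    way_nodes = [
        WayNode(node, way_index, pos)
        for way_index, nodes in ways.items()
        for pos, node in enumerate(nodes)
    ]
    way_nodes.sort(key=lambda wn: wn.osm_node_id)
    mark_duplicate_way_nodes(way_nodes, osm_way_ids)
    return [(wn.osm_node_id, wn.way_index) for wn in way_nodes if not wn.ignored]
```

Marking instead of erasing keeps the pass linear over the sorted list; the graph-building pass would then skip any waynode with `ignored` set.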
There is likely a bug with ingesting multiple PBFs. Until we've had the chance to look into it in detail, we'd encourage people to merge PBFs instead, ref #3908 (comment).
It's easy and fast using osmium:
osmium merge PBF1 PBF2 PBF3 -o merged.pbf