
Support PBF to Atlas creation irrespective of PBF size #147

Open
MikeGost opened this issue Jun 22, 2018 · 4 comments

Comments

MikeGost (Contributor) commented Jun 22, 2018

The current OSM PBF to Atlas flow is optimized for sharded PBF files. The process is very memory intensive and results in OutOfMemoryError exceptions for large PBF files. There needs to be a way to support any PBF file, irrespective of its size. Here is one possible option:

  • Given a PBF file location and a sharding tree, shard the PBF file, produce an Atlas for each shard, and emit either sharded Atlas output or a single Atlas file (multi-atlas the sharded atlases and clone them into a PackedAtlas; see the sketch after this list)

  • If no sharding tree is provided, fall back to flat sharding at a fixed slippy tile zoom level, then follow the same output strategy as outlined above.

This is loosely related to issue #88. An example of a reported use-case can be found here.
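
A minimal sketch of the combination step, assuming the per-shard atlases have already been written to disk. The file names are placeholders, and the `MultiAtlas` list constructor plus `PackedAtlasCloner` usage reflect my reading of the current API, not a committed design:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.openstreetmap.atlas.geography.atlas.Atlas;
import org.openstreetmap.atlas.geography.atlas.multi.MultiAtlas;
import org.openstreetmap.atlas.geography.atlas.packed.PackedAtlas;
import org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasCloner;
import org.openstreetmap.atlas.streaming.resource.File;

public final class CombineShards
{
    public static void main(final String[] args)
    {
        // Load each per-shard Atlas produced by the sharded PBF-to-Atlas step
        final List<Atlas> shardAtlases = new ArrayList<>();
        for (final String path : Arrays.asList("shard-0.atlas", "shard-1.atlas"))
        {
            shardAtlases.add(PackedAtlas.load(new File(path)));
        }

        // Stitch the shards together as one logical view over all of them
        final Atlas stitched = new MultiAtlas(shardAtlases);

        // Materialize that view into a single standalone PackedAtlas and save it
        final PackedAtlas combined = new PackedAtlasCloner().cloneFrom(stitched);
        combined.save(new File("combined.atlas"));
    }
}
```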

MikeGost changed the title from "Support PBF to Atlas translation irrespective of PBF size" to "Support PBF to Atlas creation irrespective of PBF size" on Jun 22, 2018
@flowrean

Can you give a small example of how you would shard an OSM PBF file?
Is there any documentation on working with shards in Atlas? I can only find this README about sharding, but it does not include a code example.

@flowrean

Trying to follow your outline above, I used osmosis to shard a larger OSM PBF file with the completeWays option (see the invocation below). But when I multi-atlas the sharded atlases and clone them into a PackedAtlas, that step is again very memory intensive and takes a very long time.
Is there a way around this, or am I doing something wrong?
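
For reference, the invocation was along these lines, one command per shard (the bounding box values and file names here are placeholders):

```
osmosis --read-pbf file=large.osm.pbf \
        --bounding-box left=7.0 bottom=47.0 right=8.0 top=48.0 completeWays=yes \
        --write-pbf file=shard.osm.pbf
```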

@matthieun (Collaborator)

Hello @flowrean!

Neither OSM PBF nor Atlas is designed to handle large amounts of data in one single place. When developing locally, I try to allocate 10 GB of memory to my processes, and that can only handle a handful of Atlas shards that are ~20 MB each on disk (zipped). Once in memory, those are much bigger, which is the tradeoff we chose to get very fast processing, even on complex problems.

To process larger datasets, there is an option that uses Spark: https://github.com/osmlab/atlas-generator. However, that requires access to a Spark cluster, and some kind of distributed storage in which to put all the sharded PBF files to process. In the end, this will distribute the processing of each shard to produce one Atlas per shard, but it will still not generate a single large Atlas for you.

One other option, if you only care about a specific type of data, is to take each shard individually and serially filter it down. Once this is complete, and if the filtering is aggressive enough, you might be able to do a massive multi-atlas of the slimmed-down shards in a reasonable amount of memory (see the sketch below). See also this StackOverflow question.
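
To illustrate, here is a rough sketch of the serial filtering, assuming the `Atlas#subAtlas(Predicate)` overload; the highway predicate and the file names are only examples, to be replaced by whatever matches your data of interest:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.openstreetmap.atlas.geography.atlas.Atlas;
import org.openstreetmap.atlas.geography.atlas.multi.MultiAtlas;
import org.openstreetmap.atlas.geography.atlas.packed.PackedAtlas;
import org.openstreetmap.atlas.geography.atlas.packed.PackedAtlasCloner;
import org.openstreetmap.atlas.streaming.resource.File;

public final class FilterThenCombine
{
    public static void main(final String[] args)
    {
        final List<Atlas> slimmed = new ArrayList<>();
        for (final String path : Arrays.asList("shard-0.atlas", "shard-1.atlas"))
        {
            final Atlas shard = PackedAtlas.load(new File(path));
            // Keep only the entities of interest, one shard at a time
            shard.subAtlas(entity -> entity.getTag("highway").isPresent())
                    // Clone so the slimmed result no longer holds on to the full shard
                    .ifPresent(filtered -> slimmed.add(new PackedAtlasCloner().cloneFrom(filtered)));
        }
        // With aggressive enough filtering, this multi-atlas should now fit in memory
        final Atlas combined = new MultiAtlas(slimmed);
        System.out.println("Edges in combined atlas: " + combined.numberOfEdges());
    }
}
```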

@flowrean

Thank you for the clarification @matthieun, it is very valuable to me.

I have let go of the intention to create a single large Atlas file, but I would still like to process a large OSM PBF. I would do this in shards (created by Osmosis with the completeWays option) as suggested above. But I run into a new problem: a way that crosses a shard boundary is processed more than once. The resulting geometry can also differ (in the number of times the line is split) when incoming or outgoing ways do not appear in all shards, which also messes up the edge identifiers.

Have you encountered this too or do you have any idea how to avoid this situation?
