Ensuring all nodes are handled before ways #1
Comments
Hi, thanks for opening an issue! I can't really know when node parsing is complete, because there is no guarantee that the elements in an OSM PBF file are in the expected order. Single-threaded processing could ensure that all nodes are processed first, as long as the input file is sorted. The only strategy that makes you completely sure all nodes have been processed is to read the file once, parsing only the nodes, and then reparse the file, ignoring the nodes.
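A rough sketch of that two-pass idea, for illustration only. To keep it self-contained, `Node`, `Way`, `element` and `parse` are local stand-ins invented for this example (only the `ID` and `NodeIDs` field names come from this thread); `parse` simply fires callbacks from separate goroutines to imitate a decoder that gives no ordering guarantee. With a real file, each pass would presumably reopen or rewind the input.

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-in types for this sketch; they are not imported from gosmparse.
type Node struct {
	ID       int64
	Lat, Lon float64
}

type Way struct {
	ID      int64
	NodeIDs []int64
}

// element is one decoded item; exactly one of the fields is set.
type element struct {
	node *Node
	way  *Way
}

// parse is a hypothetical stand-in for a concurrent decoder: every element
// is handed to the callbacks from its own goroutine, so the order in which
// callbacks run is not guaranteed. A nil callback means "ignore this type".
func parse(elems []element, onNode func(Node), onWay func(Way)) {
	var wg sync.WaitGroup
	for _, e := range elems {
		wg.Add(1)
		go func(e element) {
			defer wg.Done()
			if e.node != nil && onNode != nil {
				onNode(*e.node)
			}
			if e.way != nil && onWay != nil {
				onWay(*e.way)
			}
		}(e)
	}
	wg.Wait()
}

// twoPass collects nodes in a first pass and checks ways in a second pass.
// It returns the number of node references that could not be resolved.
func twoPass(elems []element) (missing int) {
	nodes := make(map[int64]Node)
	var mu sync.Mutex

	// Pass 1: store nodes, ignore everything else.
	parse(elems, func(n Node) {
		mu.Lock()
		nodes[n.ID] = n
		mu.Unlock()
	}, nil)

	// Pass 2: the map is complete and no longer written to, so concurrent
	// reads are safe; only the counter still needs the mutex.
	parse(elems, nil, func(w Way) {
		for _, ref := range w.NodeIDs {
			if _, ok := nodes[ref]; !ok {
				mu.Lock()
				missing++
				mu.Unlock()
			}
		}
	})
	return missing
}

func main() {
	// The way deliberately comes before its nodes, as in an unsorted file.
	elems := []element{
		{way: &Way{ID: 10, NodeIDs: []int64{1, 2}}},
		{node: &Node{ID: 1, Lat: 52.5, Lon: 13.4}},
		{node: &Node{ID: 2, Lat: 52.6, Lon: 13.5}},
	}
	fmt.Println("missing node refs:", twoPass(elems)) // prints "missing node refs: 0"
}
```

The cost is that the input is decoded twice, which is the trade-off for not depending on file order.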
One question: have you encountered this problem with any files? If so, I would like to investigate, if you can provide me with an example file. BTW, in my own applications I process without any additional safeguards (https://github.com/thomersch/grandine/blob/master/cmd/spatialize/spatialize.go#L44) and haven't had any issues so far.
Yes, unfortunately, I have run into it a few times. I have a library based on this parser, https://github.com/missinglink/pbf, which has a bunch of different commands available. In that repo I also link to my fork of the parser, where I added some extra features such as a PBF indexer that can be used for random access on the PBF file. It's pretty neat, but not really fast enough for production use.
I also added another feature called 'breakpoints', which was my attempt at knowing when the nodes are complete so I can start on the ways. It's a difficult thing to write, because it's not possible to know the contents of a block until after it's been decompressed (which is done in parallel). I also noticed that some files have blocks that contain nodes, ways and relations together (more common on Geofabrik extracts), while other files have blocks that only ever contain one type per block (as in the ex-Mapzen metro extracts).
Doing multiple passes on the file is a good workaround, but it's not very convenient for the planet file, so I was hoping to find a solution which would allow me to make [...]. I'll write up an example and post it below.
something like this:

package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"sync"

	"github.com/thomersch/gosmparse"
)

// handler stores every node it sees, keyed by node ID.
type handler struct {
	nodes map[int64]gosmparse.Node
	mutex *sync.Mutex
}

func (d *handler) ReadNode(n gosmparse.Node) {
	d.mutex.Lock()
	d.nodes[n.ID] = n
	d.mutex.Unlock()
}

func (d *handler) ReadWay(w gosmparse.Way) {
	for _, ref := range w.NodeIDs {
		// Note: this read is not guarded by d.mutex.
		if _, ok := d.nodes[ref]; !ok {
			fmt.Println("could not find node", ref)
		}
	}
}

func (d *handler) ReadRelation(r gosmparse.Relation) {
	/* no-op */
}

func main() {
	source := flag.String("in", "osm.pbf", "")
	flag.Parse()

	f, err := os.Open(*source)
	if err != nil {
		log.Fatal(err)
	}

	dec := gosmparse.NewDecoder(f)
	dh := handler{
		nodes: make(map[int64]gosmparse.Node),
		mutex: &sync.Mutex{},
	}
	err = dec.Parse(&dh)
	if err != nil {
		log.Fatal(err)
	}
}

$ go run example.go --in /media/flash/berlin.osm.pbf
fatal error: concurrent map read and map write
goroutine 20 [running]:
runtime.throw(0x544d78, 0x21)
/usr/local/go/src/runtime/panic.go:619 +0x81 fp=0xc42502bda8 sp=0xc42502bd88 pc=0x428291
runtime.mapaccess2_fast64(0x519220, 0xc420082450, 0x7632e, 0xc42502be68, 0x92546cce)
....

If I fix the map access error with a mutex, then the map may or may not contain all the nodes I need, depending on the size of the extract (smaller extracts are more prone to this).
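For reference, a minimal sketch of what guarding both the map write and the map read could look like, here with a `sync.RWMutex`. The `Node` and `Way` types are local stand-ins for this example, not the gosmparse types, and the two goroutines in `main` only imitate concurrent callbacks. The locking removes the crash, but it does not change the ordering problem: whether a reference is found still depends on which goroutine gets there first.

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-in types for this sketch; they are not imported from gosmparse.
type Node struct{ ID int64 }

type Way struct {
	ID      int64
	NodeIDs []int64
}

// handler guards the node map with an RWMutex so that lookups from
// ReadWay cannot overlap with writes from ReadNode.
type handler struct {
	mu      sync.RWMutex
	nodes   map[int64]Node
	missing []int64 // node IDs that were not yet known when a way was read
}

func newHandler() *handler {
	return &handler{nodes: make(map[int64]Node)}
}

func (h *handler) ReadNode(n Node) {
	h.mu.Lock()
	h.nodes[n.ID] = n
	h.mu.Unlock()
}

func (h *handler) ReadWay(w Way) {
	for _, ref := range w.NodeIDs {
		h.mu.RLock()
		_, ok := h.nodes[ref]
		h.mu.RUnlock()
		if !ok {
			h.mu.Lock()
			h.missing = append(h.missing, ref)
			h.mu.Unlock()
		}
	}
}

func main() {
	h := newHandler()
	var wg sync.WaitGroup
	wg.Add(2)
	// Nodes and a way arrive concurrently, in no guaranteed order.
	go func() {
		defer wg.Done()
		for i := int64(1); i <= 1000; i++ {
			h.ReadNode(Node{ID: i})
		}
	}()
	go func() {
		defer wg.Done()
		h.ReadWay(Way{ID: 1, NodeIDs: []int64{1, 500, 1000}})
	}()
	wg.Wait()

	fmt.Println("nodes stored:", len(h.nodes))
	// This count varies from run to run, which is exactly the problem.
	fmt.Println("refs missing when the way was read:", len(h.missing))
}
```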
I really like this library and I switched to it from another one. Unfortunately, having to do multiple passes on the file negates the speed benefits of this library over others. Do you have any ideas how I might be able to add an option which prevents the ways/rels from being processed until all their dependencies are finished?
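One possible single-pass workaround, sketched under assumptions rather than as something the library offers: buffer the ways while parsing and only resolve them once the parse call has returned, at which point every node callback has finished. The types and the `collector` below are hypothetical stand-ins written for this example, and the goroutines in `main` only imitate concurrent callbacks.

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-in types for this sketch; they are not imported from gosmparse.
type Node struct {
	ID       int64
	Lat, Lon float64
}

type Way struct {
	ID      int64
	NodeIDs []int64
}

// collector stores nodes and defers all way handling until later.
type collector struct {
	mu    sync.Mutex
	nodes map[int64]Node
	ways  []Way
}

func newCollector() *collector {
	return &collector{nodes: make(map[int64]Node)}
}

func (c *collector) ReadNode(n Node) {
	c.mu.Lock()
	c.nodes[n.ID] = n
	c.mu.Unlock()
}

// ReadWay only buffers the way; no node lookup happens here.
func (c *collector) ReadWay(w Way) {
	c.mu.Lock()
	c.ways = append(c.ways, w)
	c.mu.Unlock()
}

// resolve denormalizes the buffered ways. It must only be called after
// parsing has finished, so no locking is needed here.
func (c *collector) resolve() (resolved map[int64][]Node, missing int) {
	resolved = make(map[int64][]Node, len(c.ways))
	for _, w := range c.ways {
		pts := make([]Node, 0, len(w.NodeIDs))
		for _, ref := range w.NodeIDs {
			n, ok := c.nodes[ref]
			if !ok {
				missing++
				continue
			}
			pts = append(pts, n)
		}
		resolved[w.ID] = pts
	}
	return resolved, missing
}

func main() {
	c := newCollector()
	var wg sync.WaitGroup
	wg.Add(3)
	// Callbacks arrive concurrently; the way may well be seen first.
	go func() { defer wg.Done(); c.ReadWay(Way{ID: 10, NodeIDs: []int64{1, 2}}) }()
	go func() { defer wg.Done(); c.ReadNode(Node{ID: 1, Lat: 52.5, Lon: 13.4}) }()
	go func() { defer wg.Done(); c.ReadNode(Node{ID: 2, Lat: 52.6, Lon: 13.5}) }()
	wg.Wait() // stands in for the parse call returning

	resolved, missing := c.resolve()
	fmt.Printf("way 10 resolved with %d nodes, %d refs missing\n", len(resolved[10]), missing)
	// prints "way 10 resolved with 2 nodes, 0 refs missing"
}
```

The catch is memory: every way is held until the end, on top of the node map, so this is likely more practical for extracts than for the planet file.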
Sorry, I totally forgot this issue existed. Unfortunately, it is kind of hard to resolve, because of the aforementioned lack of guarantees. Collecting the blocks from a file is single-threaded in gosmparse, but processing is dependent on
Heya,
Is there any mechanism that would allow me to ensure that all calls to ReadNode have been completed before the first time ReadWay is called?
I would like to denormalize ways, so I need to ensure that all the nodes are in memory before processing the ways.