hauke96/osm-changeset-crawler

OSM changeset analyser

A tool analysing the changesets from OpenStreetMap (OSM).

Compilation

This uses sigolo (logging) and kingpin (CLI options) as dependencies. Everything can be compiled normally.

go get github.com/hauke96/sigolo
go get github.com/hauke96/kingpin
go run .

Usage

Here is a short version of the --help output:

usage: OSM changeset analyser --analysers=ANALYSERS [<flags>] <file>

A tool analysing the changesets from OpenStreetMap (OSM).

Flags:
  -h, --help                 Show context-sensitive help (also try --help-long and --help-man).
  -d, --debug                Verbose mode, showing additional debug information
      --analysers=ANALYSERS  A comma separated list of analysers
  -v, --version              Show application version.

Args:
  <file>  The file to analyse

ANALYSERS:
  The 'analysers' flag is a comma separated list of analysers all creating their own CSV file:

  * editor-count : Counts the amount of the most common editors for each month.
  * no-source-count : Counts the amount of monthly changesets without source tag, sorted by editor.
  * user-without-source : Counts for each user the amount of changesets without source tag for each editor.
  * comment-keywords(foo,bar) : Takes keywords (in this case "foo" and "bar") and counts their occurrence per month. Comments and keywords are converted into lower case.

For example, the following call analyses data.osm with three analysers: the editor count, the changesets without source per editor, and the users without source:

$> go build .
$> ./osm-changeset-analyser --analysers=editor-count,no-source-count,user-without-source data.osm
$> ll result*
-rw-r--r-- 1 hauke hauke 8,2K  7. Mär 15:03 result_editor-count.csv
-rw-r--r-- 1 hauke hauke 8,2K  7. Mär 15:03 result_no-source-count.csv
-rw-r--r-- 1 hauke hauke  529  7. Mär 15:03 result_user-without-source.csv
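
To illustrate the kind of counting an analyser such as comment-keywords performs, here is a minimal sketch. It is not the tool's actual code: the `comment` type and the `countKeywords` function are hypothetical names, and it assumes that a keyword is counted once per changeset whose comment contains it, which the help text does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// comment is a hypothetical, reduced view of a changeset:
// only the creation timestamp and the comment text.
type comment struct {
	CreatedAt string // ISO 8601 timestamp, e.g. "2020-01-12T14:03:44Z"
	Text      string
}

// countKeywords returns counts[month][keyword], where month has the form
// "YYYY-MM". Comments and keywords are compared in lower case.
// Assumption: each comment counts at most once per keyword.
func countKeywords(comments []comment, keywords []string) map[string]map[string]int {
	counts := make(map[string]map[string]int)
	for _, c := range comments {
		if len(c.CreatedAt) < 7 {
			continue // skip malformed timestamps
		}
		month := c.CreatedAt[:7]
		text := strings.ToLower(c.Text)
		for _, kw := range keywords {
			kw = strings.ToLower(kw)
			if strings.Contains(text, kw) {
				if counts[month] == nil {
					counts[month] = make(map[string]int)
				}
				counts[month][kw]++
			}
		}
	}
	return counts
}

func main() {
	comments := []comment{
		{"2020-01-12T14:03:44Z", "FOO fixed"},
		{"2020-01-20T09:00:00Z", "foo and Bar"},
		{"2020-02-01T10:00:00Z", "nothing special"},
	}
	counts := countKeywords(comments, []string{"Foo", "bar"})
	for month, perKeyword := range counts {
		fmt.Println(month, perKeyword)
	}
}
```

Slicing the first seven characters of the timestamp is enough to group by month because `created_at` is an ISO 8601 string.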

Input data and format

OSM changesets have a simple XML structure. Each changeset has basic metadata (user, location, creation date, etc.) and more specific metadata (comment, source of data, etc.), which can consist of arbitrary XML tags.

<changeset id="1234567"
		created_at="2020-01-12T14:03:44Z"
		open="false"
		comments_count="2"
		changes_count="154"
		closed_at="2020-01-12T14:04:15Z"
		min_lat="5.12"
		min_lon="2.56"
		max_lat="10.24"
		max_lon="20.48"
		uid="12345"
		user="mega-mapper-3000">
	<tag k="source" v="survey; Bing"/>
	<tag k="hashtags" v="#github;#example"/>
	<tag k="created_by" v="JOSM/1.5 (15492 en)"/>
	<tag k="comment" v="Useful information for other mappers"/>
</changeset>
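
A file of this size is best read as a stream. The following is a minimal sketch of how such changesets could be read in Go with the standard `encoding/xml` package. It is not taken from this repository: the `Changeset` and `Tag` types and the `readChangesets` and `collect` functions are hypothetical names, only a subset of the attributes is mapped, and the surrounding `<osm>` root element in the sample is an assumption about the file layout.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Tag is one <tag k="..." v="..."/> element of a changeset.
type Tag struct {
	Key   string `xml:"k,attr"`
	Value string `xml:"v,attr"`
}

// Changeset maps a subset of the attributes of a <changeset> element.
type Changeset struct {
	ID        int64  `xml:"id,attr"`
	CreatedAt string `xml:"created_at,attr"`
	User      string `xml:"user,attr"`
	Tags      []Tag  `xml:"tag"`
}

// TagValue returns the value of the tag with the given key, if present.
func (c Changeset) TagValue(key string) (string, bool) {
	for _, t := range c.Tags {
		if t.Key == key {
			return t.Value, true
		}
	}
	return "", false
}

// readChangesets walks the token stream and calls handle for every
// <changeset> element, so the whole file never has to fit into memory.
func readChangesets(r io.Reader, handle func(Changeset)) error {
	dec := xml.NewDecoder(r)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if start, ok := tok.(xml.StartElement); ok && start.Name.Local == "changeset" {
			var cs Changeset
			if err := dec.DecodeElement(&cs, &start); err != nil {
				return err
			}
			handle(cs)
		}
	}
}

// collect gathers all changesets of a small input string into a slice.
func collect(input string) ([]Changeset, error) {
	var all []Changeset
	err := readChangesets(strings.NewReader(input), func(cs Changeset) {
		all = append(all, cs)
	})
	return all, err
}

// sample assumes an <osm> root element around the changesets.
const sample = `<osm>
<changeset id="1234567" created_at="2020-01-12T14:03:44Z" user="mega-mapper-3000">
	<tag k="source" v="survey; Bing"/>
	<tag k="created_by" v="JOSM/1.5 (15492 en)"/>
</changeset>
</osm>`

func main() {
	changesets, err := collect(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, cs := range changesets {
		editor, _ := cs.TagValue("created_by")
		fmt.Println(cs.ID, cs.User, editor)
	}
}
```

Decoding one element at a time keeps memory usage independent of the file size, which matters for an input of roughly 34GB.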

The latest data for the whole planet can be downloaded from https://planet.openstreetmap.org/planet/changesets-latest.osm.bz2. The file is over 3GB in size (approx. 34GB decompressed) and contains all changesets from 2005 until now.

Performance

I tested the performance on my private computer (see below). Some other applications were running (e-mail client, browser, editors, etc.), but I wasn't doing anything else during the execution.

Dataset

I used the changesets-200224.osm.bz2 (download size: 3.2GB / decompressed size: 34GB).

My system:

  • CPU: Intel Xeon E3-1231 v3, 8x3.4GHz
  • RAM: 16GB DDR3 1333MHz
  • Drive: Samsung SSD 850 EVO

Measurements

Here are some example executions:

active analysers                                   execution time   processing speed   RAM usage (approx.)
no-editor                                          6m 39s           85 MB/s            6.8 GB
user-without-source                                7m 12s           78 MB/s            10 GB
no-editor, no-source-count, user-without-source    7m 21s           77 MB/s            10 GB

Output files

13K result_editor-count.csv
13K result_no-source-count.csv
52M result_user-without-source.csv

For developers

Multiple goroutines process the data asynchronously. See the doc folder for more information.
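
As a rough illustration of this kind of design, the following sketch fans each changeset out to several analyser goroutines over channels. It is a generic pattern and not the architecture documented in the doc folder: the `changeset` type, the `analyser` function type, and `run`, `countAll` and `countWithoutSource` are all hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// changeset is a hypothetical, reduced stand-in for the parsed data.
type changeset struct {
	Editor    string
	HasSource bool
}

// analyser consumes changesets from its own channel until the channel
// is closed and then returns a single number as its result.
type analyser func(in <-chan changeset) int

// countAll counts every changeset it receives.
func countAll(in <-chan changeset) int {
	n := 0
	for range in {
		n++
	}
	return n
}

// countWithoutSource counts changesets that have no source tag.
func countWithoutSource(in <-chan changeset) int {
	n := 0
	for cs := range in {
		if !cs.HasSource {
			n++
		}
	}
	return n
}

// run starts one goroutine per analyser, sends every changeset to all
// of them, and waits until all analysers have finished.
func run(data []changeset, analysers []analyser) []int {
	results := make([]int, len(analysers))
	channels := make([]chan changeset, len(analysers))
	var wg sync.WaitGroup
	for i, a := range analysers {
		ch := make(chan changeset, 64) // small buffer to decouple reader and analyser
		channels[i] = ch
		wg.Add(1)
		go func(i int, a analyser, in <-chan changeset) {
			defer wg.Done()
			results[i] = a(in) // each goroutine writes only its own slot
		}(i, a, ch)
	}
	for _, cs := range data {
		for _, ch := range channels {
			ch <- cs
		}
	}
	for _, ch := range channels {
		close(ch) // signals the analysers that no more data will arrive
	}
	wg.Wait()
	return results
}

func main() {
	data := []changeset{
		{Editor: "JOSM", HasSource: true},
		{Editor: "iD", HasSource: false},
		{Editor: "JOSM", HasSource: true},
	}
	results := run(data, []analyser{countAll, countWithoutSource})
	fmt.Println("total:", results[0], "without source:", results[1])
}
```

Giving every analyser its own channel lets a slow analyser lag behind by up to the buffer size without blocking the others immediately.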

About

Tool to analyse OpenStreetMap changesets, written in Go
