OSM Population Prediction Model

The associated blog post can be found here.

Introduction

This project provides a model that predicts the population of a given area based solely on features extracted from OpenStreetMap (OSM) data. Such a model could be useful in urban planning or traffic modelling, for example. Because OSM data is open and constantly updated, it is a free and accessible source that anyone can use to make estimates.

To this end, a dataset was created by taking a subset of the data from reference [1] and augmenting it with more detailed OSM features with the help of osm-feature-extractor.

The included data consists of ~30k equally sized hexagons covering Great Britain (England, Wales and Scotland). Each hexagon carries a population value, derived in turn from Facebook's High Resolution Settlement Layer, which estimates population from satellite imagery. In addition to the population, the data contains features taken from OSM extracts, such as the number and area of buildings, the length of each type of road, the number of shops of each kind (restaurants, groceries, etc.) and the number of public transportation features in the area. For more detail on which features were used, refer to this document.

After training the model, one can use osm-feature-extractor to generate user-defined areas for which to estimate the population. The referenced project has instructions on how to do that.

The main results of the model, using a Lasso regressor, are:

R² score | Mean absolute error (inhabitants / km²)
88.9%    | 98.8

The full results of the model are presented in the section Results below.
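For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how an R² score and a mean absolute error are conventionally computed. It is not taken from this repository; the function names and the sample densities are made up for illustration.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - residual variance / total variance."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the same unit as the inputs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical population densities (inhabitants / km2), not project data.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

r2 = r2_score(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"R2 = {r2:.3f}, MAE = {mae:.1f}")
```

An R² close to 100% means the predictions explain most of the variance in the actual values, while the mean absolute error expresses the typical miss in inhabitants / km².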

Quick Start

To run the model, follow these steps:

  1. Create a virtual environment using conda:
$ conda env create --file environment.yml
  2. Download the dataset files:
$ python download.py
  3. Run the main script, which pre-processes the data, trains the model and saves it:
$ python main.py

You can adjust the project config variables in proj.conf.

input_data_file: Name of the file containing the training data
out_file: Name of the file to save the model to
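As an illustration of how these two variables might be read, here is a minimal sketch using Python's standard configparser module. It assumes proj.conf is an INI-style file; the [project] section name and both file names are hypothetical and are not taken from the repository, so check the actual file before relying on this.

```python
import configparser

# Hypothetical file contents: the section name and values are assumptions.
SAMPLE_CONF = """
[project]
input_data_file: training_data.csv
out_file: model.pkl
"""


def load_project_config(text):
    """Parse INI-style text and return the two documented variables."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["project"]
    return {
        "input_data_file": section["input_data_file"],
        "out_file": section["out_file"],
    }


config = load_project_config(SAMPLE_CONF)
print(config)
```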

One can also adjust specific model parameters in settings.py.

Project Files

Libraries

The main libraries used in this application are:

Results

[Figure: Population estimates vs actual]
[Figure: Coefficients of the model]
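To give some intuition for what the coefficients of a Lasso regressor look like, the sketch below fits a tiny Lasso model from scratch with coordinate descent. It is a teaching example on made-up data, not the code used in this project (which may well rely on a library implementation); the function names, the alpha value and the data are all assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha=0.1, n_iter=200):
    """Minimise (1 / 2n) * ||y - Xw - b||^2 + alpha * ||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = b + sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
        # Re-fit the unpenalised intercept given the current weights.
        b = sum(y[i] - sum(w[k] * X[i][k] for k in range(p)) for i in range(n)) / n
    return w, b


# Made-up data: the target depends only on the first feature (y = 2 * x0 + 1).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [2.0 * row[0] + 1.0 for row in X]

weights, intercept = fit_lasso(X, y, alpha=0.1)
print("weights:", weights, "intercept:", intercept)
```

The L1 penalty shrinks the weight of the informative feature slightly and pushes the weight of the uninformative one towards zero, which is why a Lasso coefficient plot tends to highlight a subset of the features.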

References

[1] - Kontur Population: Global Population Density for 400m H3 Hexagons
[2] - Bast, Hannah, 2015. Fine-Grained Population Estimation