OSM Population Prediction Model

The associated blog post can be found here.

Introduction

This project provides a model that predicts the population of a given area based solely on features extracted from OpenStreetMap (OSM) data. Such a model could be useful in urban planning or traffic modelling, for example. Because OSM data is openly licensed and constantly updated, it is a free and accessible source that anyone can use to make estimates.

To this end, a dataset was created by taking a subset of the data gathered from reference [1] and augmenting it with more detailed OSM features with the help of osm-feature-extractor.

The included data consists of ~30k equally sized hexagons spanning Great Britain (England, Wales and Scotland). Each hexagon carries a population value, in turn derived from Facebook's High Resolution Settlement Layer, which estimates population from satellite imagery. In addition to the population, the data has features taken from OSM extracts, such as the number and area of buildings, the length of each type of road, the number of shops of each kind (restaurants, groceries, etc.) and the number of public transport features in the area. For more detail on which features were used, refer to this document.
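To make the layout of the data more concrete, the sketch below parses a small GeoJSON FeatureCollection and separates the population target from the OSM features. The property names (`population`, `building_count`, `building_area`, `residential_road_length`) and all values are hypothetical placeholders, not the dataset's real columns; the function `split_features_and_target` is likewise made up for illustration and is not part of this project.

```python
import json

# A tiny, made-up FeatureCollection standing in for the real dataset file.
# All property names and values here are hypothetical.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": null,
     "properties": {"population": 1200, "building_count": 310,
                    "building_area": 45000.0, "residential_road_length": 5200.0}},
    {"type": "Feature",
     "geometry": null,
     "properties": {"population": 15, "building_count": 4,
                    "building_area": 600.0, "residential_road_length": 0.0}}
  ]
}
"""

TARGET = "population"  # assumed name of the target property


def split_features_and_target(collection, target=TARGET):
    """Return (feature_names, X, y) from a GeoJSON FeatureCollection dict."""
    rows = [feature["properties"] for feature in collection["features"]]
    # Every property except the target is treated as a model feature.
    feature_names = sorted(name for name in rows[0] if name != target)
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return feature_names, X, y


feature_names, X, y = split_features_and_target(json.loads(SAMPLE))
print(feature_names)
print(X)
print(y)
```

In practice a library such as geopandas would likely be more convenient for reading the hexagon geometries; the standard-library version is used here only to keep the sketch dependency-free.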

After running the model, you can use osm-feature-extractor to generate user-defined areas for which to estimate the population. That project has instructions on how to do this.

The main results of the model, using a Lasso regressor, are:

R² score	Mean absolute error (inhabitants/km²)
88.9%	98.8
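For readers unfamiliar with the two metrics, here is a minimal sketch of how an R² score and a mean absolute error are computed from actual and predicted values. The numbers are made up for illustration and are unrelated to the results reported here; the project itself may well compute these with a library rather than by hand.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up population densities (inhabitants/km²), purely illustrative.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

r2 = r2_score(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"R² = {r2:.3f}, MAE = {mae:.1f} inhabitants/km²")
```

An R² close to 1 means the predictions explain most of the variance in the actual values, while the mean absolute error expresses the typical size of the miss in the same unit as the target.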

The full results of the model are presented in the section Results below.

Quick Start

To run the model, follow these steps:

Create a virtual environment using conda:

$ conda env create --file environment.yml

Download the dataset files:

$ python download.py

Run the main script, which pre-processes the data, trains the model and saves it:

$ python main.py

You can adjust the project config variables in proj.conf.

input_data_file: name of the file with the training data
out_file: name of the file to save the model to
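As a rough illustration of how these two variables could be read, the sketch below assumes that proj.conf uses INI syntax and keeps its keys in a `[DEFAULT]` section. Neither assumption is confirmed here, and the value `model.pkl` is a hypothetical output name; check the actual file before relying on this.

```python
import configparser

# Stand-in for the contents of proj.conf. The INI layout, the [DEFAULT]
# section and the out_file value are assumptions made for this example.
EXAMPLE_CONF = """
[DEFAULT]
input_data_file = hexagons_all_features_sample.geojson
out_file = model.pkl
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONF)  # use parser.read("proj.conf") for a real file

settings = parser["DEFAULT"]
input_data_file = settings["input_data_file"]
out_file = settings["out_file"]

print(f"Training data: {input_data_file}")
print(f"Model output:  {out_file}")
```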

You can also adjust specific model parameters in settings.py.

Project Files

settings.py - file with configuration parameters
basic_features.py - notebook with workflow with basic OSM features
all_features.py - notebook with workflow with extended OSM features
main.py - main python script that wraps all pipeline steps
process_data.py - processes the data before being fed to the model
train_model.py - contains the logic where the data is fitted into the model
model_evaluation.py - contains the logic for evaluating and showing the results of the model
pipeline_classes.py - contains classes that are used in the machine learning pipeline
helper_methods.py - contains helper methods used in the pipeline
hexagons_basic_features_sample.geojson - dataset of hexagons with basic features (sample data)
hexagons_all_features_sample.geojson - dataset of hexagons with extended features (sample data)

Libraries

The main libraries used in this application are:

Results

Figures: population estimates vs. actual values; coefficients of the model.

References

[1] - Kontur Population: Global Population Density for 400m H3 Hexagons
[2] - Bast, Hannah, 2015. Fine-Grained Population Estimation

diogomatoschaves/osm-population-predictor
