
Topology Learning

A machine learning project for geospatial vector geometries. It compares the accuracy of deep learning models against shallow learning algorithms on a set of tasks involving mostly polygonal geospatial vector shapes. The shallow models use engineered features: shape signatures drawn largely from signal processing. The deep models consume the (normalized) geometries directly through a float encoding. The library that performs the vector encoding for the deep learning also provides a method for decoding geometries, should you want to build sequence-to-sequence models.
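To give an idea of what the float encoding boils down to (this is an illustrative sketch, not the encoder's actual API or its 16-wide layout), a polygon can be turned into an array of coordinates plus one-hot action bits:

import numpy as np
from shapely import wkt

# Illustrative only: encode a polygon as a (num_points, 4) float array of
# [lon, lat, render, stop]. The repository's real encoder uses a richer layout.
def encode_polygon(wkt_string):
    coords = np.array(wkt.loads(wkt_string).exterior.coords)  # (n, 2) lon/lat
    actions = np.zeros((len(coords), 2))
    actions[:-1, 0] = 1   # "render" for every point except the last
    actions[-1, 1] = 1    # "stop" on the final point
    return np.hstack([coords, actions])

vec = encode_polygon('POLYGON ((4.89 52.37, 4.90 52.37, 4.90 52.38, 4.89 52.37))')
print(vec.shape)  # (4, 4)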

The repository contains the necessary code and data for three different experiments:

  • predict the number of inhabitants from a neighborhood geometry
  • predict the building type from a building geometry
  • predict the archaeological feature type from a feature geometry

Data

TL;DR: All data necessary to run the scripts on neighborhoods, building types, and archaeological features is included in the repository itself.

To run the training in the repository, install the dependencies using pip3 install -r requirements.txt from the repo root.

If you want to re-create the data, run:

docker-compose up

from the repository root. This requires Docker Engine and Docker Compose to be installed. It will prepare all the data you need to run the models.

Neighborhoods data set

The data for the 2017 neighborhoods comes from the Dutch Central Bureau for Statistics (CBS) and was downloaded from a Web Feature Service distributed by PDOK. Catalog info on the data set is here. The ETL script is at prep/get-neighborhoods.py. The data is all open. The script runs for a minute or so and produces both a training set and a test set, with input geometries (for the deep neural net configuration), the number of inhabitants per neighborhood as labels, and additional feature-engineered properties for comparison with non-deep learning algorithms.
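If you are curious what such a WFS download looks like, the sketch below shows a generic GetFeature request. The service URL and layer name are placeholders; the parameters actually used are in prep/get-neighborhoods.py.

import requests

# Illustrative WFS GetFeature call; URL and typeName below are placeholders.
params = {
    'service': 'WFS',
    'version': '2.0.0',
    'request': 'GetFeature',
    'typeName': 'cbs_buurten_2017',           # hypothetical layer name
    'outputFormat': 'application/json',
    'count': 100,
}
response = requests.get('https://example-pdok-wfs.nl/wfs', params=params)  # placeholder URL
features = response.json()['features']
print(len(features))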

Buildings data set

The data for the buildings experiment is provided by the Dutch national Cadastre and also distributed through a PDOK Web Feature Service. It uses a pre-configured spatial data set that already links functions to buildings; these functions are normally connected to functional sections within buildings, and a building can of course have multiple functions. The catalog info on the data set is here.

Archaeological features data set

The archaeological features data set is tailored to the task of predicting the feature type (classification) from geometries alone. The data was collected and is owned by ADC ArcheoProjecten and deposited at Data Archiving and Networked Services (DANS) in the EASY system:

Project ID   Data set (DANS EASY)                                           No. of features   Has definitions
ENKN_09      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:26721   11058             Yes
VENO13_08    https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:26804   5101              Yes (joined)
MONF_09      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:28372   5603              Yes (joined)
VEEE_07      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:29574   5243              Yes
GOUA_08      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:34403   5306              Yes
VENO_02      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:37522   5207              Yes
KATK_08      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:44368   3165              Yes (joined)
WIJD_07      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:49763   12131             Yes (joined)
OOST_10      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:51929   17251             Yes (joined)
VEGL_10      https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:53545   4271              Yes (joined)
TOTAL                                                                       74336

Base Registration for Topography (BRT) and OpenStreetMap (OSM) data

Note first that there are pre-built numpy archive files under files, so you don't need to rebuild the training and test data. If you want to prepare the data yourself, or want to derive a different pipeline from it, I suggest the dockerized version (if you value your time). The dockerized version uses a PostGIS database instance to implement an ETL process that does the heavy lifting. Afterwards it is mostly a matter of converting the results to normalized numpy vectors that machine learning frameworks can understand and saving them to numpy archives, as sketched below.
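As a rough sketch of that last step (illustrative only; the function name here is not the repository's), padding variable-length geometry vectors to one fixed sequence length and saving them could look like this:

import numpy as np

# Illustrative: pad variable-length geometry vectors to a fixed sequence length
# and store the resulting 3D tensor in a numpy archive.
def to_tensor(geom_vectors, max_points):
    width = geom_vectors[0].shape[1]
    tensor = np.zeros((len(geom_vectors), max_points, width))
    for i, vec in enumerate(geom_vectors):
        n = min(len(vec), max_points)
        tensor[i, :n, :] = vec[:n]
    return tensor

# input_geoms = to_tensor(encoded_geometries, max_points=64)
# np.savez('files/geodata-vectorized.npz', input_geoms=input_geoms)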

Numpy archive description

The numpy archive geodata-vectorized.npz under files contains vectors deserialized from well-known text geometries using the shapely library. They are re-serialized into a 3D tensor combining real-valued and one-hot components:

Property            Description
input_geoms         A nested array of shape (?, ?, 16), depending on the sequence length settings, of deserialized WKT polygons with the following encoding:
                    [:, :, 0:5]   the polygon point/node coordinates in lon/lat
                    [:, :, 5:13]  the geometry type, one-hot encoded over the set {GeometryCollection, Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, Geometry}
                    [:, :, 13:]   a one-hot encoded sequence of actions from the set {render, stop, full stop}
intersection        Geometries representing the intersection in WGS84 lon/lat, with the same geometry encoding as input_geoms
centroid_distance   Distance in meters between the centroids, expressed as a gaussian with mean and sigma parameters of shape (?, ?, 2)
geom_distance       Distance in meters between the geometries, expressed as a gaussian with mean and sigma parameters of shape (?, ?, 2); the distance is 0 if the geometries intersect
brt_centroid        Centroid point of the BRT geometry in WGS84 lon/lat
osm_centroid        Centroid point of the OSM geometry in WGS84 lon/lat
centroids           The two centroid points for BRT and OSM in WGS84 lon/lat
centroids_rd        The two centroid points for BRT and OSM in Netherlands RD meters
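Assuming the archive lives at files/geodata-vectorized.npz, loading it and slicing out the components described above looks roughly like this:

import numpy as np

# Inspect the pre-built archive; the key names follow the table above.
# Depending on your numpy version you may need allow_pickle=True for nested arrays.
archive = np.load('files/geodata-vectorized.npz')
geoms = archive['input_geoms']
print(geoms.shape)             # (num_samples, max_sequence_length, 16)
coords = geoms[:, :, 0:5]      # point/node coordinates
geom_type = geoms[:, :, 5:13]  # one-hot geometry type
actions = geoms[:, :, 13:]     # one-hot render/stop/full stop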

Docker

Due to cross-platform dependency issues when installing Python geospatial packages, the setup has been dockerized to ensure cross-platform interoperability. This is the easiest way to use the project.

  • See instructions on installing and running Docker on your platform.
  • Install docker compose
  • Run docker-compose build in the root folder of this repository
  • !! Make sure ./script/get-data.sh has LF line endings if you run this on Windows !! To be sure, you can install Git for Windows, open a Git Bash shell in the repo folder and run dos2unix ./script/*.sh. This converts the line endings to Unix line feeds.
  • Run docker-compose up. Once the container has exited, you should see a files directory in your repo folder with a file topology-training.csv. This contains your extracted geometries, distances and intersections.
  • You can re-create topology-training.csv using docker-compose run data-prep export-data.sh

Using self-installed libraries

If you need to run or debug locally, you need the following dependencies:

Per package and platform:

  • Shapely: on Linux, apt-get install -y libgeos-dev && pip install shapely; on Windows, download and pip install one of these wheels.
  • GDAL: on Linux, apt-get install -y libproj-dev && git clone https://github.com/OSGeo/gdal.git && cd gdal/gdal && ./configure --with-python && make >/dev/null 2>&1 && make install && ldconfig; on Windows, download and pip install one of these wheels.
  • PostGIS: on Linux, apt-get install postgresql postgis; on Mac, brew install postgres postgis; on Windows, see here.
  • slackclient: pip3 install slackclient on all platforms.

Once you have everything installed, cd to the prep directory and execute get-data.sh and export-data.sh.

Conversion to numpy archive

After you have created your topology-training.csv, convert it to a numpy archive using cd model && python3 vectorize.py. Note that there is a pre-built geodata-vectorized.npz under files.

Running the models

Model scripts are run from the model directory. Each of them is a more or less elaborate strategy for working with real-valued data, often combined with categorical data, using LSTMs.
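As a minimal sketch of that kind of setup (not one of the repository's actual models), an LSTM that reads the 16-wide geometry vectors and predicts a class could look like this, assuming TensorFlow/Keras:

from tensorflow.keras import layers, models

# Minimal sketch: an LSTM consumes the padded geometry tensor and a softmax
# head predicts a class label (e.g. a building type).
def build_model(max_points, num_classes):
    inputs = layers.Input(shape=(max_points, 16))
    hidden = layers.LSTM(64)(inputs)
    outputs = layers.Dense(num_classes, activation='softmax')(hidden)
    model = models.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model = build_model(max_points=64, num_classes=9)
model.summary()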

You can monitor progress using TensorBoard: tensorboard --logdir ./model/tensorboard or python3 -m tensorflow.tensorboard --logdir ./model/tensorboard

Gaussian verification experiments

Many models include gaussian parameter outputs as part of the task. The job of these models is to verify the accuracy and stability of univariate and bivariate gaussian loss functions.
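The gaussian loss functions referred to here are, in essence, negative log-likelihoods over the predicted mu and sigma. A minimal numpy version of the univariate case (simplified; the actual implementations live in the model scripts) would be:

import numpy as np

# Univariate gaussian negative log-likelihood for a prediction (mu, sigma)
# against a target value y_true; eps keeps sigma strictly positive.
def gaussian_nll(y_true, mu, sigma, eps=1e-6):
    sigma = np.maximum(sigma, eps)
    return 0.5 * np.log(2 * np.pi * sigma ** 2) + ((y_true - mu) ** 2) / (2 * sigma ** 2)

print(gaussian_nll(y_true=52.0, mu=52.0, sigma=0.1))  # low loss for a tight, correct fit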

fixed_univariate_gaussian.py

The simplest of the gaussian approximations. It attempts to approximate a mu of 52. as closely as possible, with a sigma of 0. Converges in about 10 epochs to exactly [52. 0.].

fixed_bivariate_gaussian.py

Attempts to approximate a bivariate gaussian with mus of a realistic coordinate set of [5, 52], sigmas of 0 and a rho of 0. Converges in about 30 epochs but, oddly, pushes rho towards +/- 9e0 instead of 0.

random_univariate_gaussian.py

Trains to approximate a randomly instantiated set of integers between 1 and 20.

Distance models

centroid_distance.py

Calculates the distance in meters between the centroids of two polygons, expressed as a single gaussian. Converges to centimeter-level deviations in about 50 epochs. This is a useful demo model, since centroid distances can in general be computed between any two (multi)geometries, whether these are (multi)points, (multi)polygons or otherwise.

geom_distance.py

Approximates the distance between two polygons in meters, expressed as a single gaussian. This approximates a vanilla distance query such as PostGIS's ST_Distance. If the polygons intersect, the distance is 0. Converges in about 25 epochs to centimeter-level precision. Intersecting geometries do not converge to exactly 0, but to within a few centimeters of it.
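For reference, the ground-truth values these two models approximate can be computed directly with shapely (planar units in this toy example; the training targets are expressed in meters via the RD projection):

from shapely.geometry import Polygon

# Ground-truth targets of the kind centroid_distance.py and geom_distance.py learn.
a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
b = Polygon([(5, 0), (7, 0), (7, 2), (5, 2)])

print(a.centroid.distance(b.centroid))  # centroid distance: 5.0
print(a.distance(b))                    # geometry distance: 3.0, and 0.0 if they intersect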

Intersection sequence-to-sequence auto-encoder

intersection.py

Approximates the intersection of two geometries into a new geometry. Work in progress.

intersection-concat.py

Approximates the intersection of two concatenated geometries into a new geometry, a variation of the previous setup. Work in progress.
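The target geometries for these models are plain geometric intersections, which shapely computes exactly:

from shapely.geometry import Polygon

# The target geometry for the intersection models: shapely's exact intersection.
a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])
print(a.intersection(b).wkt)  # a polygon covering the 2 x 2 overlap square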

Character-level model

There is one model that uses a character-level sequence-to-sequence strategy to approximate intersections. Work in progress.