A post and 2 tutorials on single-file open geospatial standards #150

Merged
merged 31 commits into from
Feb 5, 2020
Changes from 10 commits
9c850a9
Add RStudio proj to gitignore (was untracked)
florisvdh Nov 20, 2019
1094b61
Post + 2 tutorials on open geospatial standards
florisvdh Nov 20, 2019
40f50ae
Open geospatial standards: minor updates & additions
florisvdh Nov 21, 2019
d8d20f6
Open geospatial standards: add small note
florisvdh Nov 21, 2019
21077d1
Open geospatial standards: typos / language
florisvdh Nov 22, 2019
a88f088
Open geospatial standards: add DOI note
florisvdh Nov 22, 2019
d5aa961
Open geosp strds: demo of multiple vec layers in GPKG
florisvdh Nov 22, 2019
36dcb8b
Open geosp strds: update md files
florisvdh Nov 22, 2019
027fd6c
Open geosp strds: omit faulty footnote on rasters in gpkg
florisvdh Nov 22, 2019
ea4205e
Open geosp strds: minor language fixes
florisvdh Nov 22, 2019
0aaeb1e
Open geosp strds: small clarifications
florisvdh Nov 28, 2019
6611fdf
Open geosp strds: code improvement
florisvdh Nov 28, 2019
8807d9d
Merge branch 'master' into geoformats
florisvdh Jan 31, 2020
de81b86
Open geosp strds: rephrase longevity
florisvdh Jan 31, 2020
19bb8e7
Open geosp strds: note on multiple geometry types in GeoJSON
florisvdh Jan 31, 2020
69e9cda
Open geosp strds: be more specific about use of raster::calc()
florisvdh Jan 31, 2020
a0fd2e8
Open geosp strds: note on GeoJSON versioning
florisvdh Jan 31, 2020
6441c80
Open geosp strds: demonstrate use of GPKG for rasters *
florisvdh Feb 3, 2020
ffc4b72
Open geosp strds: introduce stars for single-raster GPKG
florisvdh Feb 3, 2020
aa575f0
Open geosp strds: note on ESRI personal geodatabase
florisvdh Feb 3, 2020
e802ea5
Open geosp strds: fix title raster tutorial
florisvdh Feb 3, 2020
890e8a8
Open geosp strds: add vector/raster formats tables
florisvdh Feb 3, 2020
ca09d2a
Open geosp strds: minor enhancements/fixes
florisvdh Feb 3, 2020
dd9d140
Open geosp strds: YAML updates
florisvdh Feb 3, 2020
3bda2f9
Open geosp strds: update index.md files
florisvdh Feb 3, 2020
e1936d4
Extend token pattern in .gitignore
florisvdh Feb 3, 2020
72652ab
Open geosp strds: address comment of @carinewils
florisvdh Feb 4, 2020
4a3d6bc
Open geosp strds: update reference Lovelace et al.
florisvdh Feb 5, 2020
bd117cf
Open geosp strds: various minor fixes/improvements
florisvdh Feb 5, 2020
46b2a2d
Open geosp strds: updates comparison tables
florisvdh Feb 5, 2020
a467226
Open geosp strds: update md files
florisvdh Feb 5, 2020
1 change: 1 addition & 0 deletions .gitignore
@@ -17,6 +17,7 @@
# RStudio files
.Rproj.user/
.Rproj.user
*.Rproj

# produced vignettes
vignettes/*.html
90 changes: 90 additions & 0 deletions content/articles/geospatial_standards/index.Rmd
@@ -0,0 +1,90 @@
---
title: "Meet some popular open geospatial standards!"
description: "A short introduction to the powerful GeoPackage, GeoJSON and GeoTIFF standards"
author: "Floris Vanderhaeghe"
date: 2019-11-19
csl: ../inbo.csl
bibliography: ../reproducible_research.bib
categories: ["r", "gis"]
tags: ["gis", "r", "open science"]
output:
  md_document:
    preserve_yaml: true
    variant: gfm
---

Some inspiration for this post came from the beautiful books of @lovelace_geocomputation_2019, @pebesma_edzer_spatial_2019 and @heijmans_spatial_2019, and from various websites.

## Why use open standards?

- Open file standards ease collaboration, portability and compatibility between users, machines and applications.
- Their (file) structure is fully documented.
- Consequently, scientists and programmers can build new software / packages and make innovations that use these standards, while maintaining interoperability with existing applications.
- Moreover, an open standard makes it much more likely that your data will still be readable a hundred years from now, independently of which IT corporation or software is dominant at that time...

Luckily, quite a list of open standards is available!
Below, some powerful and widely-used single-file formats are introduced.
Single-file data sources are readily amenable to exchange and publication.

I see you can't wait to start practicing, so you can also head straight over to the [tutorial on vector formats](../../tutorials/spatial_standards_vector/) and the [tutorial on the GeoTIFF raster format](../../tutorials/spatial_standards_raster/)!

## A few words on the GDAL library

**[GDAL](https://gdal.org)** (Geospatial Data Abstraction Library) is by far the most used collection of open-source drivers for:

- [a lot](https://gdal.org/drivers/raster/index.html) of raster formats;
- [a lot](https://gdal.org/drivers/vector/index.html) of vector formats.

In other words, it is the preferred workhorse for reading and writing many geospatial file formats, used in the background by [a lot](https://gdal.org/software_using_gdal.html#software-using-gdal) of geospatial applications.
Using GDAL is the easiest way to conform to open standards.

So, in R we use packages that use GDAL in the background, such as `rgdal`, `sp`, `sf`, `raster` and `stars`.
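
As a quick check of what your own installation offers, the sketch below queries GDAL through the `sf` package. It assumes that `sf` is installed; the filter on the `GPKG` and `GeoJSON` drivers is only an illustration, and the output depends on your local GDAL build.

```r
library(sf)

# Versions of the external libraries (GEOS, GDAL, PROJ) that sf is linked to
sf_extSoftVersion()

# All vector drivers known to the installed GDAL, as a data frame
drivers <- st_drivers(what = "vector")

# Availability and write support of the two vector formats discussed below
drivers[drivers$name %in% c("GPKG", "GeoJSON"), c("name", "long_name", "write")]
```

The raster drivers can be listed in the same way with `st_drivers(what = "raster")`.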

## The GeoPackage file format

- Its website is <https://www.geopackage.org>.
- It is a standardized implementation of an SQLite database for geospatial data.
Hence, a GeoPackage is a **binary** file (`filename.gpkg`).
It shares this property with shapefiles, which, however, come with several limitations ^[
Some problems with shapefiles are: they're not an open format, they consist of multiple files and they have restrictions regarding file size, column name length, number of columns and the feature types that can be accommodated.
],
so the GeoPackage is a more than suitable replacement.
- The GeoPackage can store one or _multiple_ **vector** layers (points, lines, polygons and related feature types).
Besides vector data, it can also store **raster** data
or extra standalone **tables**.
- The GeoPackage standard is [maintained](https://www.opengeospatial.org/standards/geopackage) by the [Open Geospatial Consortium](https://www.opengeospatial.org/) (OGC), which stands out as a reference when it comes to open geospatial standards.
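
A minimal sketch of the multi-layer capability, assuming the `sf` package is installed. The two tiny layers, their coordinates, the layer names and the file name `example.gpkg` are all made up for illustration; the vector tutorial works with real data.

```r
library(sf)

# Two small illustrative layers in WGS84 (EPSG:4326); coordinates are made up
pts <- st_sf(
  id = 1:2,
  geometry = st_sfc(st_point(c(4.35, 50.85)), st_point(c(3.72, 51.05)), crs = 4326)
)
routes <- st_sf(
  id = 1L,
  geometry = st_sfc(st_linestring(rbind(c(4.35, 50.85), c(3.72, 51.05))), crs = 4326)
)

# Write to a temporary location; remove a leftover file from an earlier run
gpkg_path <- file.path(tempdir(), "example.gpkg")
unlink(gpkg_path)

# The first call creates the file, the second adds a layer to the same file
st_write(pts, gpkg_path, layer = "points", quiet = TRUE)
st_write(routes, gpkg_path, layer = "lines", quiet = TRUE)

# List the layers in the GeoPackage and read one of them back
st_layers(gpkg_path)
pts_read <- st_read(gpkg_path, layer = "points", quiet = TRUE)
```

Because each layer is addressed by name, layers with different geometry types can live side by side in the same file.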

## The GeoJSON file format

- One [GeoJSON](https://tools.ietf.org/html/rfc7946) file (`filename.geojson`) contains _one_ **vector** layer.
[JSON](https://en.wikipedia.org/wiki/JSON) itself is a common and straightforward open data format.
It is a **text** file readable both by humans and machines (see the [tutorial](../../tutorials/spatial_standards_vector/) for an example).
GeoJSON adds the necessary specification to JSON for standardized storage of geographic feature data, but it is still a plain JSON text file.
- The GeoJSON standard is maintained by the Internet Engineering Task Force ([IETF](https://www.ietf.org/)), a large open standards organization that develops Internet standards under the auspices of the Internet Society.
- Although the previous version of the GeoJSON standard -- GeoJSON 2008 -- is still widely used, it has been [obsoleted](http://geojson.org/geojson-spec.html) by a new version, **[RFC7946](https://tools.ietf.org/html/rfc7946)**, which is gradually becoming established.
    - This version is strict about the coordinate reference system (CRS) -- it is always [WGS84](https://epsg.io/4326) -- and it also differs in a few other respects (such as the recommendation for applications [not to inflate](https://tools.ietf.org/html/rfc7946#section-11.2) decimal coordinate precision).
    - RFC7946 solves the problem that quite a few libraries -- including GDAL -- simply assumed WGS84 in GeoJSON 2008 (without checking or transforming), even though WGS84 was not a requirement of GeoJSON 2008 (it did support an explicit _crs_ declaration).
      This resulted in inconveniences (cf. [this issue](https://github.com/r-spatial/sf/issues/344#issue-229118527) of the `sf` package).
    - A [specific section](https://gdal.org/drivers/vector/geojson.html#rfc-7946-write-support) in the documentation of GDAL's GeoJSON driver summarizes the differences between both GeoJSON versions.
- While GDAL by default still follows the GeoJSON 2008 format ^[
Though GeoJSON 2008 is obsoleted, the now recommended RFC7946 standard officially still has the status of a _proposed_ standard.
That is probably the reason why GDAL does not yet default to RFC7946.
A somewhat confusing situation, it seems.
],
RFC7946 is supported through the layer creation option `RFC7946=YES`.
With this option, coordinates are automatically reprojected to WGS84 on writing.
They are stored with 7 decimal places, i.e. a precision of approximately 1 cm.
Given the advantages, _**we advise using RFC7946 explicitly**_.
Several functions in R allow the user to provide options that are passed to GDAL, so we can ask to deliver RFC7946 (see the [tutorial](../../tutorials/spatial_standards_vector/)).
- To keep things manageable (text file size, usage in version control systems), it can be wise to use GeoJSON for simpler cases (points and rather simple lines and polygons), and to use the binary GeoPackage format for larger or more complex cases.
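
The advice on RFC7946 can be sketched as follows, assuming the `sf` package is installed. The point, its coordinates in Belgian Lambert 72 (EPSG:31370) and the file name `example.geojson` are hypothetical; `layer_options` is the `st_write()` argument that passes layer creation options on to GDAL.

```r
library(sf)

# An illustrative point in Belgian Lambert 72 (EPSG:31370); made-up coordinates
pt <- st_sf(
  name = "example",
  geometry = st_sfc(st_point(c(150000, 170000)), crs = 31370)
)

# Write to a temporary location; remove a leftover file from an earlier run
geojson_path <- file.path(tempdir(), "example.geojson")
unlink(geojson_path)

# Ask GDAL's GeoJSON driver to follow RFC7946 instead of GeoJSON 2008
st_write(pt, geojson_path, layer_options = "RFC7946=YES", quiet = TRUE)

# The file is plain text: print it to inspect the stored coordinates
cat(readLines(geojson_path), sep = "\n")

# Read it back and check that the coordinates are now geographic
pt_read <- st_read(geojson_path, quiet = TRUE)
st_is_longlat(pt_read)
```

Note that the projected input was not transformed by hand: with `RFC7946=YES`, the reprojection to WGS84 is left to GDAL at write time.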

## The GeoTIFF file format

- [GeoTIFF](https://en.wikipedia.org/wiki/GeoTIFF) is the preferred single-file open standard for **raster** data.
It adheres to the open [TIFF](https://en.wikipedia.org/wiki/TIFF) specification; hence it is a TIFF image file (`filename.tif`).
It [uses](http://docs.opengeospatial.org/is/19-008r4/19-008r4.html#_geotiff_file_structure_and_geotiff_crs_and_models_principles_informative) a small set of reserved TIFF tags to store information about CRS, extent and resolution of the raster.
- A GeoTIFF file can contain _one_ or _multiple_ rasters with the same CRS, extent and resolution.
- The GeoTIFF standard is [maintained](https://www.opengeospatial.org/standards/geotiff) by the [Open Geospatial Consortium](https://www.opengeospatial.org/) (OGC), which stands out as a reference when it comes to open geospatial standards.
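
To illustrate the point about multiple rasters in one file, here is a minimal sketch that assumes the `raster` package is installed. The two layers, their values and the file name `two_layers.tif` are made up; the raster tutorial covers a realistic case.

```r
library(raster)

# A small illustrative raster: 10 x 10 cells in WGS84, with made-up values
r1 <- raster(
  nrows = 10, ncols = 10,
  xmn = 0, xmx = 10, ymn = 0, ymx = 10,
  crs = "+proj=longlat +datum=WGS84"
)
values(r1) <- 1:100

# A second layer with the same CRS, extent and resolution
r2 <- r1 * 2

# Combine both layers and write them to one GeoTIFF file
# (the format is derived from the .tif extension)
tif_path <- file.path(tempdir(), "two_layers.tif")
writeRaster(stack(r1, r2), tif_path, overwrite = TRUE)

# Read the multi-layer file back and count its layers
b_read <- brick(tif_path)
nlayers(b_read)
```

The layers can share a file precisely because they have the same CRS, extent and resolution; rasters that differ in any of these need separate files.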


## Literature
179 changes: 179 additions & 0 deletions content/articles/geospatial_standards/index.md
@@ -0,0 +1,179 @@
---
title: "Meet some popular open geospatial standards!"
description: "A short introduction to the powerful GeoPackage, GeoJSON and GeoTIFF standards"
author: "Floris Vanderhaeghe"
date: 2019-11-19
csl: ../inbo.csl
bibliography: ../reproducible_research.bib
categories: ["r", "gis"]
tags: ["gis", "r", "open science"]
output:
  md_document:
    preserve_yaml: true
    variant: gfm
---

Some inspiration for this post came from the beautiful books of Lovelace
*et al.* (2019), Pebesma & Bivand (2019) and Hijmans (2019), and from
various websites.

## Why use open standards?

- Open file standards ease collaboration, portability and
compatibility between users, machines and applications.
- Their (file) structure is fully documented.
- Consequently, scientists and programmers can build new software
/ packages and make innovations that use these standards, while
maintaining interoperability with existing applications.
- Moreover, an open standard makes it much more likely that your data
will still be readable a hundred years from now, independently of
which IT corporation or software is dominant at that time…

Luckily, quite a list of open standards is available\! Below, some
powerful and widely-used single-file formats are introduced. Single-file
data sources are readily amenable to exchange and publication.

I see you can’t wait to start practicing, so you can also head straight
over to the [tutorial on vector
formats](../../tutorials/spatial_standards_vector/) and the [tutorial on
the GeoTIFF raster format](../../tutorials/spatial_standards_raster/)\!

## A few words on the GDAL library

**[GDAL](https://gdal.org)** (Geospatial Data Abstraction Library) is by
far the most used collection of open-source drivers for:

- [a lot](https://gdal.org/drivers/raster/index.html) of raster
formats;
- [a lot](https://gdal.org/drivers/vector/index.html) of vector
formats.

In other words, it is the preferred workhorse for reading and writing
many geospatial file formats, used in the background by [a
lot](https://gdal.org/software_using_gdal.html#software-using-gdal) of
geospatial applications. Using GDAL is the easiest way to conform to
open standards.

So, in R we use packages that use GDAL in the background, such as
`rgdal`, `sp`, `sf`, `raster` and `stars`.

## The GeoPackage file format

- Its website is <https://www.geopackage.org>.
- It is a standardized implementation of an SQLite database for
geospatial data. Hence, a GeoPackage is a **binary** file
(`filename.gpkg`). It shares this property with shapefiles, which,
however, come with several limitations,\[1\] so the GeoPackage is a
more than suitable replacement.
- The GeoPackage can store one or *multiple* **vector** layers
(points, lines, polygons and related feature types). Besides vector
data, it can also store **raster** data or extra standalone
**tables**.
- The GeoPackage standard is
[maintained](https://www.opengeospatial.org/standards/geopackage) by
the [Open Geospatial Consortium](https://www.opengeospatial.org/)
(OGC), which stands out as a reference when it comes to open
geospatial standards.

## The GeoJSON file format

- One [GeoJSON](https://tools.ietf.org/html/rfc7946) file
(`filename.geojson`) contains *one* **vector** layer.
[JSON](https://en.wikipedia.org/wiki/JSON) itself is a common and
straightforward open data format. It is a **text** file readable
both by humans and machines (see the
[tutorial](../../tutorials/spatial_standards_vector/) for an
example). GeoJSON adds the necessary specification to JSON for
standardized storage of geographic feature data, but it is still a
plain JSON text file.
- The GeoJSON standard is maintained by the Internet Engineering Task
Force ([IETF](https://www.ietf.org/)), a large open standards
organization that develops Internet standards under the auspices of
the Internet Society.
- Although the previous version of the GeoJSON standard – GeoJSON 2008
– is still widely used, it has been
[obsoleted](http://geojson.org/geojson-spec.html) by a new version,
**[RFC7946](https://tools.ietf.org/html/rfc7946)**, which is
gradually becoming established.
    - This version is strict about the coordinate reference system
      (CRS) – it is always [WGS84](https://epsg.io/4326) – and it also
      differs in a few other respects (such as the recommendation for
      applications [not to
      inflate](https://tools.ietf.org/html/rfc7946#section-11.2)
      decimal coordinate precision).
    - RFC7946 solves the problem that quite a few libraries –
      including GDAL – simply assumed WGS84 in GeoJSON 2008 (without
      checking or transforming), even though WGS84 was not a
      requirement of GeoJSON 2008 (it did support an explicit *crs*
      declaration). This resulted in inconveniences (cf. [this
      issue](https://github.com/r-spatial/sf/issues/344#issue-229118527)
      of the `sf` package).
    - A [specific
      section](https://gdal.org/drivers/vector/geojson.html#rfc-7946-write-support)
      in the documentation of GDAL’s GeoJSON driver summarizes the
      differences between both GeoJSON versions.
- While GDAL by default still follows the GeoJSON 2008 format,\[2\]
RFC7946 is supported through the layer creation option
`RFC7946=YES`. With this option, coordinates are automatically
reprojected to WGS84 on writing and are stored with 7 decimal
places, i.e. a precision of approximately 1 cm. Given the
advantages, ***we advise using RFC7946 explicitly***. Several
functions in R allow the user to provide options that are passed to
GDAL, so we can ask to deliver RFC7946 (see the
[tutorial](../../tutorials/spatial_standards_vector/)).
- To keep things manageable (text file size, usage in version control
systems), it can be wise to use GeoJSON for simpler cases (points
and rather simple lines and polygons), and to use the binary
GeoPackage format for larger or more complex cases.

## The GeoTIFF file format

- [GeoTIFF](https://en.wikipedia.org/wiki/GeoTIFF) is the preferred
single-file open standard for **raster** data. It adheres to the
open [TIFF](https://en.wikipedia.org/wiki/TIFF) specification; hence
it is a TIFF image file (`filename.tif`). It
[uses](http://docs.opengeospatial.org/is/19-008r4/19-008r4.html#_geotiff_file_structure_and_geotiff_crs_and_models_principles_informative)
a small set of reserved TIFF tags to store information about CRS,
extent and resolution of the raster.
- A GeoTIFF file can contain *one* or *multiple* rasters with the same
CRS, extent and resolution.
- The GeoTIFF standard is
[maintained](https://www.opengeospatial.org/standards/geotiff) by
the [Open Geospatial Consortium](https://www.opengeospatial.org/)
(OGC), which stands out as a reference when it comes to open
geospatial standards.

## Literature

<div id="refs" class="references">

<div id="ref-heijmans_spatial_2019">

Hijmans R. (2019). Spatial Data Science with R. URL:
<https://rspatial.org/>.

</div>

<div id="ref-lovelace_geocomputation_2019">

Lovelace R., Nowosad J. & Muenchow J. (2019). Geocomputation with R.
URL: <https://geocompr.robinlovelace.net>.

</div>

<div id="ref-pebesma_edzer_spatial_2019">

Pebesma E. & Bivand R. (2019). Spatial Data Science. URL:
<https://www.r-spatial.org/book>.

</div>

</div>

1. Some problems with shapefiles are: they’re not an open format, they
consist of multiple files and they have restrictions regarding file
size, column name length, number of columns and the feature types
that can be accommodated.

2. Though GeoJSON 2008 is obsoleted, the now recommended RFC7946
standard officially still has the status of a *proposed* standard.
That is probably the reason why GDAL does not yet default to
RFC7946. A somewhat confusing situation, it seems.
24 changes: 24 additions & 0 deletions content/articles/reproducible_research.bib
@@ -772,3 +772,27 @@ @book{bryan_happy_2019
annote = {Getting started with git and github workflows in RStudio}
}


@book{lovelace_geocomputation_2019,
title = {Geocomputation with {{R}}},
url = {https://geocompr.robinlovelace.net},
author = {Lovelace, Robin and Nowosad, Jakub and Muenchow, Jannes},
year = {2019}
}


@book{pebesma_edzer_spatial_2019,
title = {Spatial {{Data Science}}},
url = {https://www.r-spatial.org/book},
author = {Pebesma, Edzer and Bivand, Roger},
year = {2019}
}


@book{heijmans_spatial_2019,
title = {Spatial {Data} {Science} with {R}},
url = {https://rspatial.org/},
urldate = {2019-11-20},
author = {Hijmans, Robert},
year = {2019}
}