Correcting inferences in volunteer data using geospatial covariates

sodascience/night_globe

Correcting inferences for volunteer-collected data with geospatial sampling bias

This repository contains reproducible code for the manuscript "Correcting inferences for volunteer-collected data with geospatial sampling bias" by Peter Lugtig, Annemarie Timmers, and Erik-Jan van Kesteren.

Reproducing the analysis

This is an R project (R version 4.1.2) whose package dependencies are managed with renv. Clone or download this repository, enter the folder, and open the project file (night_globe.Rproj) in RStudio. Then, with the renv package installed, run renv::restore() to install the exact package versions used.

The analysis starts with the script 01_data_loading.R.

Extended description

The Globe at Night project contains volunteer-collected data about the brightness of the sky. For example, in Pennsylvania in 2020, the following observations were made:

If researchers want to make inferences about the brightness of the night sky in Pennsylvania, they would ideally observe it at randomly selected locations across the state. The volunteer observations are not randomly distributed, however, so the data suffer from sampling bias. In this repository, we correct for this bias by using geospatial covariates and geospatial models to predict sky brightness throughout Pennsylvania.
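The idea behind the correction can be illustrated with a small simulation. The sketch below is not code from this repository: it assumes a single hypothetical covariate (`urban`, a stand-in for something like land use), a linear relation between that covariate and a generic brightness score, and volunteers who are more likely to observe in more urban cells. Averaging the biased sample overestimates the area-wide mean, whereas fitting a model on the observations and averaging its predictions over every cell recovers it, provided that, given the covariates, where volunteers observe is unrelated to brightness.

```r
# Simulated illustration of model-based correction for sampling bias.
# Hypothetical data: 'urban' stands in for a covariate such as land use, and
# 'brightness' is a generic sky-brightness score (higher = brighter).
set.seed(2020)

# Every cell of a hypothetical study-area grid, with its true brightness
n_cells <- 10000
grid <- data.frame(urban = runif(n_cells))
grid$brightness <- 18 + 4 * grid$urban + rnorm(n_cells, sd = 0.5)
true_mean <- mean(grid$brightness)

# Volunteers are assumed to observe more often in more urban cells
obs_idx <- sample(n_cells, size = 300, prob = exp(3 * grid$urban))
obs <- grid[obs_idx, ]

# Naive estimate: the plain average of the biased sample
naive_mean <- mean(obs$brightness)

# Model-based estimate: fit on the observations, predict for every cell, average
fit <- lm(brightness ~ urban, data = obs)
model_mean <- mean(predict(fit, newdata = grid))

print(round(c(true = true_mean, naive = naive_mean, model = model_mean), 2))
```

The same logic carries over to richer models: the better the covariates explain why volunteers observe where they do, the more of the bias a model-based estimate can remove.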

As covariates, we use moon illumination, cloud cover, and land-use data from https://www.mrlc.gov/.

In addition, we have information about where the main roads lie in Pennsylvania. These data come from OpenStreetMap:

Using models of increasing complexity, we obtain the following predicted sky brightness values:
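A sequence of models of increasing complexity can be sketched as nested regression formulas. The example below uses simulated data with hypothetical columns `moon`, `cloud`, and `landuse` (mirroring the covariates named above) and arbitrary coefficients. It fits three nested linear models and predicts on a hypothetical set of grid cells under assumed reference conditions (no moon, no clouds). It does not reproduce the repository's eight models and leaves out kriging.

```r
# Nested models of increasing complexity on simulated data.
# Column names and coefficients are hypothetical, not taken from the repository.
set.seed(1)
landuse_levels <- c("forest", "agriculture", "developed")
landuse_effect <- c(forest = 0, agriculture = 0.5, developed = 2)

# Helper that simulates covariates for n locations
simulate_covariates <- function(n) {
  data.frame(
    moon    = runif(n),  # moon illumination fraction (0-1)
    cloud   = runif(n),  # cloud cover fraction (0-1)
    landuse = factor(sample(landuse_levels, n, replace = TRUE),
                     levels = landuse_levels)
  )
}

# Volunteer observations with a simulated brightness response
obs <- simulate_covariates(400)
obs$brightness <- 1 + 1.5 * obs$moon + 0.8 * obs$cloud +
  unname(landuse_effect[as.character(obs$landuse)]) + rnorm(400, sd = 0.5)

# Hypothetical prediction locations; observing conditions fixed at an
# assumed reference (no moon, no clouds) so the map reflects location only
grid <- simulate_covariates(1000)
grid$moon <- 0
grid$cloud <- 0

formulas <- list(
  m1 = brightness ~ 1,                      # intercept only: the sample mean
  m2 = brightness ~ moon + cloud,           # observing conditions
  m3 = brightness ~ moon + cloud + landuse  # plus land use
)
fits  <- lapply(formulas, function(f) lm(f, data = obs))
preds <- sapply(fits, predict, newdata = grid)  # one column per model

print(sapply(fits, AIC))
print(colMeans(preds))
```

Keeping the formulas in a named list makes it straightforward to add further terms, or to swap `lm()` for a spatial model, while reusing the same fitting and prediction code.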

We validate these models internally using leave-one-out cross-validation. Based on this, we conclude that the most complex model (model 8, with all covariates and kriging) yields the best out-of-sample prediction performance.
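Leave-one-out cross-validation refits a model n times, each time holding out one observation, and scores the prediction for the held-out point. A minimal base-R sketch for linear models, on simulated data with hypothetical column names, might look like this. Models with a kriging component would need the spatial model refitted in every fold, which this sketch does not cover.

```r
# Leave-one-out cross-validation (LOOCV) for linear models, base R only.
# Data and column names are simulated and hypothetical.
set.seed(2)
n <- 200
landuse_levels <- c("forest", "agriculture", "developed")
dat <- data.frame(
  moon    = runif(n),
  cloud   = runif(n),
  landuse = factor(sample(landuse_levels, n, replace = TRUE),
                   levels = landuse_levels)
)
landuse_effect <- c(forest = 0, agriculture = 0.5, developed = 2)
dat$brightness <- 1 + 1.5 * dat$moon + 0.8 * dat$cloud +
  unname(landuse_effect[as.character(dat$landuse)]) + rnorm(n, sd = 0.5)

# Root mean squared error of leave-one-out predictions
loocv_rmse <- function(formula, data) {
  response <- all.vars(formula)[1]  # assumes a plain response variable
  errors <- numeric(nrow(data))
  for (i in seq_len(nrow(data))) {
    fit <- lm(formula, data = data[-i, , drop = FALSE])      # fit without row i
    pred <- predict(fit, newdata = data[i, , drop = FALSE])  # predict row i
    errors[i] <- data[[response]][i] - pred
  }
  sqrt(mean(errors^2))
}

formulas <- list(
  m2 = brightness ~ moon + cloud,
  m3 = brightness ~ moon + cloud + landuse
)
cv <- sapply(formulas, loocv_rmse, data = dat)
print(cv)  # lower is better
```

For ordinary least squares, the same leave-one-out residuals are available without refitting, as e_i / (1 - h_ii), where h_ii is the leverage of observation i. The explicit loop is shown because it carries over to models without such a shortcut.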

We additionally validate the models externally by comparing their predictions against (log-)skyglow measurements derived from satellite imagery:
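The README does not specify which agreement measure the external validation uses; one plausible choice, assumed here, is the correlation between predicted brightness and log-transformed skyglow. The sketch below defines a hypothetical helper `external_validation()` and applies it to simulated values in which one set of predictions is noisier than the other.

```r
# Hypothetical external-validation helper: agreement between model
# predictions and log-transformed satellite skyglow (simulated data).
external_validation <- function(pred, skyglow) {
  stopifnot(length(pred) == length(skyglow), all(skyglow > 0))
  log_sky <- log(skyglow)
  c(pearson  = cor(pred, log_sky),
    spearman = cor(pred, log_sky, method = "spearman"))
}

set.seed(3)
n_cells <- 500
true_brightness <- rnorm(n_cells)
# Simulated satellite skyglow, log-linearly related to true brightness
skyglow <- exp(0.5 + 0.8 * true_brightness + rnorm(n_cells, sd = 0.3))

# Two hypothetical sets of model predictions with different noise levels
pred_simple  <- true_brightness + rnorm(n_cells, sd = 1.5)
pred_complex <- true_brightness + rnorm(n_cells, sd = 0.5)

res <- rbind(simple  = external_validation(pred_simple, skyglow),
             complex = external_validation(pred_complex, skyglow))
print(round(res, 2))
```

The Spearman correlation is unaffected by the log transform because it depends only on ranks; the Pearson correlation is not, which is one reason to compare against log-skyglow when its relation to the predictions is closer to linear.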

Contact

This project is developed and maintained by the ODISSEI Social Data Science (SoDa) team.

Do you have questions, suggestions, or remarks? File an issue in the issue tracker or feel free to contact Erik-Jan van Kesteren (@ejvankesteren).