The R package gravity.distances
provides aggregate distances between geographic entities — such as countries, US states or Canadian provinces — that are consistent with the gravity equation in international economics. Please check out julianhinz.com/data/#gravity.distances for more information.
Install from Github via the devtools package:
devtools::install_github("julianhinz/gravity.distances")
The get_distance
function provides distances between two geographic entities for a given point in time for a given value θ. The θ is a parameter in the aggregation of the mean that yields the harmonic mean for θ = -1, the geometric mean for θ = 0 and the arithmetic mean for θ = 1. In almost all cases in which a general gravity relationship is assumed to hold, i.e. when the underlying data generating process posits a negative relationship of the variable of interest with distance, the harmonic mean should be used.
To get the distance between two countries, simply set the origin
and destination
argument to the ISO 3166-1 alpha-3 country codes of the two countries. The year
argument defaults to 2012, the theta
argument defaults to -1, i.e. the harmonic mean.
library(gravity.distances)
get_distance("DEU", "CAN")
# [1] 6519.294
Specifying an origin
, destination
and year
delivers the harmonic mean distance (θ = -1) between to countries, here between Germany and Canada for the years between 1992 and 2012.
library(ggplot2)
dist <- data.frame(origin = "DEU", destination = "CAN", year = c(1992:2012))
dist$distance <- get_distance(origin = dist$origin,
destination = dist$destination,
year = dist$year)
ggplot(dist) +
theme_minimal() +
geom_line(aes(x = year, y = distance))
Specifying the theta
s for a given year shows the effect of θ on the aggregate distance. The result is most visible for short distances, e.g. the average distance between two points in Canada. Note that the data
argument needs to be set to distances_from_countries_to_countries
in order to have distances for theta
values between -2 and 1 in 0.1 increments.
dist <- data.frame(origin = "CAN", destination = "CAN", theta = c(-20:10)/10)
dist$distance <- get_distance(origin = dist$origin,
destination = dist$destination,
theta = dist$theta,
data = "distances_from_countries_to_countries")
The data
argument can be used to use different distance datasets.
dist <- expand.grid(origin = "BC", destination = c("BC", "AB", "SK", "MB"), theta = c(-20:10)/10)
dist$distance <- get_distance(origin = dist$origin,
destination = dist$destination,
theta = dist$theta,
data = "distances_from_canada_provinces_to_canada_provinces",
data_store = F)
data_store
can be set toFALSE
in order not to store additional downloaded datasets from the gravity.distances_data repository. Default isTRUE
.data_url
may be specified for custom datasets. Default ishttps://github.com/julianhinz/gravity.distances_data
.code_format
may be set to specify the format oforigin
anddestination
in accordance with the countrycode package. Default isiso3c
.
The remove_data
function removes some or all downloaded additional datasets. The data
argument can be set to a specific dataset to be removed, e.g. data = distances_from_canada_provinces_to_canada_provinces
. The default is data = NULL
, deleting all previously downloaded additional datasets.
- determine required dataset automatically
- fix geometric distances for US states and Canadian provinces
- potentially extend to Python and or Julia, foundation laid with
feather
file-format for additional datasets
- This package is still in its very early stages. Let me know if you find bugs via pull request or e-mail to mail@julianhinz.com.