---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
# knitr::opts_chunk$set(
# collapse = TRUE,
# comment = "#>",
# fig.path = "README-"
# )
```
# emstreeR
<!-- # emstreeR <img src="man/figures/logo.png" align="right" /> -->
<!-- [![Downloads](http://cranlogs.r-pkg.org/badges/emstreeR?color=brightgreen)](http://www.r-pkg.org/pkg/emstreeR) -->
<!-- one space after links to display badges side by side -->
<!-- badges: start -->
<!-- [![Travis-CI Build Status](https://travis-ci.org/allanvc/emstreeR.svg?branch=master)](https://travis-ci.org/allanvc/emstreeR) -->
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/emstreeR)](https://cran.r-project.org/package=emstreeR)
[![Downloads from the RStudio CRAN mirror](https://cranlogs.r-pkg.org/badges/grand-total/emstreeR)](https://cran.r-project.org/package=emstreeR)
[![License](https://img.shields.io/badge/License-BSD3--Clause-blue?style=flat-square)](https://opensource.org/license/bsd-3-clause/)
[![R-CMD-check](https://github.com/allanvc/emstreeR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/allanvc/emstreeR/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
## Overview
`emstreeR` enables **R** users to quickly and easily compute a Euclidean Minimum Spanning Tree (EMST) from data. This package relies on the R API for {mlpack}, the C++ Machine Learning Library (Curtin et al., 2013). {emstreeR} uses the Dual-Tree Boruvka algorithm (March, Ram, and Gray, 2010, <https://doi.org/10.1145/1835804.1835882>), which is theoretically and empirically the fastest algorithm for computing an EMST. This package also provides functions and an S3 method for readily plotting Minimum Spanning Trees (MST) in the style of the {base}, {scatterplot3d}, or {ggplot2} libraries, as well as functions to export the MST output to shapefiles.
* `ComputeMST()` computes a Euclidean Minimum Spanning Tree for the input data.
* `plot.MST()` is an S3 method for the generic function `plot()` that produces 2D MST plots.
* `plotMST3D()` plots a 3D MST using the {scatterplot3d} style.
* `stat_MST()` is a {ggplot2} Stat extension for plotting a 2D MST.
* `export_vertices_to_shapefile()` writes a shapefile containing the `MST` vertices.
* `export_edges_to_shapefile()` writes a shapefile containing the `MST` edges.
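For readers new to the concept, the sketch below builds an EMST by brute force with Prim's algorithm in base R. It is only an illustration of what an EMST is: it is not the Dual-Tree Boruvka algorithm that {emstreeR} uses, and the helper name `prim_emst()` is hypothetical, not part of the package.

```r
# Brute-force EMST via Prim's algorithm, base R only.
# NOTE: illustrative sketch; prim_emst() is a hypothetical helper,
# not an emstreeR function, and it scales quadratically with n.
prim_emst <- function(pts) {
  pts <- as.matrix(pts)
  n <- nrow(pts)
  dmat <- as.matrix(dist(pts))           # pairwise Euclidean distances
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # start the tree at point 1
  best_dist <- dmat[1, ]                 # cheapest link from the tree to each point
  best_from <- rep(1L, n)                # tree vertex that provides that link
  from <- integer(n - 1)
  to <- integer(n - 1)
  distance <- numeric(n - 1)
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    j <- cand[which.min(best_dist[cand])]  # closest point outside the tree
    from[k] <- best_from[j]
    to[k] <- j
    distance[k] <- best_dist[j]
    in_tree[j] <- TRUE
    # points outside the tree that are now closer via j
    closer <- !in_tree & dmat[j, ] < best_dist
    best_dist[closer] <- dmat[j, closer]
    best_from[closer] <- j
  }
  data.frame(from = from, to = to, distance = distance)
}

# Four collinear points: the tree simply chains neighbours together.
pts <- data.frame(x = c(0, 1, 3, 6), y = c(0, 0, 0, 0))
mst <- prim_emst(pts)
mst
sum(mst$distance)  # total tree length
```

A tree over `n` points always has `n - 1` edges, which is why the result has one row fewer than the input.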
## Installation
```{r, eval = FALSE}
# CRAN version
install.packages("emstreeR")
# Dev version
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("allanvc/emstreeR")
```
## Basic Usage
```{r, message = FALSE}
## artificial data:
set.seed(1984)
n <- 7
c1 <- data.frame(x = rnorm(n, -0.2, sd = 0.2), y = rnorm(n, -2, sd = 0.2))
c2 <- data.frame(x = rnorm(n, -1.1, sd = 0.15), y = rnorm(n, -2, sd = 0.3))
d <- rbind(c1, c2)
d <- as.data.frame(d)
## MST:
library(emstreeR)
out <- ComputeMST(d)
out
```
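The sketch below shows one way to read an edge table of this kind. It assumes that `from` and `to` hold 1-based row indices of each edge's endpoints; the small `pts` and `edges` tables are hypothetical stand-ins built by hand, not actual `ComputeMST()` output.

```r
# Hypothetical stand-in data (NOT produced by ComputeMST()):
# point coordinates plus an edge table indexed by row number.
pts <- data.frame(x = c(0, 1, 3, 6), y = c(0, 2, 1, 0))
edges <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))

# Look up the coordinates of both endpoints of every edge.
seg <- data.frame(
  x0 = pts$x[edges$from], y0 = pts$y[edges$from],
  x1 = pts$x[edges$to],   y1 = pts$y[edges$to]
)

# Euclidean length of each edge and of the whole tree.
seg$length <- sqrt((seg$x1 - seg$x0)^2 + (seg$y1 - seg$y0)^2)
total_length <- sum(seg$length)
seg
total_length
```

The same lookup by row index is what lets a plotting layer turn `from`/`to` pairs into line segments.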
## Plotting
### 2D Plots
```{r, message = FALSE}
## artificial data for 2D plots:
set.seed(1984)
n <- 15
c1 <- data.frame(x = rnorm(n, -0.2, sd = 0.2), y = rnorm(n, -2, sd = 0.2))
c2 <- data.frame(x = rnorm(n, -1.1, sd = 0.15), y = rnorm(n, -2, sd = 0.3))
d <- rbind(c1, c2)
d <- as.data.frame(d)
## MST:
library(emstreeR)
out <- ComputeMST(d, verbose = FALSE)
```
```{r base, message = FALSE, fig.height=4, fig.width=6, eval=FALSE}
## simple 2D plot:
plot(out, col.pts = "red", col.segts = "blue")
```
<img src="man/README-figures/base-1.png" width="650" height="500">
```{r ggplot, message = FALSE, fig.height=4, fig.width=6, eval=FALSE}
## 2D plot with ggplot2:
library(ggplot2)
ggplot(data = out, aes(x = x, y = y, from = from, to = to)) +
  geom_point() +
  stat_MST(colour = "red")
```
<img src="man/README-figures/ggplot-1.png" width="600" height="400">
```{r ggplot_curved, message = FALSE, fig.height=4, fig.width=6, eval = FALSE}
## 2D curved edges plot with ggplot2:
library(ggplot2)
ggplot(data = out, aes(x = x, y = y, from = from, to = to)) +
  geom_point() +
  stat_MST(geom = "curve")
```
<img src="man/README-figures/ggplot_curved-1.png" width="600" height="400">
### 3D Plot
```{r, message = FALSE}
## artificial data for 3D plots:
n <- 99
set.seed(1984)
d1 <- matrix(rnorm(n, mean = -2, sd = .5), n/3, 3) # 3d
d2 <- matrix(rnorm(n, mean = 0, sd = .3), n/3, 3)
d3 <- matrix(rnorm(n, mean = 3, sd = .4), n/3, 3)
d <- rbind(d1,d2,d3) # showing a matrix input
## MST:
library(emstreeR)
out <- ComputeMST(d, verbose = FALSE)
```
```{r scatterplot3d, message = FALSE, fig.height=4, fig.width=6, eval=FALSE}
## simple 3D plot:
plotMST3D(out, xlab = "xaxis", col.pts = "orange", col.segts = "red", main = "a simple MST 3D plot")
```
<img src="man/README-figures/scatterplot3d-1.png" width="600" height="400">
### Exporting the Output to GIS Shapefiles
```{r, message = FALSE, eval=FALSE}
## mock data
country_coords_txt <- "
1 3.00000 28.00000 Algeria
2 54.00000 24.00000 UAE
3 139.75309 35.68536 Japan
4 45.00000 25.00000 'Saudi Arabia'
5 9.00000 34.00000 Tunisia
6 5.75000 52.50000 Netherlands
7 103.80000 1.36667 Singapore
8 124.10000 -8.36667 Korea
9 -2.69531 54.75844 UK
10 34.91155 39.05901 Turkey
11 -113.64258 60.10867 Canada
12 77.00000 20.00000 India
13 25.00000 46.00000 Romania
14 135.00000 -25.00000 Australia
15 10.00000 62.00000 Norway"
d <- read.delim(text = country_coords_txt, header = FALSE,
                quote = "'", sep = "",
                col.names = c('id', 'lon', 'lat', 'name'))
## MST
library(emstreeR)
output <- ComputeMST(d[,2:3])
#plot(output)
export_vertices_to_shapefile(output, file="vertices.shp")
export_edges_to_shapefile(output, file="edges.shp")
```
Below is an example of how to open the shapefiles using the QGIS software in Ubuntu.
Open file `vertices.shp` or `edges.shp`.
<img src="man/README-figures/qgis1.png" width="750" height="400">
<img src="man/README-figures/qgis2.png" width="750" height="400">
Then go to `Menu > Layer > Add Layer > Add Vector Layer`.
<img src="man/README-figures/qgis3.png" width="750" height="400">
Select `File` as the `Source Type` if it is not already selected. Then click the three-dot button under `Source` to select the other shapefile, that is, whichever one you did not open QGIS with. In the example below, we select `vertices.shp` because we opened `edges.shp` first.
<img src="man/README-figures/qgis4.png" width="750" height="400">
<img src="man/README-figures/qgis5.png" width="750" height="400">
Hit `Add`, then `Close` and __voilà__.
<img src="man/README-figures/qgis6.png" width="750" height="400">
From there, it is straightforward to add other layers, such as map shapefiles, or to add the generated EMST to existing layers.
## License
This package is licensed under the terms of the BSD 3-clause License.
## References
March, W. B., Ram, P., and Gray, A. G. (2010). *Fast Euclidean minimum
spanning tree: algorithm, analysis, and applications*. 16th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, July
25-28 2010. Washington, DC, USA.
Curtin, R. R. et al. (2013). Mlpack: A scalable C++ machine learning
library. *Journal of Machine Learning Research*, v. 14, 2013.