/
Weather_station_daily_to_grid.Rmd
354 lines (261 loc) · 14.8 KB
/
Weather_station_daily_to_grid.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
---
title: "Interpolate weather station data"
output:
html_document:
self_contained: true
toc: true
toc_float:
collapsed: false
smooth_scroll: false
---
<!-- saved from url=(0014)about:internet -->
<!-- This needs to be manually added to the second line of the html for IE and Edge compatibility -->
```{r setup, include=FALSE}
# This chunk installs any necessary packages, loads them into R, and creates a temporary output directory inside the user's home directory for output of this process.
knitr::opts_chunk$set(echo=T, warning=F, message=F)
options(width = 2000)
# Install any needed packages.
packs = c("doParallel",
"foreach",
"tidyverse",
"maps",
"DT",
"gstat",
"raster")
installed = installed.packages()
for(i in packs){
if(!i %in% rownames(installed)){ install.packages(i, dependencies = T) }
}
library(doParallel)
library(foreach) # both of these for parallel computation
library(tidyverse)
library(maps) # for mapping base layers
library(DT) # for datatable
library(gstat) # For kriging
library(raster) # masks several functions from dplyr
usedir <- normalizePath("~/Interpolate_Weather_Temp")
if(length(dir(usedir))==0){ system(paste('mkdir -p', usedir)) }
knitr::opts_knit$set(root.dir = usedir)
# setwd(usedir) # run this if testing by running each chunk separately. Leave commented for knitting
```
## Overview
This document provides a working example for how to carry out spatial interpolation in R. This process shows how to take daily weather station data and produce gridded, daily data from a smooth raster layer for each day, each weather variable of interest.
This document relies substantially on this resource:
- http://rspatial.org/analysis/rst/4-interpolation.html
See these ESRI resources for the ArcGIS approaches to this task:
- http://desktop.arcgis.com/en/arcmap/10.5/tools/spatial-analyst-toolbox/understanding-interpolation-analysis.htm#
- https://pro.arcgis.com/en/pro-app/tool-reference/3d-analyst/comparing-interpolation-methods.htm
### Requirements to run this yourself
This is an interactive document written in [RMarkdown](https://rmarkdown.rstudio.com/) and contains all the necessary code to run on a machine with R installed. Use of RStudio is recommended; this IDE is available [here](https://www.rstudio.com/) or as part of [Anaconda](https://www.anaconda.com/distribution/).You will need a connection to the internet and ability to write to your home directory.
This code and example data are available here on [GitHub](https://github.com/dflynn-volpe/Interpolate_Weather_Volpe), feel free to modify and reproduce with attribution.
## Getting data in
R can read in data from any number of sources, including Oracle or SQLite database, APIs, or flat files. Here I'm providing weather station data for the state of Tennessee, from the Global Historical Climate Network data, made available through [NOAA](https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND). The data are in .csv format.
These data have already been prepared to some extent, to select only a few variables, and to join with lat/long and elevation from a general station information data table.
```{r inputdata}
# Define the projection we want to use
proj.USGS <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# Grab files from GitHub
# Read in weather station data from GHCN
wx <- read.csv("https://github.com/dflynn-volpe/Interpolate_Weather_Volpe/blob/master/TN_Station_Data_2017-04-01_to_2018-03-31.csv?raw=true")
zipget = httr::GET('https://github.com/dflynn-volpe/Interpolate_Weather_Volpe/blob/master/TN_01dd_fishnet.zip?raw=true')
writeBin(httr::content(zipget, "raw"), 'TN_01dd_fishnet.zip')
# Format date and add a month field
wx <- wx %>%
mutate(DATE = as.Date(as.character(DATE))) %>%
mutate(mo = format(DATE, "%m"))
# Download grid shapefile .zip from GitHub and unzip into the working directory
unzip("TN_01dd_fishnet.zip")
# Make weather station data into a spatial points data frame
wx.proj <- SpatialPointsDataFrame(wx[c("lon", "lat")],
wx,
proj4string = CRS("+proj=longlat +datum=WGS84"))
wx.proj <- spTransform(wx.proj, CRS(proj.USGS))
# Read in grid
grid_shp <- rgdal::readOGR(usedir, layer = "TN_01dd_fishnet")
grid_shp <- spTransform(grid_shp, CRS(proj.USGS))
```
Let's take a quick look at the data. These interactive tables are produced with the `DT` package, which calls the [DataTables](https://datatables.net/) JavaScript library.
```{r view_data}
datatable(wx[1:5000,], caption = 'Weather station data (first 5,000 rows only)')
```
```{r view_grid_data}
datatable(grid_shp@data)
```
## Plot point data
Now that we have the data in, we can plot the point data before interpolation.
Here we plot the average daily maximum temperature for each station, for the month of June 2017, as well as the total precipitation recorded for each station. We also show the grid layer we eventually want to interpolate to.
This involves the following steps:
- Making summarized data tables for the variables of interest. The code below uses the [`dplyr`](https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html) approach to data preparation.
- Project these data tables as spatial objects with the target coordinate reference system
- Plot the grid layer as a background
- Add the point data with an appropriate color scheme
```{r point_plots}
wx.jun.summary <- wx %>%
group_by(STATION, lon, lat) %>%
filter(mo == "06") %>%
summarize(avgtempmax = mean(TMAX, na.rm=T),
avgtempmin = mean(TMIN, na.rm=T),
sumprecip = sum(PRCP)
)
wx.jun.proj <- spTransform(SpatialPointsDataFrame(wx.jun.summary[c("lon", "lat")],
wx.jun.summary,
proj4string = CRS("+proj=longlat +datum=WGS84")), CRS(proj.USGS))
plot(grid_shp, col = 'lightgrey')
tempcol <- colorRampPalette(c("purple", "blue", "green", "yellow", "orange", "red"))
cuts = cut(wx.jun.proj$avgtempmax, 12)
points(wx.jun.proj$lon, wx.jun.proj$lat,
col = scales::alpha(tempcol(12)[cuts], 0.7), pch = 16, cex = 2)
legend("bottomright", pch = 16, col = tempcol(12),
legend = levels(cuts),
cex = 0.8, ncol = 4, pt.cex = 2)
title(main = "June 2018 average high temperatures")
plot(grid_shp, col = 'lightgrey')
preccol <- colorRampPalette(c("white", "bisque", "green", "cornflowerblue", "blue", "purple"), alpha = T)
cuts = cut(wx.jun.proj$sumprecip, 12)
points(wx.jun.proj$lon, wx.jun.proj$lat,
col = scales::alpha(preccol(12)[cuts], 0.7), pch = 16, cex = 2)
legend("bottomright", pch = 16, col = preccol(12),
legend = levels(cuts),
cex = 0.8, ncol = 4, pt.cex = 2)
title(main = "Total June precipitation")
```
## Interpolation
There are a number of options for spatial interpolation from point to raster. These include nearest neighbor interpolation, inverse distance weighted (IDW), and ordinary kriging.
We will make one raster for each variable of interest, per day, and then apply to grid/hour.
Here we use kriging from the package [`gstat`](https://cran.r-project.org/web/packages/gstat/vignettes/gstat.pdf). Models are all based on spatial variance of the target variable.
The steps in brief are as follows:
- Create an empty raster grid for the state; we will interpolate over this grid, then assign values from the raster to each grid cell. Increase `n` for smaller raster cells (more time-intensive)
- For each target variable, each data, calculate the semi-variance of the data. This shows how much spatial autocorrelation there is between the sample points ([ArcGIS reference](https://pro.arcgis.com/en/pro-app/help/analysis/geostatistical-analyst/understanding-a-semivariogram-the-range-sill-and-nugget.htm))
- Fit a modeled variogram to the data ([ArcGIS reference](https://pro.arcgis.com/en/pro-app/help/analysis/geostatistical-analyst/fitting-a-model-to-the-empirical-semivariogram.htm))
- Krige the data to the raster surface ([ArcGIS reference](https://pro.arcgis.com/en/pro-app/help/analysis/geostatistical-analyst/understanding-ordinary-kriging.htm))
- Extract values from the raster layer to the grid spatial polygon layer and produce a data frame of the interpolated values for the target variable, for the day
The last step is the most computationally-intensive one. If the end product you need is just the raster layer, these steps are very fast. In order to speed things along, this code shows how to implement the steps in parallel to take full advantage of your machine's cores.
```{r interpolation}
grd <- as.data.frame(spsample(grid_shp, 'regular', n = 5000))
names(grd) <- c("X", "Y")
coordinates(grd) <- c("X", "Y")
gridded(grd) <- TRUE # for SpatialPixel
fullgrid(grd) <- TRUE # for SpatialGrid
proj4string(grd) <- proj4string(grid_shp)
StartTime <- Sys.time()
writeLines(c(""), paste0("Prep_Weather_log.txt"))
# This step sets up the parallel process. In combination with foreach() below, it provides a simple way to take advantage of multiple cores.
cl <- makeCluster(parallel::detectCores())
registerDoParallel(cl)
# Can limit to a few select months with this code, or manually set 'do.months' to a value like '2017-06'.
do.months = c(paste('2017', formatC(4:12, width = 2, flag = 0), sep = '-'),
paste('2018', formatC(1:3, width = 2, flag = 0), sep = '-'))
# Start with one example month, comment out this line to run over whole data set (time intensive!)
do.months = c('2017-06')
# Choose one day to save an example raster image for each target variable
ex.day = '2017-06-01'
all_wx_days = unique(wx$DATE)
all_wx_ym = format(all_wx_days, "%Y-%m")
use_wx_days = all_wx_days[all_wx_ym %in% do.months]
# use_wx_days = use_wx_days[1:4] # for even smaller test of four days
# Start loop ----
wx.grd.day <- foreach(day = use_wx_days,
.packages = c('raster','gstat','dplyr','rgdal'),
.combine = rbind) %dopar% {
# day = use_wx_days[1]
dayStartTime <- Sys.time()
wx.day = wx.proj[wx.proj$DATE == day,]
### Precipitation
f.p <- as.formula(PRCP ~ 1)
vg_prcp <- gstat::variogram(PRCP ~ 1, locations = wx.day[!is.na(wx.day$PRCP),])
dat.fit <- fit.variogram(vg_prcp, fit.ranges = F, fit.sills = F,
vgm(model = "Sph"))
# plot(vg_prcp, dat.fit) # Plot the semi variogram.
dat.krg.prcp <- krige(f.p, wx.day[!is.na(wx.day$PRCP),], grd, dat.fit)
# Rasterize
prcp_r <- raster::raster(dat.krg.prcp)
prcp_r <- mask(prcp_r, grid_shp)
if(day == ex.day) { save(prcp_r, file = file.path(usedir, 'example_prcp_raster.RData')) }
### Daily high temperatures
f.tmax <- as.formula(TMAX ~ 1)
vg_tmax <- variogram(TMAX ~ 1, wx.day[!is.na(wx.day$TMAX),])
dat.fit <- fit.variogram(vg_tmax, fit.ranges = F, fit.sills = F,
vgm(model = "Sph"))
dat.krg.tmax <- krige(f.tmax, wx.day[!is.na(wx.day$TMAX),], grd, dat.fit)
tmax_r <- raster::raster(dat.krg.tmax)
tmax_r <- mask(tmax_r, grid_shp)
if(day == ex.day) { save(tmax_r, file = file.path(usedir, 'example_tmax_raster.RData')) }
# Apply to grid cells in year-day ----
# This is the most time-intensive step in the process.
# Need to extract values from the raster layers to the polygons
prcp_extr <- raster::extract(x = prcp_r, # Raster object
y = grid_shp, # SpatialPolygons
fun = mean,
df = TRUE)
names(prcp_extr)[2] = "PRCP"
prcp_extr$ID = grid_shp$GRID_ID
daily_result <- data.frame(day, prcp_extr)
tmax_extr <- raster::extract(x = tmax_r, # Raster object
y = grid_shp, # SpatialPolygons
fun = mean,
df = TRUE)
names(tmax_extr)[2] = "TMAX"
tmax_extr$ID = grid_shp$GRID_ID
daily_result <- full_join(daily_result, tmax_extr)
dayEndTime <- Sys.time() - dayStartTime
cat(as.character(day), 'completed', round(dayEndTime, 2), attr(dayEndTime, 'units'), '\n',
file = paste0("Prep_Weather_log.txt"), append = T)
daily_result
} # end parallel loop
EndTime <- Sys.time()-StartTime
cat('Entire process completed in', round(EndTime, 2), attr(EndTime, 'units'), '\n',
file = paste0("Prep_Weather_log.txt"), append = T)
# Save the ouput
save(list = c('wx.grd.day'),
file = file.path(usedir, 'Interpolated_Gridded_Daily_Data.RData'))
```
## Plot raster interpolation
We can view some of intermediate raster surfaces which were produced by the interpolation. Here looking at the TMAX and PRCP surfaces for one example day in June 2017:
```{r raster_plots}
load('example_prcp_raster.RData')
load('example_tmax_raster.RData')
plot(tmax_r, main = 'Interpolated high temperature for example day')
plot(prcp_r, main = 'Interpolated precipitation for example day')
```
## Plot gridded data
Finally, let's visualize the output of this work. Similar to the point data plots, we make summary data tables for June 2017, and plot the average daily high temperatures and sum of precipitation. Now the values are applied to each grid cell.
```{r grid_plots}
# Start from already prepared data if run previously
load(file.path(usedir, 'Interpolated_Gridded_Daily_Data.RData'))
wx.grd.day$mo = format(wx.grd.day$day, "%m")
grd.jun.summary <- wx.grd.day %>%
group_by(ID) %>%
filter(mo == "06") %>%
summarize(avg.06.tempmax = mean(TMAX, na.rm=T),
sumP = sum(PRCP, na.rm=T))
par(mfrow=c(1,2))
hist(grd.jun.summary$avg.06.tempmax,
main = "Average TMAX by grid cell for June 2017",
xlab = "TMAX",
col = "tomato")
hist(grd.jun.summary$sumP,
main = "Sum PRCP by grid cell for June 2017",
xlab = "PRCP",
col = "cornflowerblue")
# Join to grid cells, make analogous plots
grd.jun.summary$ID <- as.character(grd.jun.summary$ID)
plotgrid <- grid_shp
plotgrid@data <- left_join(plotgrid@data, grd.jun.summary, by = c("GRID_ID"="ID"))
plotgrid@data[plotgrid@data==-Inf] = NA
# Make tempmax and sumP maps
tempcol <- colorRampPalette(c("purple", "blue", "green", "yellow", "orange", "red"))
par(mfrow=c(1,1))
cuts = cut(plotgrid@data$avg.06.tempmax, 12)
plot(plotgrid, col = tempcol(12)[cuts], border = NA)
legend("bottomright", pch = 16, col = tempcol(12),
legend = levels(cuts),
cex = 0.8, ncol = 3, pt.cex = 2)
title(main = "Average June high temperatures")
preccol <- colorRampPalette(c("white", "bisque", "green", "cornflowerblue", "blue", "purple"))
cuts = cut(plotgrid@data$sumP, 12)
plot(plotgrid, col = preccol(12)[cuts], border = NA)
legend("bottomright", pch = 16, col = preccol(12),
legend = levels(cuts),
cex = 0.8, ncol = 3, pt.cex = 2)
title(main = "Sum of precipitation across the study period")
```