/
Chap4_Lab2.Rnw
executable file
·293 lines (226 loc) · 16.8 KB
/
Chap4_Lab2.Rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
\documentclass[b5paper,11pt]{book}
\input{preamble}
\ifthenelse{\equal{\jobname}{\detokenize{Chap4_Lab2}}}
{\standalonetrue}
{\standalonefalse}
\begin{document}
<<echo=FALSE>>=
opts_chunk$set(background = rgb(1,1,0.85),
size='small')
@
\addcontentsline{toc}{section}{Lab 4.2: Spatio-Temporal Basis Functions with {\bf FRK}}
\section*{Lab 4.2: Spatio-Temporal Basis Functions with {\bf FRK}}
\markright{Lab 4.2: Spatio-Temporal Basis Functions with {\bf FRK}}
In this Lab we shall focus on modeling the maximum temperature in July 1993 from data in the NOAA data set using spatio-temporal basis functions. The packages we need are the following:
<<message=FALSE,results='hide',warning=FALSE>>=
library("dplyr")
library("FRK")
library("ggplot2")
library("gstat")
library("RColorBrewer")
library("sp")
library("spacetime")
library("STRbook")
library("tidyr")
@
<<echo=FALSE,message=FALSE,results='hide',warning=FALSE>>=
library("grid")
library("gridExtra")
library("sp")
sp::set_Polypath(FALSE)
@
%% GENERATE BASIS FUNCTIONS FIGURE
<<Basis_functions,echo=FALSE,fig.keep="all",fig.height=3,warning=FALSE>>=
df <- data.frame(s = seq(-1,1,length=100)) %>%
mutate(Cosine = cos(s*2*pi)*0.5,
Gaussian = exp(-s^2/(2*0.2^2)),
Wendland = fields::Wendland(abs(s),theta=0.6,dimension=1,k=2),
Bisquare = (1 - (s/0.5)^2)^2 * (abs(s) < 0.5),
Mexican_Hat = 0.9*pi^(-0.25)*(1-(s*4)^2)*exp(-(s*4)^2/2),
Linear_Element = as.numeric((1-2*abs(s))* I(abs(s) < 0.5))) %>%
gather(name,phi,-s)
g <- ggplot(df) + geom_line(aes(s,phi)) + facet_grid(~name) +
theme(axis.text.x=element_blank()) + theme_bw() + ylab(expression(phi(s)))
ggsave(g, file="./img/Chapter_4/spat_basis_fn.png",height=2.5,width=8)
@
The package {\bf FRK} implements a low-rank approach to spatial and spatio-temporal modeling known as \emph{fixed rank kriging} (FRK). \ifstandalone FRK considers the random-effects model \begin{equation}
Y(\bs;t) = \bx(\bs;t)'\bfbeta + \sum_{i=1}^{n_\alpha} \phi_{i}(\bs;t) \alpha_i + \nu(\bs;t),
\label{eq:Yprocess_stbasis}
\end{equation}
where $\{\phi_{i}(\bs;t): i=1,\ldots,n_\alpha \}$ are specified basis functions evaluated at space-time location $(\bs;t)$, $\{\alpha_i : i=1,\ldots,n_\alpha \}$ are random effects, and $\nu(\bs;t)$ captures small-scale spatio-temporal variation not absorbed by the summation term. The random effects $\bfalpha \sim Gau({\mbf 0},\bC_\alpha)$, where $\bfalpha \equiv (\alpha_1,\ldots,\alpha_{n_\alpha})'$. Assume we are interested in the $Y$-process at $n_y$ spatio-temporal locations, which we denote by the vector $\bY$. The process model then becomes
\begin{equation}
\bY = \bX \bfbeta + \bfPhi \bfalpha + \bfnu,
\label{eq:Yprocess_basis_vec}
\end{equation}
where $i$th column of the $n_y \times n_\alpha$ matrix $\bfPhi$ corresponds to the $i$th basis function, $\phi_{i}(\cdot;\cdot)$ at all of the spatio-temporal locations ordered as given in $\bY$, and the vector $\bfnu$ also corresponds to the spatio-temporal ordering given in $\bY$ such that $\bfnu \sim Gau({\mbf 0},\bC_\nu)$. \else FRK considers the random-effects model \eqref{eq:Yprocess_stbasis}, sometimes known as the spatio-temporal random-effects model \citep{cressie2010fixed}\index[aut]{Cressie, N.}\index[aut]{Shi, T.}\index[aut]{Kang, E.~L.}, and provides functionality to the user for choosing the basis functions $\{\phi_i(\bs;t) : i = 1,\dots,n_\alpha\}$ from the data. \fi
A key difference between {\bf FRK} and other geostatistical packages is that, in {\bf FRK}, modeling and prediction are carried out on a fine, regular discretization of the spatio-temporal domain. The small grid cells are known as \emph{basic areal units} (BAUs), and their primary utility is to account for problems of change of support (varying measurement footprint), which we do not consider in this Lab. \ifstandalone \else The package is loaded by typing in the console
<<echo=TRUE,message=FALSE,results="hide",warning=FALSE>>=
library("FRK")
@
\fi
<<echo=FALSE,message=FALSE,results="hide",warning=FALSE>>=
library("FRK")
@
For spatio-temporal modeling and prediction, {\bf FRK} requires the user to provide the point-level data as objects of class \texttt{STIDF} \ifstandalone \else (see p.\ \pageref{pg:ST-defs})\fi. Hence, for this exercise, we use \cc{STObj5} from Lab 3.1, which we reconstruct below (for completeness) from \cc{STObj3}.
<<Gstat_modelling,cache=FALSE>>=
data("STObj3", package = "STRbook") # load STObj3
STObj4 <- STObj3[, "1993-07-01::1993-07-31"] # subset time
STObj5 <- as(STObj4[, -14], "STIDF") # omit t = 14
STObj5 <- subset(STObj5, !is.na(STObj5$z)) # remove NAs
@
The spatio-temporal BAUs are constructed using the function \fn{auto\_BAUs} which takes several arguments, as shown below and detailed using the in-line comments. For more details see \fn{help}\cc{(auto\_BAUs)}. Note that as \args{cellsize} we chose \fn{c}\cc{(\num{1},\num{0.75},\num{1})} which indicates a BAU size of 1 degree longitude $\times$ 0.75 degrees latitude $\times$ 1 day -- this choice ensures that the BAUs are similar to the prediction grid used in Lab 3.1. The argument \args{convex} is an ``extension radius'' used in domain construction via the package {\bf INLA}. See the help file of \fn{inla.nonconvex.hull} for details.
<<BAUs,message=FALSE,results='hide',cache=FALSE,echo=TRUE,warning=FALSE>>=
BAUs <- auto_BAUs(manifold = STplane(), # ST field on the plane
type = "grid", # gridded (not "hex")
data = STObj5, # data
cellsize = c(1, 0.75, 1), # BAU cell size
convex = -0.12, # hull extension
tunit = "days") # time unit is "days"
@
The BAUs are of class \cc{STFDF} since they are three-dimensional pixels arranged regularly in both space and in time. To plot the spatial BAUs overlaid with the data locations, we run
<<eval=FALSE>>=
plot(as(BAUs[, 1], "SpatialPixels")) # plot pixel BAUs
plot(SpatialPoints(STObj5),
add = TRUE, col = "red") # plot data points
@
\noindent This generates the left panel of Figure~\ref{fig:BAUs}. The BAUs, which we will also use as our prediction grid, overlap all the data points. The user has other options in BAU construction; for example, the following code generates \emph{hexagonal} BAUs using a convex hull for a boundary.
<<BAUs_hex,message=FALSE,results='hide',cache=FALSE,echo=TRUE,warning=FALSE>>=
BAUs_hex <- auto_BAUs(manifold = STplane(), # model on the plane
type = "hex", # hex (not "grid")
data = STObj5, # data
cellsize = c(1, 0.75, 1), # BAU cell size
nonconvex_hull = FALSE, # convex hull
tunit = "days") # time unit is "days"
@
\noindent Plotting proceeds in a similar fashion, except that the first line in the code chunk above now becomes
<<eval=FALSE>>=
plot(as(BAUs_hex[, 1], "SpatialPolygons"))
@
\noindent This allows for the fact the the BAUs are now (hexagonal) polygons and not rectangular pixels. The resulting plot is shown in the right panel of Figure~\ref{fig:BAUs}.
<<plotBAUs,echo=FALSE,message=FALSE,results='hide',fig.keep='none'>>=
png('./img/Chapter_4/BAUs.png',width=6000,height=2000,res = 400)
xmin <- min(coordinates(BAUs)[,1])
xmax <- max(coordinates(BAUs)[,1])
ymin <- min(coordinates(BAUs)[,2])
ymax <- max(coordinates(BAUs)[,2])
par(mfrow = c(1,2),mar=c(0,0,0,0))
plot(as(BAUs[,1],"SpatialPixels"), xlim = c(xmin, xmax), ylim = c(ymin, ymax), asp = 1)
plot(as(STObj5,"Spatial"),add=TRUE,col="red")
plot(as(BAUs_hex[,1],"SpatialPolygons"), xlim = c(xmin, xmax), ylim = c(ymin, ymax), asp = 1)
plot(as(STObj5,"Spatial"),add=TRUE,col="red")
dev.off()
@
\begin{figure}[t!]
\begin{center}
\includegraphics[width=\textwidth,angle=0]{img/Chapter_4/BAUs.png}
\end{center}
\caption{BAUs constructed for modeling and predicting maximum temperature from data in the NOAA data set. Left: Gridded BAUs arranged within a non-convex hull enclosing the data. Right: Hexagonal BAUs arranged within a convex hull enclosing the data.}
\label{fig:BAUs}
\end{figure}
Next we construct the basis functions $\{\phi_i(\bs;t) : i = 1,\dots,n_\alpha\}$. In {\bf FRK}, these are constructed by taking the tensor product of spatial basis functions with temporal basis functions. Specifically, consider a set of $r_s$ spatial basis functions $\{\phi_{p}(\bs): p = 1,\dots,r_s\}$, and a set of $r_t$ temporal basis functions $\{\psi_{q}(t): q = 1,\dots,r_t\}$. Then we construct the set of spatio-temporal basis functions as $\{\phi_{st,u}(s,t) : u = 1,\dots,r_sr_t\} = \{\phi_{p}(\bs)\psi_{q}(t) : p = 1,\dots,r_s;\ q = 1,\dots,r_t\}$.
\ifstandalone In principle any basis function can be used in {\bf FRK}, although these would need to be defined by the user. Examples of basis functions are given in Figure \ref{fig:spat_basis_fn} \else \fi The generic basis function that {\bf FRK} uses by default is the bisquare function (see Figure~\ref{fig:spat_basis_fn}) given by
\begin{equation*}%\label{eq:bisquare1D}
b(\bs,\bv) \equiv \left\{\begin{array}{ll} \{1 - (\|\bv- \bs\|/r)^2\}^2, &\| \bv -\bs\| \le r, \\
0, & \textrm{otherwise}, \end{array} \right.
\end{equation*}
\noindent where $r$ is the aperture parameter. Basis functions can be either regularly placed, or irregularly placed, and they are often multiresolutional. We choose two resolutions below, yielding $r_s = 94$ spatial basis functions in total, and place them irregularly in the domain. (Note that $r_s$ and the bisquare apertures are determined automatically by \fn{auto\_basis}.)
<<message=FALSE,results='hide'>>=
G_spatial <- auto_basis(manifold = plane(), # fns on plane
data = as(STObj5, "Spatial"), # project
nres = 2, # 2 res.
type = "bisquare", # bisquare.
regular = 0) # irregular
@
\ifstandalone
\begin{figure}[t!]
\begin{center}
\includegraphics[width=\textwidth,angle=0]{img/Chapter_4/spat_basis_fn.png}
\end{center}
\caption{Spatial basis functions commonly employed in spatio-temporal modeling depicted in one-dimensional space. From left to right: The bisquare function, cosine function, Gaussian function, linear element, Mexican-hat wavelet, and the first-order Wendland function.}
\label{fig:spat_basis_fn}
\end{figure}
\fi
Temporal basis functions also need to be defined. We use the function \fn{local\_basis} below to construct a regular sequence of $r_t = 20$ bisquare basis functions between day 1 and day 31 of the month. Each of these bisquare basis functions is assigned an aperture of 2 days; that is, the support of each bisquare function is 4 days. The temporal grid is defined through
<<>>=
t_grid <- matrix(seq(1, 31, length = 20))
@
\noindent The basis functions are constructed using the following commands.
<<>>=
G_temporal <- local_basis(manifold = real_line(), # fns on R1
type = "bisquare", # bisquare
loc = t_grid, # centroids
scale = rep(2, 20)) # aperture par.
@
Finally, we construct the $r_sr_t = 1880$ spatio-temporal basis functions by taking the tensor product of the spatial and the temporal ones, using the function \fn{TensorP}.
<<>>=
G <- TensorP(G_spatial, G_temporal) # take the tensor product
@
\noindent The basis functions \cc{G\_spatial} and \cc{G\_temporal} can be visualized using the plotting function \fn{show\_basis}; see Figure~\ref{fig:ST_basis}. While the basis functions are of tensor-product form, the resulting S-T covariance function obtained from the spatio-temporal random effects model is not separable in space and time.
<<results='hide',message=FALSE,echo=FALSE,fig.keep='none'>>=
g1 <- show_basis(G_spatial) + xlab("lon (deg)") + ylab("lat (deg)") + coord_fixed()
g2 <- show_basis(G_temporal) + xlab("t (days)") +ylab(expression(varphi(t)))
g <- grid.arrange(g1,g2,nrow=1)
ggsave(g, file="./img/Chapter_4/STbasis.png",height=3.2,width=8)
@
\begin{figure}[t!]
\begin{center}
\includegraphics[width=\textwidth,angle=0]{img/Chapter_4/STbasis.png}
\end{center}
\caption{Spatial and temporal basis functions used to construct the spatio-temporal basis functions. Left: Locations of spatial basis functions (circles denote spatial support). Right: Temporal basis functions.}
\label{fig:ST_basis}
\end{figure}
In {\bf FRK}, the fine-scale variation term at the BAU level, \eqref{eq:Yprocess_basis_vec}, is assumed to be Gaussian with covariance matrix proportional to $\diag(\{\sigma^2_{\nu,i}\})$, where $\{\sigma^2_{\nu,i}: i = 1,\dots,n_y\}$ are pre-specified at the BAU level (the constant of proportionality is then estimated by {\bf FRK}). Typically, these are related to some geographically related quantity such as surface roughness. In our case, we simply set $\sigma^2_{\nu,i} = 1$ for all $i$.
<<>>=
BAUs$fs = 1
@
The fine-scale variance at the BAU level is confounded with the measurement-error variance. In some cases, the measurement-error variance is known; when it is not (as in this case), one can carry out a simple analysis to estimate the value of the semivariogram at the origin. In this case, we simply assume that the nugget effect estimated when fitting the separable covariance function in Lab 4.1 is the measurement-error variance -- any residual nugget component is then assumed to be the fine-scale variance introduced as a consequence of the low-rank approximation to the process. The measurement-error variance is specified in the \cc{std} field in the data \cc{ST} object.
<<>>=
STObj5$std <- sqrt(0.049)
@
\noindent The response variable and covariates are identified through a standard \cc{R} formula. In this case we use latitude as a covariate and set
<<>>=
f <- z ~ lat + 1
@
We are now ready to call the main function \fn{FRK}, which estimates all the unknown parameters in the models, including the covariance matrix of the basis-function coefficients and the fine-scale variance. We need to supply the formula, the data, the basis functions, the BAUs, and any other parameters configuring the expectation-maximization (EM) algorithm used for finding the maximum likelihood estimates. To reduce processing time, we have set the number of EM-algorithm steps to 3. Convergence of the EM algorithm can be assessed visually by setting \cc{\args{print\_lik} = \num{TRUE}} below.
<<results='hide',message=FALSE,fig.keep='none'>>=
S <- FRK(f = f, # formula
data = list(STObj5), # (list of) data
basis = G, # basis functions
BAUs = BAUs, # BAUs
n_EM = 3, # max. no. of EM iterations
tol = 0.01) # tol. on change in log-likelihood
@
Once the model is fitted, prediction proceeds via the function \fn{predict}. If the argument \args{newdata} is not specified, then prediction is done at all the BAUs.
<<>>=
grid_BAUs <- predict(S)
@
\noindent The resulting object, \cc{grid\_BAUs}, is also of class \cc{STFDF}, and plotting proceeds as per Lab 4.1 using the \fn{stplot} function. \ifstandalone Plotting of the FRK predictions and prediction standard errors is left as an exercise. \else The resulting predictions and prediction standard errors are illustrated in Figure \ref{fig:pred_FRK}.\fi
<<echo=FALSE,results='hide',fig.keep='none'>>=
grid_BAUs@sp <- SpatialPoints(grid_BAUs@sp) # convert to Spatial points object
gridded(grid_BAUs@sp) <- TRUE # and assert that it is gridded
library(RColorBrewer)
colour_regions <- (brewer.pal(11, "Spectral") %>% # construct spectral brewer palette
colorRampPalette())(16) %>% # at 16 levels
rev() # reverse direction
grid_BAUs$mu_trunc <- pmin(pmax(grid_BAUs$mu,68),105)
png("./img/Chapter_4/FRK_pred.png",width = 1600,height=1000,res=300)
stplot(grid_BAUs[,c(4,9,14,19,24,29),"mu_trunc"], # plot the FRK predictor
main="Predictions (degrees Fahrenheit)", # title
layout=c(3,2), # trellis layout
col.regions=colour_regions, # color scale
xlim=c(-100,-80),ylim=c(32,46), # axes limits
aspect=1) # fixed aspect ratio
dev.off()
@
<<FRK-se,echo=FALSE,results='hide',fig.keep='none'>>=
grid_BAUs$se <- pmax(pmin(sqrt(grid_BAUs$var),4.4),1.8)
png("./img/Chapter_4/FRK_se.png",width = 1600,height=1000,res=300)
stplot(grid_BAUs[,c(4,9,14,19,24,29),"se"], # plot the FRK predictor
main="Prediction std. errors (degrees Fahrenheit)", # title
layout=c(3,2), # trellis layout
col.regions=colour_regions, # color scale
xlim=c(-100,-80),ylim=c(32,46), # axes limits
aspect=1) # fixed aspect ratio
dev.off()
@
\end{document}