---
title: smooth constraint functions
author: Ben Bolker
bibliography: posfun.bib
csl: apa.csl
output:
  html_document:
    toc: true
---

```{r pkgs, include = FALSE}
library(numDeriv)
library(emdbook) ## lambertW
library(TMB) ## testing CppAD stuff
```

# Polynomial/piecewise smooths

Often when we're fitting models, we have functions that go somewhere they're not supposed to (e.g. densities that fall below 0, or probabilities that fall outside the range $0<x<1$). If we can parameterize everything so that the constraints apply to a single parameter, we can use *box constraints*, or we can reparameterize the model in terms of an unconstrained parameter (e.g. fit on the scale of $\log(\beta)$ for a value that needs to be positive, or on the scale of $\textrm{logit}(\beta)$ for a value that needs to be between 0 and 1). Sometimes, however, it's not easy to see how to reparameterize the model or constrain a single parameter. A useful strategy in this case is to force the value *smoothly* to its boundary, and at the same time add a quadratic penalty that penalizes the objective function for failing to stay inside the bounds.
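
For concreteness, here's a minimal sketch of the two standard approaches for a toy example where a rate parameter $\beta$ must be positive (the exponential data, starting values, and chunk name are purely illustrative):

```{r reparam-sketch}
set.seed(101)
ydat <- rexp(100, rate = 2)
nll <- function(beta) -sum(dexp(ydat, rate = beta, log = TRUE))
## box constraints: bound beta away from zero
nlminb(start = 1, nll, lower = 1e-8)$par
## reparameterization: optimize log(beta) unconstrained, then back-transform
exp(nlminb(start = 0, function(logbeta) nll(exp(logbeta)))$par)
```
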
See [TMB issue](https://github.com/kaskr/adcomp/issues/7) for more background.

AD Model Builder [@fournier_ad_2011], a powerful tool especially used in fisheries modeling, defined `posfun` as follows:
$$
f(x) = \begin{cases}
\frac{\varepsilon}{2-x/\varepsilon}, & \text{if } x < \varepsilon \\
x, & \text{otherwise}
\end{cases}
$$
(`posfun` also had the side effect of adding $\gamma (x-\varepsilon)^2$ to a running penalty term if $x<\varepsilon$; the accumulated penalty could be added to the negative log-likelihood.)
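
A minimal R sketch of the same idea (an illustrative translation, not the actual ADMB code; the penalty weight `gamma` is arbitrary here):

```{r posfun-sketch}
posfun_admb <- function(x, eps = 0.001, gamma = 1) {
    if (x < eps) {
        ## smoothed value plus quadratic penalty for violating the bound
        list(value = eps/(2 - x/eps), penalty = gamma*(x - eps)^2)
    } else {
        list(value = x, penalty = 0)
    }
}
## a negative input maps to a small positive value, and incurs a penalty
posfun_admb(-0.002)
```
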
This function has been widely used in applied fisheries research
[@breen_effects_2003; @branch_general_2010; @carruthers_spatial_2011; @rudd_does_2017]. It has the following useful properties: for a given $\varepsilon>0$,

- $f(x) = x$ for $x \geq \varepsilon$
- $f(x) >0$ for all $x$
- $f'(x) >0$ for all $x$
- $f(x) = \varepsilon/2$ for $x=0$

However, it only has a continuous *first* derivative at $x=\varepsilon$. This causes problems if we are trying to do any numerical operations that depend on a continuous second derivative, e.g. Laplace approximation (or Riemannian Hamiltonian Monte Carlo ... [@girolami_riemannian_2019]).
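
A quick numeric illustration of the jump (evaluating `numDeriv::hessian()` a safe distance from the kink, so the finite-difference steps don't cross it):

```{r posfun-kink}
pf <- function(x, eps = 0.001) ifelse(x < eps, eps/(2 - x/eps), x)
## second derivative is O(1/eps) just below the threshold, identically 0 above
c(below = drop(numDeriv::hessian(pf, 0.0005)),
  above = drop(numDeriv::hessian(pf, 0.002)))
```
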

Thus we need a function $f(x)$ that also satisfies

- $f(x)$, $f'(x)$, and $f''(x)$ are everywhere continuous;

this implies $f(\varepsilon)=\varepsilon$, $f'(\varepsilon)=1$, and $f''(\varepsilon)=0$. (We are willing to give up the last property above, $f(0)=\varepsilon/2$.)

Start by setting $x'=\varepsilon-x$; then $x'>0$ for $x<\varepsilon$ and $x'=0$ when $x=\varepsilon$. Suppose we take $g(x') = 1+a x' + bx'^2$ and $f(x') = \varepsilon \, g(x')^{-1}$. Since $g(x'=0)=1$, we have $f(x'=0)=\varepsilon$. Now
$$
f'(x'=0) = -\varepsilon g'(0) (g(0))^{-2} = -\varepsilon a
$$
$$
\begin{split}
f''(x'=0) & = -\varepsilon \left(g''(0) (g(0))^{-2} - 2 (g'(0))^2 (g(0))^{-3}\right) \\
& = -\varepsilon \left( \frac{g''(0)\, g(0) -2(g'(0))^2 }{g(0)^3} \right) \\
& = -\varepsilon \left(2b -2a^2 \right)
\end{split}
$$

Since $dx'/dx=-1$, matching $df/dx = 1$ at $x=\varepsilon$ means $f'(x'=0)=-1$, so $-\varepsilon a = -1$, i.e. $a=1/\varepsilon$; and $f''(x'=0)=0$ requires $b=a^2=1/\varepsilon^2$.

Let's test it:
```{r test1}
f <- function(x, eps=0.001) {
    eps*(1/(1 - (x-eps)/eps + (x-eps)^2/eps^2))
}
f(0.001)        ## should equal eps
grad(f, 0.001)  ## should equal 1
## not exactly zero but close enough ...
all.equal(drop(hessian(f, 0.001)), 0)
```
Can we figure out what the general form would be to make all higher derivatives zero? Does this Taylor series converge to something easily recognizable ... ?
```{r pic1}
xvec <- seq(-0.002, 0.002, length = 601)
## evaluate the n-th derivative of f symbolically (via D()) on xvec,
## patching in the exact branch values on the far side of the threshold
dfun <- function(f, xvec, n, eps = 0.001, poly = TRUE) {
    e <- body(f)[[2]]
    for (i in seq_len(n)) {
        e <- D(e, name = "x")
    }
    lval <- eval(e, list(x = xvec))
    if (poly) {
        gval <- switch(as.character(n), "0" = xvec, "1" = 1, 0)
        r <- ifelse(xvec < eps, lval, gval)
    } else {
        gval <- switch(as.character(n), "0" = eps, 0)
        r <- ifelse(xvec < 0, gval, lval)
    }
    return(r)
}
## derivatives 0..maxd of f, in long format (uses the global xvec)
mkderivs <- function(f, maxd = 3, poly = TRUE) {
    return(purrr::map_dfr(setNames(0:maxd, 0:maxd),
                          ## too hard to make (...) work here ...
                          ~ tibble::tibble(x = xvec, y = dfun(f, xvec, n = ., poly = poly)),
                          .id = "deriv"))
}
## comparison curves: the identity function and its derivatives
mkcomp <- function(maxd = 3) {
    return(purrr::map_dfr(setNames(0:maxd, 0:maxd),
                          ~ tibble::tibble(x = xvec,
                                           y = dplyr::case_when(. == 0 ~ xvec,
                                                                . == 1 ~ 1,
                                                                TRUE ~ 0)),
                          .id = "deriv"))
}
library(ggplot2); theme_set(theme_bw())
## plot derivatives 0..maxd of f (solid) against the identity (dashed)
plotfun <- function(f, maxd = 2, poly = TRUE, ...) {
    dd <- mkderivs(f, maxd, poly = poly, ...)
    cc <- mkcomp(maxd)
    return(ggplot(dd, aes(x, y))
           + geom_line()
           + geom_line(data = cc, linetype = 2)
           + facet_wrap(~ deriv, scale = "free",
                        labeller = label_both)
           )
}
print(plotfun(f))
```

What if we went one more step, i.e. took $g(x') = 1+a x' + bx'^2 + cx'^3$? I tried to do the algebra myself, but Wolfram Alpha does it better: `Solve[D[D[D[eps/(1-x/eps+x^2/eps^2 + c x^3),x],x],x]==0, {c}]`
```{r d3}
f <- function(x, eps=0.001) {
    eps*(1/(1 - (x-eps)/eps + (x-eps)^2/eps^2 - (x-eps)^3/eps^3))
}
print(plotfun(f, 3))
```

By induction/guessing, we have

$$
f(x) = \varepsilon \left(\sum_{i=0}^n (-1)^i y^i \right)^{-1}
$$

where $y = (x-\varepsilon)/\varepsilon = -x'/\varepsilon$. (The sum is geometric: $\sum_{i=0}^\infty (-1)^i y^i = (1+y)^{-1}$ for $|y|<1$, so as $n \to \infty$ we get $f = \varepsilon (1+y) = x$; the infinite series recovers the identity function on $0 < x < 2\varepsilon$.)
```{r d5}
f <- function(x, eps=0.001) {
    eps*(1/(1 - (x-eps)/eps + (x-eps)^2/eps^2 - (x-eps)^3/eps^3
            + (x-eps)^4/eps^4 - (x-eps)^5/eps^5))
}
print(plotfun(f, 5))
```

Note that while the derivatives are indeed continuous, their magnitudes increase dramatically at higher orders (roughly like $\varepsilon^{-(k-1)}$ for the $k$-th derivative); this could conceivably cause trouble in very sensitive applications.
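
A quick numeric check, reusing the `dfun()` helper from above on the current ($n=5$) version of `f` (the exact values matter less than the growth across orders):

```{r deriv-mags}
## maximum magnitude of the k-th derivative over the plotting grid
sapply(1:5, function(k) max(abs(dfun(f, xvec, n = k))))
```
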
A simpler representation:
```{r f2}
f2 <- function(x, eps=0.001) {
    xp <- 1 - x/eps
    eps*(1/(1 + xp + xp^2 + xp^3 + xp^4 + xp^5))
}
all.equal(f(xvec), f2(xvec))
```

---

Anders Nielsen's implementation:
```{r impl2}
f_an <- Vectorize(
    function(x, eps=0.001, n=2) {
        if (x < eps) {
            ## general form: eps / sum_{i=0}^n (-1)^i ((x-eps)/eps)^i
            eps / sum((-1)^(0:n)*((x-eps)^(0:n))/(eps^(0:n)))
        } else {
            x
        }
    }
)
Df <- function(x) grad(f_an, x)
DDf <- function(x) grad(Df, x)
par(mfrow = c(1, 3))
from <- 0.001 - 0.001
to <- 0.001 + 0.001
plot(f_an, from, to, main = 0)
plot(Df, from, to, main = 1)
plot(DDf, from, to, main = 2)
```

## Testing
This is [Jim Thorson's example](https://github.com/bbolker/bbmisc/blob/master/posfun_ex.cpp): at least it doesn't break ...
```{r test_tmb}
compile("posfun_ex.cpp")
dyn.load(dynlib("posfun_ex"))
n <- 1000; p <- .1
set.seed(2)
x <- rbinom(n, size = 1, prob = p)
mod <- MakeADFun(data = list(x = x, eps = 1e-3),
                 parameters = list(p = 0, Dummy = 0),
                 random = "p", silent = TRUE)
opt <- nlminb(mod$par, mod$fn, mod$gr)
## best parameter values encountered during optimization
environment(mod$fn)$last.par.best
```

# Exponential smooths
Jonathan Dushoff suggests using something like
$$
f(x) = \begin{cases}
\varepsilon, & \text{if } x < 0 \\
x + \varepsilon \exp\left( - \frac{x}{\varepsilon} \right), & \text{otherwise}
\end{cases}
$$
(JD doesn't necessarily take responsibility for the case statement/behaviour when $x < 0$).
```{r f_exp, fig.height = 4, fig.width = 8}
xvec <- seq(-0.02,0.02,length=601)
f_exp <- function(x, eps = 0.001) { x + eps*exp(-x/eps) }
pp <- plotfun(f_exp, poly = FALSE)
print(pp)
```

Unlike the polynomial approaches above, this function does not have the property that $f(x) = x$ for $x > \varepsilon$; in fact, mathematically $f(x)$ is always greater than $x$. However, in double-precision floating-point arithmetic $f(x) = x$ exactly once $\left( \frac{\varepsilon}{x} \right) \exp\left( - \frac{x}{\varepsilon} \right) < \delta$, where $\delta$ is machine epsilon (`.Machine$double.eps` in R, i.e. the smallest value such that $1+\delta \neq 1$, $\approx 2 \times 10^{-16}$).
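
A quick empirical check (reusing `f_exp()` from the chunk above and scanning an arbitrary grid of multiples of $\varepsilon$; the exact cutoff depends on rounding details, so it may differ slightly from the Lambert-$W$ value computed below):

```{r underflow-check}
eps <- 0.001
xx <- eps*seq(20, 40, by = 0.25)
## smallest x/eps at which the correction term vanishes in double precision
min(xx[f_exp(xx, eps) == xx])/eps
```
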
We could do some work to solve this (recognizing that the answer will involve the Lambert $W$ function), or we could just ask `sympy`:
```{python sympy}
from sympy import *
y, d = symbols(('y', 'd'))
solve(exp(-y)/y - d, y)
```
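
(The algebra, for the record: $e^{-y}/y = \delta$ rearranges to $y e^{y} = 1/\delta$, so $y = W(1/\delta)$.)
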
So the correction term is rounded away as soon as $x/\varepsilon$ is greater than:
```{r min_val}
emdbook::lambertW(1/.Machine$double.eps)
```

# References