---
title: smooth constraint functions
author: Ben Bolker
bibliography: posfun.bib
csl: apa.csl
output:
  html_document:
    toc: true
---

```{r pkgs, include = FALSE}
library(numDeriv)
library(emdbook) ## lambertW
library(TMB) ## testing CppAD stuff
```

# Polynomial/piecewise smooths

Often when we're fitting models, we have functions that go somewhere they're not supposed to (e.g. densities that fall below 0, or probabilities that fall outside the range $0<x<1$). If we can parameterize everything so that the constraints apply to a single parameter, we can use *box constraints*, or we can reparameterize the model in terms of an unconstrained parameter (e.g. fit on the scale of $\log(\beta)$ for a value that needs to be positive, or on the scale of $\textrm{logit}(\beta)$ for a value that needs to be between 0 and 1). Sometimes, however, it's not easy to see how to reparameterize the model or constrain a single parameter. A useful strategy in this case is to force the value *smoothly* to its boundary, and at the same time add a quadratic penalty that penalizes the objective function for failing to stay inside the bounds.
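
For concreteness, here's a minimal sketch of the two standard approaches for a toy example where a rate parameter $\beta$ must be positive (the exponential data, starting values, and chunk name are purely illustrative):

```{r reparam-sketch}
set.seed(101)
ydat <- rexp(100, rate = 2)
nll <- function(beta) -sum(dexp(ydat, rate = beta, log = TRUE))
## box constraints: bound beta away from zero
nlminb(start = 1, nll, lower = 1e-8)$par
## reparameterization: optimize log(beta) unconstrained, then back-transform
exp(nlminb(start = 0, function(logbeta) nll(exp(logbeta)))$par)
```
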
See [TMB issue](https://github.com/kaskr/adcomp/issues/7) for more background.

AD Model Builder [@fournier_ad_2011], a powerful tool especially used in fisheries modeling, defined `posfun` as follows:
$$
f(x) = \begin{cases}
\frac{\varepsilon}{2-x/\varepsilon}, & \text{if } x < \varepsilon \\
x, & \text{otherwise}
\end{cases}
$$
(`posfun` also had the side effect of adding $\gamma (x-\varepsilon)^2$ to a running penalty term if $x<\varepsilon$; the accumulated penalty could be added to the negative log-likelihood.)
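
A minimal R sketch of the same idea (an illustrative translation, not the actual ADMB code; the penalty weight `gamma` is arbitrary here):

```{r posfun-sketch}
posfun_admb <- function(x, eps = 0.001, gamma = 1) {
    if (x < eps) {
        ## smoothed value plus quadratic penalty for violating the bound
        list(value = eps/(2 - x/eps), penalty = gamma*(x - eps)^2)
    } else {
        list(value = x, penalty = 0)
    }
}
## a negative input maps to a small positive value, and incurs a penalty
posfun_admb(-0.002)
```
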
This function has been widely used in applied fisheries research
[@breen_effects_2003; @branch_general_2010; @carruthers_spatial_2011; @rudd_does_2017]. It has the following useful properties: for a given $\varepsilon>0$,

- $f(x) = x$ for $x \geq \varepsilon$
- $f(x) >0$ for all $x$
- $f'(x) >0$ for all $x$
- $f(x) = \varepsilon/2$ for $x=0$

However, it only has a continuous *first* derivative at $x=\varepsilon$. This causes problems if we are trying to do any numerical operations that depend on a continuous second derivative, e.g. Laplace approximation (or Riemannian Hamiltonian Monte Carlo ... [@girolami_riemannian_2019]).
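
A quick numeric illustration of the jump (evaluating `numDeriv::hessian()` a safe distance from the kink, so the finite-difference steps don't cross it):

```{r posfun-kink}
pf <- function(x, eps = 0.001) ifelse(x < eps, eps/(2 - x/eps), x)
## second derivative is O(1/eps) just below the threshold, identically 0 above
c(below = drop(numDeriv::hessian(pf, 0.0005)),
  above = drop(numDeriv::hessian(pf, 0.002)))
```
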

Thus we need a function $f(x)$ that also satisfies

- $f(x)$, $f'(x)$, and $f''(x)$ are everywhere continuous;

this implies $f(\varepsilon)=\varepsilon$, $f'(\varepsilon)=1$, and $f''(\varepsilon)=0$. (We are willing to give up the last property above, $f(0)=\varepsilon/2$.)

Start by setting $x'=\varepsilon-x$; then $x'>0$ for $x<\varepsilon$ and $x'=0$ when $x=\varepsilon$. Suppose we take $g(x') = 1+a x' + bx'^2$ and $f(x') = \varepsilon \, g(x')^{-1}$. Since $g(x'=0)=1$, we have $f(x'=0)=\varepsilon$. Now
$$
f'(x'=0) = -\varepsilon g'(0) (g(0))^{-2} = -\varepsilon a
$$
$$
\begin{split}
f''(x'=0) & = -\varepsilon \left(g''(0) (g(0))^{-2} - 2 (g'(0))^2 (g(0))^{-3}\right) \\
& = -\varepsilon \left( \frac{g''(0)\, g(0) -2(g'(0))^2 }{g(0)^3} \right) \\
& = -\varepsilon \left(2b -2a^2 \right)
\end{split}
$$

Since $dx'/dx=-1$, matching $df/dx = 1$ at $x=\varepsilon$ means $f'(x'=0)=-1$, so $-\varepsilon a = -1$, i.e. $a=1/\varepsilon$; and $f''(x'=0)=0$ requires $b=a^2=1/\varepsilon^2$.

Let's test it:
```{r test1}
f <- function(x, eps=0.001) {
    eps*(1/(1 - (x-eps)/eps + (x-eps)^2/eps^2))
}
f(0.001)        ## should equal eps
grad(f, 0.001)  ## should equal 1
## not exactly zero but close enough ...
all.equal(drop(hessian(f, 0.001)), 0)
```
Can we figure out what the general form would be to make all higher derivatives zero? Does this Taylor series converge to something easily recognizable ... ?
```{r pic1}
xvec <- seq(-0.002, 0.002, length = 601)
## evaluate the n-th derivative of f symbolically (via D()) on xvec,
## patching in the exact branch values on the far side of the threshold
dfun <- function(f, xvec, n, eps = 0.001, poly = TRUE) {
    e <- body(f)[[2]]
    for (i in seq_len(n)) {
        e <- D(e, name = "x")
    }
    lval <- eval(e, list(x = xvec))
    if (poly) {
        gval <- switch(as.character(n), "0" = xvec, "1" = 1, 0)
        r <- ifelse(xvec < eps, lval, gval)
    } else {
        gval <- switch(as.character(n), "0" = eps, 0)
        r <- ifelse(xvec < 0, gval, lval)
    }
    return(r)
}
## derivatives 0..maxd of f, in long format (uses the global xvec)
mkderivs <- function(f, maxd = 3, poly = TRUE) {
    return(purrr::map_dfr(setNames(0:maxd, 0:maxd),
                          ## too hard to make (...) work here ...
                          ~ tibble::tibble(x = xvec, y = dfun(f, xvec, n = ., poly = poly)),
                          .id = "deriv"))
}
## comparison curves: the identity function and its derivatives
mkcomp <- function(maxd = 3) {
    return(purrr::map_dfr(setNames(0:maxd, 0:maxd),
                          ~ tibble::tibble(x = xvec,
                                           y = dplyr::case_when(. == 0 ~ xvec,
                                                                . == 1 ~ 1,
                                                                TRUE ~ 0)),
                          .id = "deriv"))
}
library(ggplot2); theme_set(theme_bw())
## plot derivatives 0..maxd of f (solid) against the identity (dashed)
plotfun <- function(f, maxd = 2, poly = TRUE, ...) {
    dd <- mkderivs(f, maxd, poly = poly, ...)
    cc <- mkcomp(maxd)
    return(ggplot(dd, aes(x, y))
           + geom_line()
           + geom_line(data = cc, linetype = 2)
           + facet_wrap(~ deriv, scale = "free",
                        labeller = label_both)
           )
}
print(plotfun(f))
```

What if we went one more step, i.e. took $g(x') = 1+a x' + bx'^2 + cx'^3$? I tried to do the algebra myself, but Wolfram Alpha does it better: `Solve[D[D[D[eps/(1-x/eps+x^2/eps^2 + c x^3),x],x],x]==0, {c}]`
```{r d3}
f <- function(x, eps=0.001) {
    eps*(1/(1 - (x-eps)/eps + (x-eps)^2/eps^2 - (x-eps)^3/eps^3))
}
print(plotfun(f, 3))
```

By induction/guessing, we have

$$
f(x) = \varepsilon \left(\sum_{i=0}^n (-1)^i y^i \right)^{-1}
$$

where $y = (x-\varepsilon)/\varepsilon = -x'/\varepsilon$. (The sum is geometric: $\sum_{i=0}^\infty (-1)^i y^i = (1+y)^{-1}$ for $|y|<1$, so as $n \to \infty$ we get $f = \varepsilon (1+y) = x$; the infinite series recovers the identity function on $0 < x < 2\varepsilon$.)
```{r d5}
f <- function(x, eps=0.001) {
    eps*(1/(1 - (x-eps)/eps + (x-eps)^2/eps^2 - (x-eps)^3/eps^3
            + (x-eps)^4/eps^4 - (x-eps)^5/eps^5))
}
print(plotfun(f, 5))
```

Note that while the derivatives are indeed continuous, their magnitudes increase dramatically at higher orders (roughly like $\varepsilon^{-(k-1)}$ for the $k$-th derivative); this could conceivably cause trouble in very sensitive applications.
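
A quick numeric check, reusing the `dfun()` helper from above on the current ($n=5$) version of `f` (the exact values matter less than the growth across orders):

```{r deriv-mags}
## maximum magnitude of the k-th derivative over the plotting grid
sapply(1:5, function(k) max(abs(dfun(f, xvec, n = k))))
```
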
A simpler representation:
```{r f2}
f2 <- function(x, eps=0.001) {
    xp <- 1 - x/eps
    eps*(1/(1 + xp + xp^2 + xp^3 + xp^4 + xp^5))
}
all.equal(f(xvec), f2(xvec))
```

---

Anders Nielsen's implementation:
```{r impl2}
f_an <- Vectorize(
    function(x, eps=0.001, n=2) {
        if (x < eps) {
            ## general form: eps / sum_{i=0}^n (-1)^i ((x-eps)/eps)^i
            eps / sum((-1)^(0:n)*((x-eps)^(0:n))/(eps^(0:n)))
        } else {
            x
        }
    }
)
Df <- function(x) grad(f_an, x)
DDf <- function(x) grad(Df, x)
par(mfrow = c(1, 3))
from <- 0.001 - 0.001
to <- 0.001 + 0.001
plot(f_an, from, to, main = 0)
plot(Df, from, to, main = 1)
plot(DDf, from, to, main = 2)
```

## Testing
This is [Jim Thorson's example](https://github.com/bbolker/bbmisc/blob/master/posfun_ex.cpp): at least it doesn't break ...
```{r test_tmb}
compile("posfun_ex.cpp")
dyn.load(dynlib("posfun_ex"))
n <- 1000; p <- .1
set.seed(2)
x <- rbinom(n, size = 1, prob = p)
mod <- MakeADFun(data = list(x = x, eps = 1e-3),
                 parameters = list(p = 0, Dummy = 0),
                 random = "p", silent = TRUE)
opt <- nlminb(mod$par, mod$fn, mod$gr)
## best parameter values encountered during optimization
environment(mod$fn)$last.par.best
```

# Exponential smooths
Jonathan Dushoff suggests using something like
$$
f(x) = \begin{cases}
\varepsilon, & \text{if } x < 0 \\
x + \varepsilon \exp\left( - \frac{x}{\varepsilon} \right), & \text{otherwise}
\end{cases}
$$
(JD doesn't necessarily take responsibility for the case statement/behaviour when $x < 0$).
```{r f_exp, fig.height = 4, fig.width = 8}
xvec <- seq(-0.02,0.02,length=601)
f_exp <- function(x, eps = 0.001) { x + eps*exp(-x/eps) }
pp <- plotfun(f_exp, poly = FALSE)
print(pp)
```

Unlike the polynomial approaches above, this function does not have the property that $f(x) = x$ for $x > \varepsilon$; in fact, mathematically $f(x)$ is always greater than $x$. However, in double-precision floating-point arithmetic $f(x) = x$ exactly once $\left( \frac{\varepsilon}{x} \right) \exp\left( - \frac{x}{\varepsilon} \right) < \delta$, where $\delta$ is machine epsilon (`.Machine$double.eps` in R, i.e. the smallest value such that $1+\delta \neq 1$, $\approx 2 \times 10^{-16}$).
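
A quick empirical check (reusing `f_exp()` from the chunk above and scanning an arbitrary grid of multiples of $\varepsilon$; the exact cutoff depends on rounding details, so it may differ slightly from the Lambert-$W$ value computed below):

```{r underflow-check}
eps <- 0.001
xx <- eps*seq(20, 40, by = 0.25)
## smallest x/eps at which the correction term vanishes in double precision
min(xx[f_exp(xx, eps) == xx])/eps
```
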
We could do some work to solve this (recognizing that the answer will involve the Lambert $W$ function), or we could just ask `sympy`:
```{python sympy}
from sympy import *
y, d = symbols(('y', 'd'))
solve(exp(-y)/y - d, y)
```
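
(The algebra, for the record: $e^{-y}/y = \delta$ rearranges to $y e^{y} = 1/\delta$, so $y = W(1/\delta)$.)
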
So the correction term is rounded away as soon as $x/\varepsilon$ is greater than:
```{r min_val}
emdbook::lambertW(1/.Machine$double.eps)
```

# References