Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency of linear predictor values (numerical instability?) #604

Open
fweber144 opened this issue Oct 18, 2023 · 0 comments
Open

Inconsistency of linear predictor values (numerical instability?) #604

fweber144 opened this issue Oct 18, 2023 · 0 comments

Comments

@fweber144
Copy link

Summary:

The linear predictors used internally by log_lik() may differ from those derived from posterior_linpred().

Description:

While working on projpred's unit tests, I discovered an inconsistency between the linear predictors that are used internally by rstanarm:::log_lik.stanreg() and those calculated by rstanarm:::posterior_linpred.stanreg(). This might be due to some numerical instability within one of these two approaches. I stumbled across this while trying to reproduce the output of log_lik() manually (based on the linear predictors calculated by posterior_linpred()); that's why the reproducible steps shown below include the log likelihood values.

Reproducible Steps:

File rfit.rds is provided in rfit.zip.

The point is that line all.equal(eta_from_linpred, eta_from_loglik_internals) shows differing linear predictors. The fact that the log likelihood values differ (line all.equal(ll_from_linpred, unname(ll))) is then clear because it's already the linear predictors which differ.

library(rstanarm)
rfit <- readRDS("rfit.rds")

# Via log_lik():
ll <- log_lik(rfit)
# Inside of rstanarm:::log_lik.stanreg(), the following takes place:
args <- rstanarm:::ll_args.stanreg(rfit, newdata = NULL, offset = NULL,
                                   reloo_or_kfold = FALSE)
fun <- function(data_i, draws) {
  val <- dbinom(data_i$y, size = data_i$trials,
                prob = rstanarm:::.mu(data_i, draws), log = TRUE)
  rstanarm:::.weighted(val, data_i$weights)
}
out <- vapply(seq_len(args$N), FUN = function(i) {
  as.vector(fun(data_i = args$data[i, , drop = FALSE],
                draws = args$draws))
}, FUN.VALUE = numeric(length = args$S))
ll_from_loglik_internals <- out
stopifnot(identical(ll_from_loglik_internals, unname(ll)))
# Inside of rstanarm:::.mu(), the following takes place:
eta_from_loglik_internals <- do.call(cbind, lapply(
  seq_len(args$N),
  function(i) {
    data_i <- args$data[i, , drop = FALSE]
    rstanarm:::linear_predictor.matrix(
      args$draws$beta,
      rstanarm:::.xdata(data_i),
      data_i$offset
    )
  }
))
names(dimnames(eta_from_loglik_internals)) <- c("iterations", "")

# Via posterior_linpred():
eta_from_linpred <- posterior_linpred(
  rfit, newdata = NULL, offset = rep(0, nrow(rfit$data))
)
all.equal(eta_from_linpred, eta_from_loglik_internals)
## --> [1] "Mean relative difference: 0.2869042"
mu_from_linpred <- binomial()$linkinv(eta_from_linpred)
ll_from_linpred <- t(array(dbinom(rfit$data$y_gamm_brnll, size = 1,
                                  prob = t(mu_from_linpred), log = TRUE),
                           dim = rev(dim(eta_from_linpred))))
all.equal(ll_from_linpred, unname(ll))
## --> [1] "Mean relative difference: 0.2186712"

RStanARM Version:

2.26.1

R Version:

R version 4.3.1 (2023-06-16)

Operating System:

Ubuntu 22.04.3 LTS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant