---
title: "Bayesian Regression and Temporal Modelling"
subtitle: "Key ideas and concepts"
author:
 - name: "Robbie M. Parks"
   email: "robbie.parks@columbia.edu"
institute: "Environmental Health Sciences, Columbia University"
date: 2023-08-14
date-format: medium
title-slide-attributes:
  data-background-color: "#f3f4f4"
  data-background-image: "../../assets/bmeh_normal.png"
  data-background-size: 80%
  data-background-position: 60% 120%
format:
  revealjs:
    slide-number: true
    incremental: true
    chalkboard:
      buttons: false
      preview-links: auto
    logo: "../../assets/bmeh_normal.png"
    theme: [default, ../../assets/style.scss]
---

```{r}
library(here)
library(tidyverse)
library(nimble)
library(bayesplot)
library(posterior)
library(hrbrthemes)
```

# Outline

- Introduction
- Regression models
- Using `NIMBLE` for Bayesian inference

# Overview

## Finding associations from data

::: nonincremental
- Generate some points:
:::

``` R
# Load packages
library(ggplot2)

# Create a dataset
set.seed(100)
data <- data.frame(x = rnorm(100), y = rnorm(100))

# Scatter plot of the generated points
p <- ggplot(data, aes(x, y)) +
  geom_point()

plot(p)
```

## Finding associations from data

::: nonincremental
- Plot the generated points:
:::

```{r}
#| echo: false
library(ggplot2)
set.seed(100)
data <- data.frame(x = rnorm(100), y = rnorm(100))

p <- ggplot(data, aes(x, y)) +
  geom_point()

plot(p)
```

## Finding associations from data

::: nonincremental
- Establish some kind of association:
:::

```{r}
#| echo: false
library(ggplot2)
set.seed(100)
data <- data.frame(x = rnorm(100), y = rnorm(100))

p <- ggplot(data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "loess")

plot(p)
```

## Finding associations from data

::: nonincremental
- Better still, use some real data.
:::

``` R
data <- read_csv(here("data", "Spain", "data_spain.csv"))
head(data)

data_national <- data |>
  group_by(week, week_of_year) |>
  summarise(
    deaths = sum(deaths),
    population = sum(population),
    t2m = mean(t2m),
    weekly_t2m_anomaly = mean(weekly_t2m_anomaly)
  ) |>
  mutate(week = dmy(week)) |>
  arrange(week) |>
  filter(year(week) < 2020) # avoiding COVID for now

ggplot(data = data_national) +
  geom_point(aes(x = week, y = deaths))
```

## Finding associations from data

::: nonincremental
- Better still, use some real data.
:::

```{r}
data <- read_csv(here("data", "Spain", "data_spain.csv"))
head(data)

data_national <- data |>
  group_by(week, week_of_year) |>
  summarise(
    deaths = sum(deaths),
    population = sum(population),
    t2m = mean(t2m),
    weekly_t2m_anomaly = mean(weekly_t2m_anomaly)
  ) |>
  mutate(week = dmy(week)) |>
  arrange(week) |>
  filter(year(week) < 2020) # avoiding COVID for now
```

## Finding associations from data

::: nonincremental
- Better still, use some real data.
:::

```{r}
#| echo: false
ggplot(data = data_national) +
  geom_point(aes(x = week, y = deaths))
```

## Finding associations from data

::: nonincremental
- Find some kind of association.
:::

```{r}
#| echo: false
data_national <- data_national |>
  mutate(rate = 100000 * deaths / population)

code_linear <- nimbleCode({
  # priors
  alpha ~ dnorm(0, sd = 10) # prior for alpha
  beta_week ~ dnorm(0, sd = 10) # prior for beta_week

  # likelihood
  for (t in 1:Nw) {
    deaths[t] ~ dpois(mu[t])
    log(mu[t]) <- log(population[t]) + alpha + beta_week * t
  }

  # what's the estimated annual rate of change?
  beta_year <- exp(52 * beta_week)
})

constants <- list(Nw = nrow(data_national))
data <- list(deaths = data_national$deaths, population = data_national$population)

inits <- list(alpha = 0, beta_week = 0)
parameters_to_monitor <- c("alpha", "beta_week", "beta_year")

nimbleMCMC_samples_linear <- nimbleMCMC(
  code = code_linear,
  data = data,
  constants = constants,
  inits = inits,
  monitors = parameters_to_monitor,
  niter = 10000,
  nburnin = 5000,
  setSeed = 1,
  progressBar = FALSE,
  samplesAsCodaMCMC = TRUE
)

linear_fit <- data_national |>
  ungroup() |>
  mutate(
    .death_rate_fit = 100000 * exp(
      # add alpha and beta_week * week_number by sample
      sweep(
        nimbleMCMC_samples_linear[, "beta_week"] %*% t(1:nrow(data_national)),
        1,
        nimbleMCMC_samples_linear[, "alpha"],
        FUN = "+"
      )
    ) |>
      # then take the mean of the samples
      apply(
        FUN = mean,
        MARGIN = 2
      )
  ) |>
  mutate(residuals = rate - .death_rate_fit)

linear_fit |>
  ggplot() +
  geom_point(aes(x = week, y = rate), size = 0.6) +
  geom_line(aes(x = week, y = .death_rate_fit), linewidth = 0.8, colour = "red")
```

## But do the residuals look right?

```{r}
linear_fit |>
  ggplot(aes(x = residuals)) +
  geom_histogram()
```

## Regression models

- Most commonly used to assess presence of a relationship between a dependent variable and one or more independent variables.
- Bayesian regression originally used in econometrics.
- A wide range of fields now use regression, including environmental epidemiology.
- Partly why we find ourselves here today!

## Regression models

- Basic steps for regression models:
1. Establish suitable model for observations.
2. Identify type of relationship between predictor(s) and outcome.
3. For Bayesian approach, specify prior distributions etc.
4. Run the model somehow (e.g., `R` with `NIMBLE`).
5. See how well the model has fit.

# Suitable models for observations

## Generalized linear models

- Extension of linear regression to outcomes that are not necessarily normally distributed.
- Distributions that are part of the exponential family can describe all sorts of non-normally distributed variables.
- Many examples commonly encountered in environmental health.
- We'll go through a few now...

## Exponential family of models

- Sometimes dependent variable is not normally distributed.
- Exponential family is family of models described in a certain way (won't get into it in this workshop).
- All of the models we'll look at are exponential family models.

## Some types of models to know about

- Normal (as seen in the last lecture)
- Logistic regression (yes/no)
- Binomial (set number of trials)
- Poisson (counts)

## Normal distribution

::: nonincremental
- As seen in first lecture.
:::

$$
\begin{split}
y_i &\sim \text{Normal}(\mu_i, \sigma) \quad i = 1, \dots, N \\
\mu_i &= \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}
\end{split}
$$

## Logistic regression

::: nonincremental
- Used when we want to classify observations into two groups (zero or one).
- If the dependent variable represents a single yes/no trial:
:::

$$
\begin{split}
y_i &\sim \text{Bernoulli}(p_i) \quad i = 1, \dots, N \\
\text{logit}(p_i) &= \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}
\end{split}
$$
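
## Logistic regression

A minimal sketch of the logit link in base `R`, assuming illustrative values for $\alpha$ and $\beta_1$ (they are not estimated from any data). `qlogis()` is the logit and `plogis()` is its inverse, so any real-valued linear predictor maps to a probability between 0 and 1.

```r
# Illustrative (assumed) coefficient values, not estimates from data
alpha <- -1
beta1 <- 0.8

# A grid of predictor values
x1 <- seq(-5, 5, by = 1)

# Linear predictor on the logit scale
eta <- alpha + beta1 * x1

# Inverse logit maps the linear predictor to a probability in (0, 1)
p <- plogis(eta)

# Simulate one yes/no outcome per observation
set.seed(1)
y <- rbinom(length(p), size = 1, prob = p)

round(p, 3)
```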

## Binomial regression

::: nonincremental
- Used when we want to know how many successes occur in a set number of trials.
- Outcome follows a Binomial distribution (zero to $n$ successes).
:::

$$
\begin{split}
y_i &\sim \text{Bin}(n, p_i) \quad i = 1, \dots, N \\
\text{logit}(p_i) &= \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}
\end{split}
$$
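
## Binomial regression

A short simulation sketch of the Binomial model, again with assumed coefficient values chosen only for illustration. It also shows that setting the number of trials to one recovers the Bernoulli outcome used in logistic regression.

```r
set.seed(1)

# Assumed illustrative values, not estimates from data
n <- 20                # set number of trials per observation
alpha <- -0.5
beta1 <- 1
x1 <- c(-1, 0, 1)

# Success probability for each observation via the inverse logit
p <- plogis(alpha + beta1 * x1)

# Number of successes out of n trials for each observation
y <- rbinom(length(p), size = n, prob = p)

# With one trial the Binomial reduces to the Bernoulli of logistic regression
y_single_trial <- rbinom(length(p), size = 1, prob = p)

data.frame(x1, p = round(p, 3), y, y_single_trial)
```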

## Poisson regression

::: nonincremental
- Used typically with count data (number of events happening within a discrete space and time).
:::

$$
\begin{split}
y_t &\sim \text{Pois}(\mu_t) \quad t = 1, \dots, T \\
\log(\mu_t) &= \log(P_t) + \alpha + \beta_w t
\end{split}
$$
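
## Poisson regression

A minimal simulation sketch of this model in base `R`. The population size, baseline rate and slope are assumptions made up for illustration and are not taken from the Spain data. The offset $\log(P_t)$ is what turns a modelled rate into an expected count.

```r
set.seed(1)

# Assumed illustrative values (not taken from the Spain data)
n_weeks <- 104
population <- rep(1e6, n_weeks)   # P_t: constant population of one million
alpha <- log(20 / 1e5)            # baseline weekly rate: 20 deaths per 100,000
beta_w <- 0.001                   # weekly change in the log rate

t <- seq_len(n_weeks)

# Expected count: the offset log(P_t) converts a rate into a count
mu <- exp(log(population) + alpha + beta_w * t)

# Observed counts drawn from the Poisson distribution
deaths <- rpois(n_weeks, lambda = mu)

# Crude rate per 100,000 for comparison with the expected count
rate <- 1e5 * deaths / population
head(data.frame(t, mu = round(mu, 1), deaths, rate))
```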

## Link function

- When modelling, we must make a decision about how the predictors are related to the key parameters in our chosen relationship.
- You will see several link functions over the remainder of the workshop.
- For example, the link function for Poisson regression is usually the log, which keeps the expected count positive.

$$
\begin{split}
\log(\mu_t) &= \log(P_t) + \alpha + \beta_w t
\end{split}
$$
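
## Link function

To see why the log link is the usual choice here, the sketch below compares an identity link with a log link for the same linear predictor. The coefficients are assumed values, picked so that the linear predictor becomes negative.

```r
# Assumed illustrative coefficients with a steep negative slope
alpha <- 2
beta <- -0.5
x <- 0:10

eta <- alpha + beta * x   # linear predictor: can be any real number

mu_identity <- eta        # identity link: the mean equals the linear predictor
mu_log <- exp(eta)        # log link: log(mu) = eta, so mu = exp(eta)

# The identity link gives invalid (negative) Poisson means for larger x,
# whereas the log link keeps every mean strictly positive
any(mu_identity < 0)
all(mu_log > 0)
```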

## Reminder of regression model steps

::: nonincremental
- Basic steps for regression models:
1. Establish suitable model for observations.
2. Identify type of relationship between predictor(s) and outcome.
3. For Bayesian approach, specify prior distributions etc.
4. Run the model somehow (e.g., `R` with `NIMBLE`).
5. See how well the model has fit.
:::

# Relationship between predictor(s) and outcome

## Linear regression

- Simplest regression model.
- Assuming a linear relationship between predictors and the outcome.
- Used in countless different applications.

## Linear regression

$$
\begin{split}
y_i &\sim \text{Normal}(\mu_i, \sigma) \quad i = 1, \dots, N \\
\mu_i &= \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}
\end{split}
$$

## Linear regression

::: nonincremental
- As a reminder from the last lab, in `NIMBLE`, we can write (including priors):
:::

``` R
code <- nimbleCode({
  # priors for parameters
  alpha ~ dnorm(0, sd = 100) # prior for alpha
  beta1 ~ dnorm(0, sd = 100) # prior for beta1
  beta2 ~ dnorm(0, sd = 100) # prior for beta2
  beta3 ~ dnorm(0, sd = 100) # prior for beta3
  sigma ~ dunif(0, 100) # prior for the residual standard deviation

  # regression formula
  for (i in 1:n) { # n is the number of observations we have in the data
    mu[i] <- alpha + beta1 * x1[i] + beta2 * x2[i] + beta3 * x3[i] # manual entry of linear predictors
    y[i] ~ dnorm(mu[i], sd = sigma)
  }
})
```


## Non-linear regression

- Data can contain seasonality, for example, which needs to be accounted for.
- Also, there can be autocorrelation between neighbouring time points.

## Non-linear regression

```{r}
#| echo: false
ggplot(data = data_national) +
  geom_point(aes(x = week, y = deaths))
```

## Non-linear regression

- Random walk.
- Can be over time or over other units.
- Extension is autoregressive structure for longer-term memory.

## Non-linear regression

$$
\begin{split}
y_t &\sim \text{Pois}(\mu_t) \quad t = 1, \dots, T \\
\log(\mu_t) &= \log(P_t) + \alpha + \gamma_t \\
\gamma_t &\sim \text{Normal}(\gamma_{t-1}, \sigma_{rw}) \quad t = 2, \dots, T
\end{split}
$$
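
## Non-linear regression

A minimal sketch of what a first-order random walk looks like when simulated in base `R`. The number of weeks and the step standard deviation are assumptions for illustration; fixing the first value at zero mirrors the `rw[1] <- 0` line in the model code.

```r
set.seed(1)

n_weeks <- 260      # assumed: five years of weekly time points
sigma_rw <- 0.05    # assumed standard deviation of the weekly steps

# gamma_1 is fixed at 0; each later value is the previous one plus Normal noise
gamma <- numeric(n_weeks)
for (t in 2:n_weeks) {
  gamma[t] <- rnorm(1, mean = gamma[t - 1], sd = sigma_rw)
}

# The same process in vectorised form: a cumulative sum of independent steps
# (new random draws, so the path differs from the loop above)
steps <- rnorm(n_weeks - 1, mean = 0, sd = sigma_rw)
gamma_vectorised <- c(0, cumsum(steps))

head(round(gamma, 3))
```

Smaller values of `sigma_rw` give smoother paths, because neighbouring time points are forced to stay close together.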

## Non-linear regression

``` R
code_weekly_random_walk <- nimbleCode({
  # priors
  alpha ~ dnorm(0, sd = 10) # prior for alpha
  sigma_rw ~ T(dnorm(0, sd = 1), 0, Inf) # half-normal prior for standard deviation of weekly steps

  # likelihood
  for (t in 1:Nw) {
    deaths[t] ~ dpois(mu[t])
    log(mu[t]) <- log(population[t]) + lograte[t]
    lograte[t] <- alpha + rw[t]
  }

  # random walk over time
  rw[1] <- 0
  for (t in 2:Nw) {
    rw[t] ~ dnorm(rw[t - 1], sd = sigma_rw) # name sd: the second positional argument is a precision
  }
})
```

## Non-linear regression

```{r}
#| echo: false
code_weekly_random_walk <- nimbleCode({
  # priors
  alpha ~ dnorm(0, sd = 10) # prior for alpha
  sigma_rw ~ T(dnorm(0, sd = 1), 0, Inf) # half-normal prior for standard deviation of weekly steps

  # likelihood
  for (t in 1:Nw) {
    deaths[t] ~ dpois(mu[t])
    log(mu[t]) <- log(population[t]) + lograte[t]
    lograte[t] <- alpha + rw[t]
  }

  # random walk over time
  rw[1] <- 0
  for (t in 2:Nw) {
    rw[t] ~ dnorm(rw[t - 1], sd = sigma_rw) # name sd: the second positional argument is a precision
  }
})

constants <- list(Nw = nrow(data_national))
data <- list(deaths = data_national$deaths, population = data_national$population)

inits <- list(alpha = -8.0, rw = rep(0, times = nrow(data_national)), sigma_rw = 1)
parameters_to_monitor <- c("alpha", "rw", "lograte")

nimbleMCMC_samples_week_random_walk <- nimbleMCMC(
  code = code_weekly_random_walk,
  data = data,
  constants = constants,
  inits = inits,
  monitors = parameters_to_monitor,
  niter = 10000, # 80000,
  nburnin = 5000, # 40000,
  setSeed = 1,
  progressBar = FALSE,
  samplesAsCodaMCMC = TRUE
)

pred_death_rate <- 100000 * exp(
  nimbleMCMC_samples_week_random_walk[, str_c("lograte[", seq(nrow(data_national)), "]")]
) |>
  apply(
    FUN = quantile,
    MARGIN = 2,
    p = c(0.025, 0.5, 0.975)
  )

rw_fit <- data_national |>
  ungroup() |>
  mutate(
    .death_rate_median = pred_death_rate[2, ],
    .death_rate_lower = pred_death_rate[1, ],
    .death_rate_upper = pred_death_rate[3, ],
  ) |>
  mutate(residuals = rate - .death_rate_median)

rw_fit |>
  ggplot(aes(x = week)) +
  geom_point(aes(y = rate), size = 2) +
  geom_ribbon(aes(ymin = .death_rate_lower, ymax = .death_rate_upper), fill = "red", alpha = 0.1) +
  geom_line(aes(y = .death_rate_median), linewidth = 0.4, colour = "red")
```

## Reminder of regression model steps

::: nonincremental
- Basic steps for regression models:
1. Establish suitable model for observations.
2. Identify type of relationship between predictor(s) and outcome.
3. For Bayesian approach, specify prior distributions etc.
4. Run the model somehow (e.g., `R` with `NIMBLE`).
5. See how well the model has fit.
:::

# Prior distributions

## Setting priors

- Need to set priors on:
  - The regression parameters (e.g., the $\beta$ parameters).
  - Variance of outcome (e.g., $\sigma^2$).
- In absence of information, set priors as vague:

``` R
alpha ~ dnorm(0, sd = 10) # prior for alpha
beta_temperature ~ dnorm(0, sd = 10) # prior for beta_temperature
sigma_rw ~ T(dnorm(0, sd = 1), 0, Inf) # half-normal prior for standard deviation of weekly steps
```
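
## Setting priors

One way to judge whether a prior is vague is to draw from it and look at what it implies on the scale of interest. The sketch below does this in base `R` for the two priors above; the number of draws is an arbitrary choice, and taking the absolute value of a Normal draw is used here as a stand-in for the truncated Normal.

```r
set.seed(1)
n_draws <- 10000

# Draws from the priors as written in the model code
alpha <- rnorm(n_draws, mean = 0, sd = 10)
sigma_rw <- abs(rnorm(n_draws, mean = 0, sd = 1))  # half-normal: |Normal(0, 1)|

# alpha is on the log-rate scale, so exp(alpha) is the implied baseline rate
quantile(exp(alpha), probs = c(0.025, 0.5, 0.975))

# The half-normal prior only places mass on non-negative values
range(sigma_rw)
```

The very wide interval for `exp(alpha)` is what "vague" means in practice: the prior rules out almost nothing about the baseline rate.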

## Priors focus in `NIMBLE` model

```{r}
#| echo: false
code_weekly_random_walk <- nimbleCode({
  # priors
  alpha ~ dnorm(0, sd = 10) # prior for alpha
  sigma_rw ~ T(dnorm(0, sd = 1), 0, Inf) # half-normal prior for standard deviation of weekly steps

  # likelihood
  for (t in 1:Nw) {
    deaths[t] ~ dpois(mu[t])
    log(mu[t]) <- log(population[t]) + lograte[t]
    lograte[t] <- alpha + rw[t]
  }

  # random walk over time
  rw[1] <- 0
  for (t in 2:Nw) {
    rw[t] ~ dnorm(rw[t - 1], sd = sigma_rw) # name sd: the second positional argument is a precision
  }
})
```

## Priors focus in `NIMBLE` model

::: nonincremental
- How does $\alpha$ prior look?
:::

$$\text{Normal}(0, 10)$$

```{r}
#| echo: false
p <- seq(-10, 10, length.out = 10000)

# plot the Normal density with mean 0 and standard deviation 10
plot(p, dnorm(p, 0, 10), type = "l")
```

## Priors focus in `NIMBLE` model

::: nonincremental
- Alternative $\alpha$ priors:
:::

$$\text{Normal}(0, 1)$$

```{r}
#| echo: false
p <- seq(-10, 10, length.out = 10000)

# plot the Normal density with mean 0 and standard deviation 1
plot(p, dnorm(p, 0, 1), type = "l")
```

## Priors focus in `NIMBLE` model

::: nonincremental
- Alternative $\alpha$ priors:
:::

$$\text{Normal}(1, 1)$$

```{r}
#| echo: false
p <- seq(-10, 10, length.out = 10000)

# plot the Normal density with mean 1 and standard deviation 1
plot(p, dnorm(p, 1, 1), type = "l")
```

## Priors focus in `NIMBLE` model

::: nonincremental
- Alternative $\alpha$ priors:
:::

$$\text{Normal}(1, 5)$$

```{r}
#| echo: false
p <- seq(-10, 10, length.out = 10000)

# plot the Normal density with mean 1 and standard deviation 5
plot(p, dnorm(p, 1, 5), type = "l")
```

## Reminder of regression model steps

::: nonincremental
- Basic steps for regression models:
1. Establish suitable model for observations.
2. Identify type of relationship between predictor(s) and outcome.
3. For Bayesian approach, specify prior distributions etc.
4. Run the model somehow (e.g., `R` with `NIMBLE`).
5. See how well the model has fit.
:::

# Running models

- Use `NIMBLE` with `R`!

## Linear regression: Real world example

::: nonincremental
- Let's load in the data we'll use for lab (Spain mortality).
:::

``` R
data <- read_csv(here("data", "Spain", "data_spain.csv"))
```

```{r}
#| echo: false
data <- read_csv(here("data", "Spain", "data_spain.csv"))
```

## Linear regression: Real world example

::: nonincremental
- What does it look like?
:::

``` R
data <- read_csv(here("data", "Spain", "data_spain.csv"))
head(data)

data_national <- data |>
  group_by(week, week_of_year) |>
  summarise(
    deaths = sum(deaths),
    population = sum(population),
    t2m = mean(t2m),
    weekly_t2m_anomaly = mean(weekly_t2m_anomaly)
  ) |>
  mutate(week = dmy(week)) |>
  arrange(week) |>
  filter(year(week) < 2020) # avoiding COVID for now

ggplot(data = data_national) +
  geom_point(aes(x = week, y = deaths))
```

## Linear regression: Real world example

::: nonincremental
- Find some kind of association.
:::

``` R
code_linear <- nimbleCode({
  # priors
  alpha ~ dnorm(0, sd = 10) # prior for alpha
  beta_week ~ dnorm(0, sd = 10) # prior for beta_week

  # likelihood
  for (t in 1:Nw) {
    deaths[t] ~ dpois(mu[t])
    log(mu[t]) <- log(population[t]) + alpha + beta_week * t
  }

  # what's the estimated annual rate of change?
  beta_year <- exp(52 * beta_week)
})
```
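
## Linear regression: Real world example

The `beta_year` line in the model can be read as a rate ratio. A short derivation, writing $r_t = \mu_t / P_t$ for the modelled death rate (the symbol $r_t$ is introduced here only for illustration) and $\beta_w$ for `beta_week`:

$$
\begin{split}
r_t &= \frac{\mu_t}{P_t} = \exp(\alpha + \beta_w t) \\
\frac{r_{t+52}}{r_t} &= \frac{\exp(\alpha + \beta_w (t + 52))}{\exp(\alpha + \beta_w t)} = \exp(52 \beta_w)
\end{split}
$$

So `beta_year` is the multiplicative change in the death rate over 52 weeks: a value of 1 means no change, and values above 1 mean the rate is rising.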

## Linear regression: Real world example

::: nonincremental
- Find some kind of association.
:::

```{r}
#| echo: false
data_national <- data_national |>
  mutate(rate = 100000 * deaths / population)

code_linear <- nimbleCode({
  # priors
  alpha ~ dnorm(0, sd = 10) # prior for alpha
  beta_week ~ dnorm(0, sd = 10) # prior for beta_week

  # likelihood
  for (t in 1:Nw) {
    deaths[t] ~ dpois(mu[t])
    log(mu[t]) <- log(population[t]) + alpha + beta_week * t
  }

  # what's the estimated annual rate of change?
  beta_year <- exp(52 * beta_week)
})

constants <- list(Nw = nrow(data_national))
data <- list(deaths = data_national$deaths, population = data_national$population)

inits <- list(alpha = 0, beta_week = 0)
parameters_to_monitor <- c("alpha", "beta_week", "beta_year")

nimbleMCMC_samples_linear <- nimbleMCMC(
  code = code_linear,
  data = data,
  constants = constants,
  inits = inits,
  monitors = parameters_to_monitor,
  niter = 10000,
  nburnin = 5000,
  setSeed = 1,
  progressBar = FALSE,
  samplesAsCodaMCMC = TRUE
)

pred_death_rate <- 100000 * exp(
  # add alpha and beta_week * week_number by sample
  sweep(
    nimbleMCMC_samples_linear[, "beta_week"] %*% t(1:nrow(data_national)),
    1,
    nimbleMCMC_samples_linear[, "alpha"],
    FUN = "+"
  )
) |>
  # take the median and 2.5, 97.5 quantiles
  apply(
    FUN = quantile,
    MARGIN = 2,
    p = c(0.025, 0.5, 0.975)
  )

linear_fit <- data_national |>
  ungroup() |>
  mutate(
    .death_rate_median = pred_death_rate[2, ],
    .death_rate_lower = pred_death_rate[1, ],
    .death_rate_upper = pred_death_rate[3, ],
  ) |>
  mutate(residuals = rate - .death_rate_median)

linear_fit |>
  ggplot(aes(x = week)) +
  geom_point(aes(y = rate), size = 0.6) +
  geom_ribbon(aes(ymin = .death_rate_lower, ymax = .death_rate_upper), fill = "red", alpha = 0.1) +
  geom_line(aes(y = .death_rate_median), linewidth = 0.4, colour = "red")
```

## Reminder of regression model steps

::: nonincremental
- Basic steps for regression models:
1. Establish suitable model for observations.
2. Identify type of relationship between predictor(s) and outcome.
3. For Bayesian approach, specify prior distributions etc.
4. Run the model somehow (e.g., `R` with `NIMBLE`).
5. See how well the model has fit.
:::


# Evaluating model fit

## Evaluating model fit

- Sometimes a seemingly reasonable approach can result in unexpected results.
- When this happens (and in general), it is a good idea to have some way to evaluate the fit of candidate models.
- Ideally, residuals after model fitting are approximately normally distributed and centred on zero.
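
## Evaluating model fit

A minimal sketch of a residual check for a count model in base `R`. The fitted means and "observed" counts are simulated stand-ins, not output from the models in this lecture. Pearson residuals, which divide by the Poisson standard deviation, are an addition here to the raw residuals plotted in the slides.

```r
set.seed(1)

# Assumed stand-ins for fitted means and observed counts
mu_fit <- rep(c(180, 200, 220), length.out = 156)  # fitted expected counts
deaths <- rpois(length(mu_fit), lambda = mu_fit)   # "observed" counts

# Raw residuals: observed minus fitted
raw_resid <- deaths - mu_fit

# Pearson residuals: scaled by the Poisson standard deviation sqrt(mu)
pearson_resid <- raw_resid / sqrt(mu_fit)

# For a well-fitting model these should be centred near 0 with sd near 1
c(mean = mean(pearson_resid), sd = sd(pearson_resid))
```

A standard deviation well above 1, or a clear pattern when the residuals are plotted against time, suggests the model is missing structure such as seasonality.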

## Linear model residuals

```{r}
linear_fit |>
  ggplot(aes(x = residuals)) +
  geom_histogram()
```

## Random walk residuals

```{r}
rw_fit |>
  ggplot(aes(x = residuals)) +
  geom_histogram()
```

## Other ways of evaluating model fit

- Fit to the observed data is not the only criterion.
- Model parameterisation is also important.
- Predictive performance is very important, and overfitted models will not predict well.
- Others will go into Bayesian versions of this.

## Reminder of regression model steps

::: nonincremental
- Basic steps for regression models:
1. Establish suitable model for observations.
2. Identify type of relationship between predictor(s) and outcome.
3. For Bayesian approach, specify prior distributions etc.
4. Run the model somehow (e.g., `R` with `NIMBLE`).
5. See how well the model has fit.
:::

# Getting ready for the lab

## The lab for this session {.smaller}

- The goal of this lab is to explore some key temporal modelling concepts, including linear slopes, random walks and inclusion of linear exposure terms.

- During this lab session, we will:
1. Explore some real time series mortality data;
2. Apply a basic linear model;
3. Apply a non-linear model;
4. Incorporate basic temperature term into model;
5. Modify temperature term to be month-specific; and
6. Explore model convergence and how well the model fits.

# Questions?