Performance assessment of survival prediction models - simplified code

Goals
- Set up - load packages and import data
- Data and recoding
Goal 1 - Develop a risk prediction model with a time to event outcome
- 1.1 Model development - fit the risk prediction models
Goal 2 - Assessing performance in survival prediction models
Goal 3 - Clinical utility
Additional notes
Reproducibility ticket

Goals

In this document, we assume that individual data of the development and validation set are both available. This file illustrates in a simplified way how to develop a survival prediction model and how to assess the corresponding prediction performance.

The goals are:

To develop a risk prediction model with a time-to-event outcome;
To assess the prediction performance of a model with a time-to-event outcome;
To assess the potential clinical utility of a risk prediction model with time-to-event outcome;

When a risk prediction model has been developed and published in the literature, individual data that was used during model development are not always available. In this document, we assume the scenario that a risk prediction model was already developed and is available in the literature. We assume that the author(s) developed a risk prediction model using a Cox proportional hazard regression and provided the model equation in terms of coefficients and the baseline survival at a fixed time horizon t (e.g. five years).

Set up - load packages and import data

Please run the following code to set up the data used in the following document. The following libraries are needed to achieve the following goals, if you have not them installed, please use install.packages(’‘) (e.g. install.packages(’survival’)) or use the user-friendly approach if you are using RStudio.

# Use pacman to check whether packages are installed, if not load
if (!require("pacman")) install.packages("pacman")
library(pacman)

pacman::p_load(
  survival,
  rms,
  riskRegression,
  timeROC
)

options(show.signif.stars = FALSE)  # display statistical intelligence
palette("Okabe-Ito")  # color-blind friendly  (needs R 4.0)

Data and recoding

Outcome and predictors in the new data must be coded as provided in manuscript.

We load the development (rotterdam) and the validation data (gbsg) from survival package. The Rotterdam breast cancer data was used to predict the risk of recurrence or death using size, stage and tumor size as predictors. These three predictors were used in the Nottingham Prognostic Index, one of the most popular indexes to determine prognosis following surgery of breast cancer.
The Germany Breast Cancer Study Group data was used as an external validation of the model developed in the Rotterdam breast cancer data. The prediction model will be then extended using the progesterone (PGR) marker measured at primary surgery.

In the prediction model developed using the Rotterdam data, the data was administratively censored at 5 years. For this reason, we also administratively censor the data from patients in the validation data (GBSG) at 5 years.

Click to expand code

# Data and recoding ----------------------------------
# Development data

rotterdam$ryear <- rotterdam$rtime/365.25  # time in years
rotterdam$rfs <- with(rotterdam, pmax(recur, death)) #The variable rfs is a status indicator, 0 = alive without relapse, 1 = death or relapse.

# Fix the outcome for 43 patients who have died but 
# censored at time of recurrence which was less than death time. 
# The actual death time should be used rather than the earlier censored recurrence time.

rotterdam$ryear[rotterdam$rfs == 1 & 
                  rotterdam$recur == 0 & 
                  rotterdam$death == 1 & 
                  (rotterdam$rtime < rotterdam$dtime)] <- 
  
  rotterdam$dtime[rotterdam$rfs == 1 &
                    rotterdam$recur == 0 & 
                    rotterdam$death == 1 & 
                    (rotterdam$rtime < rotterdam$dtime)]/365.25  

# variables used in the analysis
pgr99 <- quantile(rotterdam$pgr, .99, type = 1) # there is a large outlier of 5000, used type=1 to get same result as in SAS
rotterdam$pgr2 <- pmin(rotterdam$pgr, pgr99) # Winsorized value
nodes99 <- quantile(rotterdam$nodes, .99, type = 1) 
rotterdam$nodes2 <- pmin(rotterdam$nodes, nodes99) # NOTE: winsorizing also continuous node?

rotterdam$csize <- rotterdam$size           # categorized size
rotterdam$cnode <- cut(rotterdam$nodes, 
                       c(-1,0, 3, 51),
                       c("0", "1-3", ">3"))   # categorized node
rotterdam$grade3 <- as.factor(rotterdam$grade)
levels(rotterdam$grade3) <- c("1-2", "3")

# Save in the data the restricted cubic spline term using Hmisc::rcspline.eval() package

# Continuous nodes variable
rcs3_nodes <- Hmisc::rcspline.eval(rotterdam$nodes2,
                                   knots = c(0, 1, 9))
attr(rcs3_nodes, "dim") <- NULL
attr(rcs3_nodes, "knots") <- NULL
rotterdam$nodes3 <- rcs3_nodes

# PGR
rcs3_pgr <- Hmisc::rcspline.eval(rotterdam$pgr2,
                                 knots = c(0, 41, 486)) # using knots of the original variable (not winsorized)
attr(rcs3_pgr, "dim") <- NULL
attr(rcs3_pgr, "knots") <- NULL
rotterdam$pgr3 <- rcs3_pgr

# Validation data
gbsg$ryear <- gbsg$rfstime/365.25
gbsg$rfs   <- gbsg$status           # the GBSG data contains RFS
gbsg$cnode <- cut(gbsg$nodes, 
                  c(-1,0, 3, 51),
                  c("0", "1-3", ">3"))   # categorized node
gbsg$csize <- cut(gbsg$size,  
                  c(-1, 20, 50, 500), #categorized size
                  c("<=20", "20-50", ">50"))
gbsg$pgr2 <- pmin(gbsg$pgr, pgr99) # Winsorized value of PGR
gbsg$nodes2 <- pmin(gbsg$nodes, nodes99) # Winsorized value of continuous nodes
gbsg$grade3 <- as.factor(gbsg$grade)
levels(gbsg$grade3) <- c("1-2", "1-2", "3")

# Restricted cubic spline 
# Continuous nodes
rcs3_nodes <- Hmisc::rcspline.eval(gbsg$nodes2, knots = c(0, 1, 9))
attr(rcs3_nodes, "dim") <- NULL
attr(rcs3_nodes, "knots") <- NULL
gbsg$nodes3 <- rcs3_nodes

# PGR
rcs3_pgr <- Hmisc::rcspline.eval(gbsg$pgr2, knots = c(0, 41, 486))
attr(rcs3_pgr, "dim") <- NULL
attr(rcs3_pgr, "knots") <- NULL
gbsg$pgr3 <- rcs3_pgr


# Much of the analysis will focus on the first 5 years: create
#  data sets that are censored at 5
temp <- survSplit(Surv(ryear, rfs) ~ ., data = rotterdam, cut = 5,
                  episode="epoch")
rott5 <- subset(temp, epoch == 1)  # only the first 5 years
temp <- survSplit(Surv(ryear, rfs) ~ ., data = gbsg, cut = 5,
                  episode ="epoch")
gbsg5 <- subset(temp, epoch == 1)

# Relevel
rott5$cnode <- relevel(rotterdam$cnode, "0")
gbsg5$cnode <- relevel(gbsg$cnode, "0")

Goal 1 - Develop a risk prediction model with a time to event outcome

Prediction models are useful to provide the estimated probability of a specific outcome using personal information. In many studies, especially in medicine, the main outcome under assessment is the time to an event of interest defined generally as survival time. Prognostic models for survival endpoints, such as recurrence or progression of disease, need to account for drop out during follow-up. Patients who have not experienced the event of interest are censored observations. Cox regression analysis is the most popular statistical model to deal with such data in oncology and other medical research.

1.1 Model development - fit the risk prediction models

We develop the risk prediction model in the development data considering the first 5-year follow-up to minimize the violation of proportional hazard including size, node and grade. We also administratively censored the validation data at 5 years.

Click to expand code

# Libraries needed
if (!require("pacman")) install.packages("pacman")
pacman::p_load(survival,
              Hmisc,
              riskRegression)

# Fit the model without PGR
efit1 <- coxph(Surv(ryear, rfs) ~ csize + nodes2 + nodes3 + grade3,
  data = rott5, 
  x = T, 
  y = T)
efit1

# Baseline at 5 years
bh <- basehaz(efit1, centered = FALSE) # uncentered
bh$surv <- exp(-bh$hazard) # baseline survival
S0_t5 <- bh$surv[bh$time == 5] 
# NOTE: this can be used to calculate S(t = 5) = S0(t = 5)**exp(X*beta)

# The model with additional PGR marker
efit1_pgr  <- update(efit1, . ~ . + pgr2 + pgr3)

## Call:
## coxph(formula = Surv(ryear, rfs) ~ csize + nodes2 + nodes3 + 
##     grade3, data = rott5, x = T, y = T)
## 
##                coef exp(coef) se(coef)      z        p
## csize20-50  0.34184   1.40754  0.06542  5.225 1.74e-07
## csize>50    0.57355   1.77456  0.09248  6.202 5.57e-10
## nodes2      0.30397   1.35522  0.02810 10.816  < 2e-16
## nodes3     -0.81117   0.44434  0.10461 -7.754 8.90e-15
## grade33     0.36171   1.43578  0.07133  5.071 3.96e-07
## 
## Likelihood ratio test=531  on 5 df, p=< 2.2e-16
## n= 2982, number of events= 1275

The coefficients of the models indicated that higher size, higher number of positive lymph nodes and higher grade is more associate with poorer prognosis. The association of the progesterone marker and the outcome is non-linear as investigated previously.

Goal 2 - Assessing performance in survival prediction models

The performance of a risk prediction models may be evaluated through:

discrimination: it is the ability to differentiate between subjects who have the outcome by a certain time point and subjects who do not. It requires the coefficients (or the log of the hazard ratios) of the developed risk prediction model to be evaluated.
calibration: the agreement between observed and predicted probabilities. It requires the baseline (cumulative) hazard or survival.
overall performance measures: as a combination of discrimination and calibration and/or as a measure of the explained variation;

Unfortunately, only few publications report the complete baseline (cumulative) hazard or survival or even the baseline (cumulative) hazard or survival at fixed time horizon t. If we had both individual data of the development and validation, a complete assessment of discrimination and calibration would be possible. We could evaluate the prediction performance of a risk prediction model at a fixed time horizon(s) t and for the complete follow-up time. In risk prediction, physicians typically focus on one or more clinically relevant time horizons to inform subjects about their risk. For this reason, according to information available, different levels of validation assessment are possible. Here we aim to assess the prediction performance of a risk prediction model with time-to-event outcome in case all individual data are available and in case of only the model equation of a fixed time horizon (i.e. at 5 years) is provided including the baseline survival.

2.1 Discrimination measures

Discrimination is the ability to differentiate between subjects who have the outcome by a certain time point and subjects who do not. Concordance can be assessed over several different time intervals:

the entire range of the data. Two concordance measures are suggested:
- Harrell’s C quantifies the degree of concordance as the proportion of evaluable pairs where the patient with a longer survival time has better predicted survival;
- Uno’s C uses a time dependent weighting that more fully adjusts for censoring;
a 5 year window corresponding to our target assessment point. Uno’s cumulative/dynamic time-dependent Area Under the Curve (AUC) is suggested. Uno’s time-dependent AUC summarizes discrimination at specific fixed time points. At any time point of interest, t, a patient is classified as having an event if the patient experienced the event between baseline and t (5 years in our case study), and as a non-event if the patient remained event-free at t. The time-dependent AUC evaluates whether predicted probabilities were higher for cases than for non-cases.

There is some uncertainty in the literature about the original Harrell formulation versus Uno’s suggestion to re-weight the time scale by the factor $1/G^2(t)$ where $G$ is the censoring distribution. There is more detailed information in the concordance vignette found in the survival package.

For all three measures, values close to 1 indicate good discrimination ability, while values close to 0.5 indicated poor discrimination ability.

Click to expand code

# Libraries needed
if (!require("pacman")) install.packages("pacman")
library(pacman)
pacman::p_load(survival,
               Hmisc,
               riskRegression,
               timeROC)

# Fit the model without PGR
efit1 <- coxph(Surv(ryear, rfs) ~ csize + nodes2 + nodes3 + grade3,
  data = rott5, 
  x = T, 
  y = T)

# The model with additional PGR marker
efit1_pgr  <- update(efit1, . ~ . + pgr2 + pgr3)

# Add linear predictor in the validation set
gbsg5$lp <- predict(efit1, newdata = gbsg5)

### Harrell and Uno's concordance index 
# Harrell's C


## Validation data
# Harrell's C
harrell_C_gbsg5 <- concordance(Surv(ryear, rfs) ~ lp, 
                               gbsg5, 
                               reverse = TRUE)
# Uno's C
Uno_C_gbsg5 <- concordance(Surv(ryear, rfs) ~ lp, 
                           gbsg5, 
                           reverse = TRUE,
                           timewt = "n/G2")

##                              Estimate Lower .95 Upper .95
## Harrell C - Validation data      0.65      0.62      0.69
## Uno C - Validation data          0.64      0.60      0.67

Harrell C and Uno C were 0.65 and 0.64, respectively.

Click to expand code

# Time-dependent AUC (in Table 3 called Uno's TD AUC at 5 years) ###

# External validation
Uno_gbsg5 <-
  timeROC::timeROC(
    T = gbsg5$ryear, 
    delta = gbsg5$rfs,
    marker = gbsg5$lp,
    cause = 1, 
    weighting = "marginal", 
    times = 4.99,
    iid = TRUE
  )

# COMMENT: if you have a lot of data n > 2000, standard error computation may be really long. Please use bootstrap percentile to calculate confidence intervals.

##   Uno AUC Lower .95 Upper .95 
##      0.68      0.62      0.73

The time-dependent AUCs at 5 years were in the external validation was 0.68.

2.2 Calibration

Calibration is the agreement between observed outcomes and predicted probabilities. For example, in survival models, a predicted survival probability at a fixed time horizon t of 80% is considered reliable if it can be expected that 80 out of 100 will survive among patients who received a predicted survival probability of 80%. Calibration can be assessed at a fixed time point (e.g. at 5 years), and globally (considering the entire range of the data). In addition, different level of calibration assessment can be estimated according to the level of information available in the data. When individual data of development and validation set are available, full assessment of calibration is possible. Calibration at fixed time point is possible when baseline hazard at fixed time point and coefficient are available.

In the scenario we consider here, we can evaluate calibration only at fixed time point t (i.e. 5 years) since we may have baseline survival at time t (5 years) and coefficients of the model.

Mean calibration at a fixed time point can be estimated using the Observed versus Expected ratio at time t;
Weak calibration can be estimated by additionally calculating calibration slope.
Moderate calibration can estimated at a fixed time point using a flexible calibration curve, complemented with ICI, E50, E90.

More detailed explanations are available in the paper.

2.2.1 Mean calibration - observed/expected ratio for fixed time point

The mean calibration at fixed time point (e.g. at 5 years) can be estimated using the Observed versus Expected ratio. The observed is estimated using the complementary of the Kaplan-Meier curve at the fixed time point. The expected is estimated using the average predicted risk of the event at the fixed time point.

Click to expand code

# Libraries needed
if (!require("pacman")) install.packages("pacman")
library(pacman)
pacman::p_load(survival,
               Hmisc)

# Observed / Expected ratio
t_horizon <- 5

# Observed
obj <- summary(survfit(
  Surv(ryear, rfs) ~ 1, 
  data = gbsg5),
  times = t_horizon)

obs_t <- 1 - obj$surv

# Predicted risk 
gbsg5$pred <- riskRegression::predictRisk(efit1,
                                          newdata = gbsg5,
                                          times = t_horizon)
# Expected
exp_t <- mean(gbsg5$pred)

# Observed / Expected ratio
OE_t <- obs_t / exp_t

alpha <- .05
OE_summary <- c(
  "OE" = OE_t,
  "2.5 %" = OE_t * exp(-qnorm(1 - alpha / 2) * sqrt(1 / obj$n.event)),
  "97.5 %" = OE_t * exp(+qnorm(1 - alpha / 2) * sqrt(1 / obj$n.event))
)

OE_summary

##        OE     2.5 %    97.5 % 
## 1.0173902 0.9058717 1.1426372

Observed and expected ratio was 1.04.

2.2.2 Weak calibration - calibration slope for fixed time point

Click to expand code

# Fit the model without PGR
efit1 <- coxph(Surv(ryear, rfs) ~ csize + nodes2 + nodes3 + grade3,
  data = rott5, 
  x = T, 
  y = T)

# The model with additional PGR marker
efit1_pgr  <- update(efit1, . ~ . + pgr2 + pgr3)

# Add linear predictor in the validation set
gbsg5$lp <- predict(efit1, newdata = gbsg5)

gval <- coxph(Surv(ryear, rfs) ~ lp, data = gbsg5)

calslope_summary <- c(
  "calibration slope" = gval$coef,
  "2.5 %"  = gval$coef - qnorm(1 - alpha / 2) * sqrt(gval$var),
  "97.5 %" = gval$coef + qnorm(1 - alpha / 2) * sqrt(gval$var)
)

calslope_summary

## calibration slope.lp                2.5 %               97.5 % 
##            1.0562040            0.8158872            1.2965207

2.2.3 Moderate calibration - fixed time point

Moderate calibration at fixed time point can be assessed using flexible calibration curve, complemented with ICI, E50, E90 as suggested by Austin et al.

Calibration curve is a graphical representation of moderate calibration. It shows:
- on the x-axis the predicted survival (or risk) probabilities at a fixed time horizon (e.g. at 5 years);
- on the y-axis the observed survival (or risk) probabilities at a fixed time horizon (e.g. at 5 years);
- The 45-degree line indicates perfect calibration. Points below the 45-degree line indicate that the model overestimates the observed risk. If points are above the 45-degree line, the model underestimate the observed risk; The observed probabilities estimated by the Kaplan-Meier curves (in case of survival) or by the complementary of the Kaplan-Meier curves (in case of risk) are represented in terms of percentiles of the predicted survival (risk) probabilities.
Integrated Calibration Index (ICI) is the weighted mean of absolute difference between smoothed observed proportions and predicted probabilities in which observations are weighted by the empirical density function of the predicted probabilities;
E50 and E90 denote the median and the 90th percentile of the absolute differences between observed and predicted probabilities of the outcome at time t;

Click to expand code

if (!require("pacman")) install.packages("pacman")
library(pacman)
pacman::p_load(survival,
               Hmisc,
               rms)

# Fit the model without PGR
efit1 <- coxph(Surv(ryear, rfs) ~ csize + nodes2 + nodes3 + grade3,
  data = rott5, 
  x = T, 
  y = T)

# The model with additional PGR marker
efit1_pgr  <- update(efit1, . ~ . + pgr2 + pgr3)

gbsg5$pred <- riskRegression::predictRisk(efit1, 
                         newdata = gbsg5, 
                         times = 5)

gbsg5$pred.cll <- log(-log(1 - gbsg5$pred))


# Estimate actual risk
vcal <- rms::cph(Surv(ryear, rfs) ~ rcs(pred.cll, 3),
                 x = T,
                 y = T,
                 surv = T,
                 data = gbsg5
) 

dat_cal <- cbind.data.frame(
  "obs" = 1 - rms::survest(vcal,
                           times = 5,
                           newdata = gbsg5)$surv,
  
  "lower" = 1 - rms::survest(vcal,
                             times = 5,
                             newdata = gbsg5)$upper,
  
  "upper" = 1 - rms::survest(vcal,
                             times = 5,
                             newdata = gbsg5)$lower,
  "pred" = gbsg5$pred
)

dat_cal <- dat_cal[order(dat_cal$pred), ]

par(xaxs = "i", yaxs = "i", las = 1)
plot(
  dat_cal$pred, 
  dat_cal$obs,
  type = "l", 
  lty = 1, 
  xlim = c(0, 1),
  ylim = c(0, 1), 
  lwd = 2,
  xlab = "Predicted risk from developed model",
  ylab = "Predicted risk from refitted model", bty = "n"
)
lines(dat_cal$pred, 
      dat_cal$lower, 
      type = "l", 
      lty = 2, 
      lwd = 2)
lines(dat_cal$pred, 
      dat_cal$upper,
      type = "l", 
      lty = 2, 
      lwd = 2)
abline(0, 1, lwd = 2, lty = 2, col = 2)
legend("bottomright",
        c("Ideal calibration",
          "Calibration curve based on secondary Cox model",
          "95% confidence interval"),
        col = c(2, 1, 1),
        lty = c(2, 1, 2),
        lwd = c(2, 2, 2),
        bty = "n",
        cex = 0.85)

# Numerical measures
absdiff_cph <- abs(dat_cal$pred - dat_cal$obs)

numsum_cph <- c(
  "ICI" = mean(absdiff_cph),
  setNames(quantile(absdiff_cph, c(0.5, 0.9)), c("E50", "E90")),
  "Emax" = max(absdiff_cph)
)
numsum_cph

##        ICI        E50        E90       Emax 
## 0.03017958 0.02627822 0.07466548 0.07543556

Good calibration was estimated using calibration plot and calibration measures.

2.3 Overall performance measures

Two overall performance measures are proposed for prediction models with a survival outcome:

Brier score: it is the mean squared difference between observed event indicators and predicted risks at a fixed time point (e.g. at 5 years), lower is better;
Scaled Brier score, also known as Index of Prediction Accuracy (IPA): it improves interpretability by scaling the Brier Score. It is the decrease in Brier compared to a null model, expressed as a percentage, higher is better.

Click to expand code

# Libraries needed
if (!require("pacman")) install.packages("pacman")
library(pacman)
pacman::p_load(survival,
               Hmisc,
               riskRegression)

# Fit the model without PGR
efit1 <- coxph(Surv(ryear, rfs) ~ csize + nodes2 + nodes3 + grade3,
  data = rott5, 
  x = T, 
  y = T)

# The model with additional PGR marker
efit1_pgr  <- update(efit1, . ~ . + pgr2 + pgr3)

# Brier Score and IPA in the validation set (model without PGR)
score_gbsg5 <-
  riskRegression::Score(list("cox_validation" = efit1),
                        formula = Surv(ryear, rfs) ~ 1, 
                        data = gbsg5, 
                        conf.int = TRUE, 
                        times = 4.99,
                        cens.model = "km", 
                        metrics = "brier",
                        summary = "ipa"
)

# Extra: bootstrap confidence intervals for IPA ------
B <- 100
horizon <- 4.99
boots_ls <- lapply(seq_len(B), function(b) {
  
  # Resample validation data
  data_boot <- gbsg5[sample(nrow(gbsg5), replace = TRUE), ]

  
  # Get IPA on boot validation data
  score_boot <- riskRegression::Score(
    list("cox_validation" = efit1),
    formula = Surv(ryear, rfs) ~ 1,
    cens.model = "km", 
    data = data_boot, 
    conf.int = FALSE, 
    times = horizon,
    metrics = c("brier"),
    summary = c("ipa")
  )
  
  #.. can add other measure heres, eg. concordance
  
  ipa_boot <- score_boot$Brier$score[model == "cox_validation"][["IPA"]]
  cbind.data.frame("ipa" = ipa_boot)
})

df_boots <- do.call(rbind.data.frame, boots_ls)

##                                Estimate Lower .95  Upper .95
## Brier - Validation data            0.22       0.21      0.24
## Scaled Brier - Validation data     0.10       0.04      0.17

Brier and scaled Brier score were 0.22 and 0.10, respectively.

Goal 3 - Clinical utility

Discrimination and calibration measures are essential to assess the prediction performance but insufficient to evaluate the potential clinical utility of a risk prediction model for decision making. Clinical utility assessment evaluates whether a prediction model helps to improve decision making.
Clinical utility is measured by the net benefit that includes the number of true positives and the number of false positives. The true positives reflect the benefit of correctly predicting patients who will experience an event during a given time horizon, giving the opportunity to use additional interventions such as additional treatments, personalized follow-up or additional surgeries. The false positives represent the harms of incorrectly predicting events possibly leading to unnecessary additional interventions. Net benefit is the number of true positives classifications minus the false positives classifications weighted by a factor related to the harm of failing to predict an event versus and the harm of falsely predicting an event. The weighting is derived from a predefined threshold probability using a defined time horizon (for example 5 years since diagnosis). For example, a threshold of 10% implies that additional interventions for 10 patients of whom one would have experienced the event in the next 5 years if untreated is acceptable (thus treating 9 patients unnecessarily). This strategy is compared with the strategies of treating all and treating none of the patients. If overtreatment is harmful, a higher threshold should be used.
The net benefit is calculated as:

${\frac{TP}{n}-\frac{FP}{n}*\frac{p_t}{1-p_t}}$

TP=true positive patients
FP=false positive patients
n=number of patients and p_t is the risk threshold.

For survival data TP and FP is calculated as follows:
${TP = [1-S(t)| X=1]*P(X=1)*n}$

${FP = [S(t)| X=1]*P(X=1)*n}$

where
S(t) survival at time t
X=1 when the predicted probability at time t exceeds p_t

And the decision curve is calculated as follows:

Choose a time horizon (in this case 5 years);
Specify a risk threshold which reflects the ratio between harms and benefit of an additional intervention;
Calculate the number of true positives and false positives given the threshold specified in (2);
Calculate the net benefit of the survival model;
Plot net benefit on the y-axis against the risk threshold on the x-axis;
Repeat steps 2-4 for each model consideration;
Repeat steps 2-4 for the strategy assuming all patients are treated;
Draw a straight line parallel to the x-axis at y=0 representing the net benefit associated with the strategy assuming that none of the patients are treated.

We smoothed the decision curves based on the risk prediction models to reduce the visual impact of random noise using stats::smooth() function.

Click to expand code

if (!require("pacman")) install.packages("pacman")
library(pacman)
pacman::p_load(survival,
               Hmisc)

t_horizon <- 5

# Fit the model without PGR
efit1 <- coxph(Surv(ryear, rfs) ~ csize + nodes2 + nodes3 + grade3,
  data = rott5, 
  x = T, 
  y = T)

# The model with additional PGR marker
efit1_pgr  <- update(efit1, . ~ . + pgr2 + pgr3)

# Predicted risk 
gbsg5$pred <-predictRisk(efit1, 
                                  newdata = gbsg5,
                                  times = t_horizon)

# Run decision curve analysis

# Development data
# Model without PGR
gbsg5 <- as.data.frame(gbsg5)
dca_gbsg5 <- stdca(
  data = gbsg5, 
  outcome = "rfs", 
  ttoutcome = "ryear",
  timepoint = 5, 
  predictors = "pred", 
  xstop = 1.0,
  ymin = -0.01, 
  graph = FALSE
)

# Decision curves plot

# Smoothing DCA
dca_gbsg5_smooth <- smooth(dca_gbsg5$net.benefit$pred
                           [!is.na(dca_gbsg5$net.benefit$pred)],
                           twiceit = TRUE)
dca_gbsg5_smooth <- c(dca_gbsg5_smooth, 
                      rep(NA, sum(is.na(dca_gbsg5$net.benefit$pred))))

par(xaxs = "i", yaxs = "i", las = 1)
plot(dca_gbsg5$net.benefit$threshold,
  dca_gbsg5_smooth,
  type = "l", 
  lwd = 3,
  lty = 2,
  xlab = "Threshold probability in %", 
  ylab = "Net Benefit",
  xlim = c(0, 1), 
  ylim = c(-0.10, 0.60), 
  bty = "n",
  cex.lab = 1.2, 
  cex.axis = 1,
  col = 4
)
lines(dca_gbsg5$net.benefit$threshold, 
      dca_gbsg5$net.benefit$none, 
      type = "l", 
      lwd = 3, 
      lty = 4,
      col = 8)
lines(dca_gbsg5$net.benefit$threshold, 
      dca_gbsg5$net.benefit$all, 
      type = "l", 
      lwd = 3, 
      col = 2)
legend("topright",
  c(
    "Treat All",
    "Original model",
    "Treat None"
  ),
  lty = c(1, 2, 4), 
  lwd = 3, 
  col = c(2, 4, 8),
  bty = "n"
)

# read off at 23% threshold
dca_gbsg5$net.benefit[dca_gbsg5$net.benefit$threshold == 0.23, ]

## [1] "pred: No observations with risk greater than 93%, and therefore net benefit not calculable in this range."

##    threshold       all none      pred
## 23      0.23 0.3615002    0 0.3615002

The potential benefit at 23% threshold of the prediction model is 0.362. This means that the model might identify a net 36 patients out of 100 who will have recurrent breast cancer or die within 5 years of surgery and thus require adjuvant chemotherapy. The benefit of the prediction model is very close to ‘treat all’ strategy (0.362).

Potential benefit can be also defined in terms of net reduction of avoidable interventions (e.g adjuvant chemotherapy per 100 patients) by:

${\frac{NB_{model} - NB_{all}}{(p_t/ (1-p_t))}*100}$

where NB_model is the net benefit of the prediction model, NB_all is the net benefit of the strategy treat all and $p_{t}$ is the risk threshold.

Additional notes

To run the apparent validation find “gbsg5” and replace with “rott5” from paragraph “Discrimination” on.
To run the model with the PGR as additional biomarker find “efit1” and replace with with “efit1_pgr” from paragraph “Discrimination” on.

When use apparent validation, be careful to use the correct labels in the plot!

Reproducibility ticket

sessionInfo()

## R version 4.2.1 (2022-06-23 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22000)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Netherlands.utf8  LC_CTYPE=English_Netherlands.utf8   
## [3] LC_MONETARY=English_Netherlands.utf8 LC_NUMERIC=C                        
## [5] LC_TIME=English_Netherlands.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] here_1.0.1                timeROC_0.4              
##  [3] riskRegression_2022.09.23 rms_6.3-0                
##  [5] SparseM_1.81              Hmisc_4.7-1              
##  [7] ggplot2_3.3.6             Formula_1.2-4            
##  [9] lattice_0.20-45           survival_3.3-1           
## [11] pacman_0.5.1             
## 
## loaded via a namespace (and not attached):
##  [1] splines_4.2.1       foreach_1.5.2       prodlim_2019.11.13 
##  [4] assertthat_0.2.1    highr_0.9           latticeExtra_0.6-30
##  [7] pec_2022.05.04      yaml_2.3.5          globals_0.16.1     
## [10] numDeriv_2016.8-1.1 timereg_2.0.2       pillar_1.8.1       
## [13] backports_1.4.1     quantreg_5.94       glue_1.6.2         
## [16] digest_0.6.29       RColorBrewer_1.1-3  checkmate_2.1.0    
## [19] colorspace_2.0-3    sandwich_3.0-2      cmprsk_2.2-11      
## [22] htmltools_0.5.3     Matrix_1.5-1        pkgconfig_2.0.3    
## [25] listenv_0.8.0       mvtnorm_1.1-3       scales_1.2.1       
## [28] jpeg_0.1-9          lava_1.6.10         MatrixModels_0.5-1 
## [31] htmlTable_2.4.1     tibble_3.1.8        mets_1.3.1         
## [34] generics_0.1.3      TH.data_1.1-1       withr_2.5.0        
## [37] nnet_7.3-17         cli_3.4.1           magrittr_2.0.3     
## [40] deldir_1.0-6        polspline_1.1.20    evaluate_0.17      
## [43] parallelly_1.32.1   fansi_1.0.3         future_1.28.0      
## [46] nlme_3.1-157        MASS_7.3-57         foreign_0.8-82     
## [49] tools_4.2.1         data.table_1.14.2   lifecycle_1.0.3    
## [52] multcomp_1.4-20     stringr_1.4.1       interp_1.1-3       
## [55] munsell_0.5.0       cluster_2.1.3       compiler_4.2.1     
## [58] rlang_1.0.6         grid_4.2.1          iterators_1.0.14   
## [61] rstudioapi_0.14     htmlwidgets_1.5.4   base64enc_0.1-3    
## [64] rmarkdown_2.17      gtable_0.3.1        codetools_0.2-18   
## [67] DBI_1.1.3           R6_2.5.1            gridExtra_2.3      
## [70] zoo_1.8-11          knitr_1.40          dplyr_1.0.10       
## [73] fastmap_1.1.0       future.apply_1.9.1  utf8_1.2.2         
## [76] rprojroot_2.0.3     stringi_1.7.8       parallel_4.2.1     
## [79] Rcpp_1.0.9          vctrs_0.4.2         rpart_4.1.16       
## [82] png_0.1-7           tidyselect_1.2.0    xfun_0.33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

01_predsurv_minimal_RMD.md

01_predsurv_minimal_RMD.md

Performance assessment of survival prediction models - simplified code

Goals

Set up - load packages and import data

Data and recoding

Goal 1 - Develop a risk prediction model with a time to event outcome

1.1 Model development - fit the risk prediction models

Goal 2 - Assessing performance in survival prediction models

2.1 Discrimination measures

2.2 Calibration

2.2.1 Mean calibration - observed/expected ratio for fixed time point

2.2.2 Weak calibration - calibration slope for fixed time point

2.2.3 Moderate calibration - fixed time point

2.3 Overall performance measures

Goal 3 - Clinical utility

Additional notes

Reproducibility ticket

Files

01_predsurv_minimal_RMD.md

Latest commit

History

01_predsurv_minimal_RMD.md

File metadata and controls

Performance assessment of survival prediction models - simplified code

Goals

Set up - load packages and import data

Data and recoding

Goal 1 - Develop a risk prediction model with a time to event outcome

1.1 Model development - fit the risk prediction models

Goal 2 - Assessing performance in survival prediction models

2.1 Discrimination measures

2.2 Calibration

2.2.1 Mean calibration - observed/expected ratio for fixed time point

2.2.2 Weak calibration - calibration slope for fixed time point

2.2.3 Moderate calibration - fixed time point

2.3 Overall performance measures

Goal 3 - Clinical utility

Additional notes

Reproducibility ticket