Return correct standard errors in `tidy.survfit()` #1162

mccarthy-m-g · 2023-06-03T23:51:18Z

tidy.survfit() currently returns standard errors for the cumulative hazard instead of the survival probability. This PR fixes this so that standard errors for the survival probability are returned. This makes the output consistent with summary.survfit.

Here's a blog post covering the sneaky behaviour of standard errors in survival. The short version is that fit$std.err (which tidy.survfit() currently uses) returns the standard errors for the cumulative hazard, whereas summary(fit)$std.err returns the standard errors for the survival probabilities.

library(survival)
library(broom)

fit <- survfit(Surv(time, status) ~ x, data = aml)

# Cumulative hazard
fit$std.err
#>  [1] 0.09534626 0.14213381 0.19508758 0.24873417 0.24873417 0.33446777
#>  [7] 0.44181673 0.44181673 0.83378775 0.83378775 0.12909944 0.20412415
#> [13] 0.24397502 0.24397502 0.30472470 0.37796447 0.47559487 0.62678317
#> [19] 0.94491118        Inf

# Survival probability
summary(fit)$std.err
#>  [1] 0.08667842 0.11629130 0.13966497 0.15263233 0.16419327 0.16266889
#>  [7] 0.15349275 0.10758287 0.13608276 0.14231876 0.14813006 0.14698618
#> [13] 0.13871517 0.12187451 0.09186636        NaN

^{Created on 2023-06-03 with reprex v2.0.2}

Edit

After using this on some examples, I think it's better to calculate manually instead of using summary(); otherwise you get an error when using survfit objects that have been reformulated with survfit0() due to incompatible numbers of rows.

I've updated the PR to use the calculations done in the survival package instead of summary() like in the reprex above.

For reference, here are the relevant line in survival's source code.

survfit:

https://github.com/therneau/survival/blob/e8de7b732daa9a82c1ceaad93fd68ee7545c7736/R/survfitms.R#L178

survfitms:

https://github.com/therneau/survival/blob/e8de7b732daa9a82c1ceaad93fd68ee7545c7736/R/survfitms.R#L375

…ve hazard

…lity with objects reformulated by `survfit0()`

simonpcouch · 2023-06-05T18:07:33Z

Thanks for the PR! I'm seeing

── Error ('test-survival-survfit.R:30:3'): tidy.survfit ────────────────────────
Error in `data.frame(time = x$time, n.risk = c(x$n.risk), n.event = c(x$n.event), 
    n.censor = c(x$n.censor), estimate = c(x$pstate), std.error = c(x$std.err * 
        x$surv), conf.high = c(x$upper), conf.low = c(x$lower), 
    state = rep(x$states, each = nrow(x$pstate)))`: arguments imply differing number of rows: 237, 711, 0

in tests. Looks like this may be related to 03a4662—could you address that failure?

…lity with objects reformulated by `survfit0()`

mccarthy-m-g · 2023-06-05T20:05:54Z

Welcome!

I addressed the failure, and also made this a bit more robust by using the same conditional statement as survival does to check if the standard errors need to be fixed. Should be good now.

simonpcouch · 2023-06-06T19:21:34Z

Alright, got it. Thanks again for putting this together!

After sitting with this for a bit, I think our best move here is to better document the current output. While the current output takes a moment of pause to interface with (and is certainly documented incorrectly), it is true to survival's implementation, and transitioning to a mix-and-match of slots of both the original object and its summary feels like a lateral move in terms of clarity/safety. At another point in the package's lifecycle, I may have considered adding a tidy.summary.survfit() method here, though that change is at odds with the notes in release 0.7.0 and the maintenance guidelines.

Would you be up for updating that documentation?

mccarthy-m-g · 2023-06-06T21:32:27Z

I'm up for updating the documentation. You're right that that would be more consistent with the information stored in survfit.object. For reference, here's how ?survfit.object defines std.err:

For a survival curve this contains standard error of the cumulative hazard or -log(survival), for a multi-state curve it contains the standard error of prev. This difference is a reflection of the fact that each is the natural calculation for that case.

However, I think this will lead to real errors in practice, as people generally assume the standard error next to an estimate to correspond to that estimate, not a different estimate that isn't shown.

As an alternative, would it be possible to add some sort of type argument to the function to return estimates, standard errors (and intervals) for the different curves? This would also be true to survival's implementation, and would be clearer and safer than only updating documentation or taking the mix-and-match approach.

simonpcouch · 2023-06-09T18:19:41Z

A type argument that proxies whether output mirrors the object structure of the model object or its summary feels to me like a workaround to not just newly adding a tidier method for the summary object, and thus again a lateral move in terms of clarity. Let's stick with just improving documentation here.👍

mccarthy-m-g added 2 commits June 3, 2023 16:39

return standard errors for the survival probability, not the cumulati…

e6ed2f9

…ve hazard

compute std.error manually instead of using summary() for compatibi…

03a4662

…lity with objects reformulated by `survfit0()`

mccarthy-m-g added 2 commits June 5, 2023 12:51

compute std.error manually instead of using summary() for compatibi…

73638a7

…lity with objects reformulated by `survfit0()`

Resolved merge conflict

03715fa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return correct standard errors in `tidy.survfit()` #1162

Return correct standard errors in `tidy.survfit()` #1162

mccarthy-m-g commented Jun 3, 2023 •

edited

simonpcouch commented Jun 5, 2023

mccarthy-m-g commented Jun 5, 2023

simonpcouch commented Jun 6, 2023

mccarthy-m-g commented Jun 6, 2023

simonpcouch commented Jun 9, 2023

Return correct standard errors in tidy.survfit() #1162

Are you sure you want to change the base?

Return correct standard errors in tidy.survfit() #1162

Conversation

mccarthy-m-g commented Jun 3, 2023 • edited

Edit

simonpcouch commented Jun 5, 2023

mccarthy-m-g commented Jun 5, 2023

simonpcouch commented Jun 6, 2023

mccarthy-m-g commented Jun 6, 2023

simonpcouch commented Jun 9, 2023

Return correct standard errors in `tidy.survfit()` #1162

Return correct standard errors in `tidy.survfit()` #1162

mccarthy-m-g commented Jun 3, 2023 •

edited