Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scaled Schoenfeld residuals not affected by robust variance estimation #161

Open
Irina-Sv opened this issue Sep 14, 2021 · 3 comments
Open

Comments

@Irina-Sv
Copy link

Good day!

I have noticed that scaled Schoenfeld residuals and variance estimates from cox.zph differ between version 3.1-11 and 2.41-3 of Survival and that it seem to depend on the use of robust estimation of variance.

  • For version 3.1-11 and 3.2-13, Scaled Schoenfeld residuals and variance estimates from cox.zph are the same when:
    robust variance estimation is performed
    robust variance estimation is not performed

  • For version 2.41-3, Scaled Schoenfeld residuals and variance estimates from cox.zph are different when:
    robust variance estimation is performed
    robust variance estimation is not performed

  • Scaled Schoenfeld residuals from cox.zph are also similar between:
    the version 3.1-11 when robust variance estimation is performed (I did not test for version 3.2-13)
    the version 2.41-3 when robust variance estimation is not performed

Is it that in version 3.1-11 and 3.2-13 of survival, cox.zph does not scale the Schoenfeld residuals by robust variance estimation of the parameters? We would be very grateful for any feedback about this.

As an example, consider the following example data:

  # --- Install Survival version 2.41-3 (latest version in our environment is 3.1-11) ---
  # library(devtools)
  # 
  # #install_version("survival", version = "2.41-3", lib = "999_temp_pkg", repos = "http://cran.us.r-project.org")
  # 
  
  
  #--- Load the diabetic data (can be found in survival version 3.1-11) ---
  data(diabetic, package = "survival")
  juvenile <- 1*(diabetic$age < 20)

  #--- Load Survival version 3.1-11 ---
  library(survival) 
  
  # Model with robust variance estimation 
  fit_hat_newer <- coxph(Surv(time, status) ~ trt + juvenile, cluster = id, data= diabetic)
  A_hat_newer <- cox.zph(fit_hat_newer, terms = F)
  # Model without robust variance estimation 
  fit_newer <- coxph(Surv(time, status) ~ trt + juvenile, data= diabetic)
  A_newer <- cox.zph(fit_newer, terms = F)
  
  #--- Load Survival version 2.41-3 ---
  detach("package:survival", unload = TRUE)  
  library(survival, lib.loc = "999_temp_pkg") # Version 2.41-3
  
  # Model with robust variance estimation 
  fit_hat_older <- coxph(Surv(time, status) ~ trt + juvenile + cluster(id), data= diabetic)
  A_hat_older <- cox.zph(fit_hat_older)
  # Model without robust variance estimation 
  fit_older <- coxph(Surv(time, status) ~ trt + juvenile, data= diabetic)
  A_older <- cox.zph(fit_older)

  #--- Compare estimates ---
  # A_hat_newer: Returned output from the newer version of cox.zph with robust variance estimation
  # A_hat_older: Returned output from the older version of cox.zph with robust variance estimation
  # A_newer: Returned output from the newer version of cox.zph without robust variance estimation
  # A_older: Returned output from the newer version of cox.zph without robust variance estimation
  
  
  print(max(abs(A_hat_newer$y - A_newer$y))) # =0.00 Same estimates
  print(max(abs(A_hat_newer$var - A_newer$var)))  #=0.00 Same estimates
  # For the newer version, Scaled Schoenfeld residuals are the same when:
  # robust variance estimation is performed
  # robust variance estimation is not performed
  
  print(max(abs(A_hat_older$y - A_older$y))) #=0.99 Different estimates
  print(max(abs(A_hat_older$var - A_older$var))) #=0.01 Different estimates
  # For the older version, Scaled Schoenfeld residuals are different when:
  # robust variance estimation is performed
  # robust variance estimation is not performed
  
  print(max(abs(A_hat_newer$y - A_older$y))) #=1e-14 Almost the same estimates
  # The Scaled Schoenfeld residuals are similar between:
  # estimates from the newer version of survival with robust estimation
  # estimates from the older version of survival without robust estimation  
  
@therneau
Copy link
Owner

therneau commented Sep 14, 2021

You are correct, and this is a problem.
Older versions of cox.zph used an approximate score test, which is quite good 99% of the time, but could give seriously biased answers if there was a strong strata* covariate interplay, e.g., a covariate with variance V in one strata and 0 in another strata. Think of an analysis stratified by institution, a factor variable "X" as a covariate, and some of the institutions only enrolled a subset of the levels; some of the dummy variables for X will have this pattern. This had been noticed and critiqued many years ago, and I finally addressed and added some serious test cases. To get both correctness and speed it uses C code.

What I didn't think through was that the approximate method 'inherited' a robust variance, automatically, but I now need to handle this explicitly. Damn. My test suite didn't have zph + robust variance case. First I need to work out exactly what the proper mathematics is, however.

@Irina-Sv
Copy link
Author

Thank you so much for the quick and clear reply, and thank you for maintaining such an amazing R package!!

@therneau
Copy link
Owner

I have been very slow to attach this issue, but I've finally rolled up my sleeves and carefully looked at it. That started with adding all the math detail to the vignette. I now see that I need to create special code for the dfbeta residuals. I've started on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants