Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: add_n() for time dependent coxph() #1625

Closed
jaromilfrossard opened this issue Apr 11, 2024 · 7 comments
Closed

Bug Report: add_n() for time dependent coxph() #1625

jaromilfrossard opened this issue Apr 11, 2024 · 7 comments

Comments

@jaromilfrossard
Copy link

Thank you for creating gtsummary!

Currently, in a time-dependent survival::coxph(), the n from add_n() describes the number of observation, instead of the number of id, which might be misleading. Indeed the same statistical model might have different number of observation depending on its data structure used in survival::coxph().
Currently, the number of "id" is not exported in the coxph model, which I suggest to add survival::coxph()
therneau/survival#250.

library(survival)
library(gtsummary)

set.seed(42)
n=30
df <- 
  data.frame(
    id = seq_len(n),
    eos_time = round(runif(n,100,200)),
    eos_event = rbinom(n,1,prob=.2),
    event_01 = round(runif(n,20,40)),
    event_02 = round(runif(n,60,80)),
    grp = rep(c(1,0),each= n/2)) 

df$eos_time2 <- df$eos_time
df$eos_time2[df$eos_event==0]<-NA

df_tv <-
  tmerge(df[,c("id","grp")],df[,c("id","eos_time")],id=id,tstop=eos_time) |> 
  tmerge(data2=df[,c("id","event_01","event_02")],event_01 = event(event_01),event_01 = event(event_02),id=id) |> 
  tmerge(data2=df[,c("id","eos_time2")],eos_event = event(eos_time2),id=id)

# same model different n!
# n=30
m <- coxph(Surv(eos_time,eos_event)~grp,data=df)
summary(m)
m |> 
  tbl_regression() |> 
  add_n()

# n=90
m_tv <- coxph(Surv(time = tstart,time2 =tstop,eos_event)~grp,data=df_tv,id=id)
summary(m_tv)
m_tv |> 
  tbl_regression() |> 
  add_n()
@larmarange
Copy link
Collaborator

Should be implemented in broom.helpers. Let me explore it

@larmarange
Copy link
Collaborator

larmarange commented Apr 11, 2024

A quick comment. So far, broom.helpers computes for coxph models a number of observations, a number of event and exposure time globally and for each level of a categorical variable.

This is the current documentation of broom.helpers::tidy_add_n():

For Cox models (survival::coxph()), an individual could be coded with several observations (several rows). n_obs will correspond to the weighted number of observations which could be different from the number of individuals. tidy_add_n() will also compute a (weighted) number of events (n_event) according to the definition of the survival::Surv() object. Exposure time is also returned in exposure column. It is equal to the (weighted) sum of the time variable if only one variable time is passed to survival::Surv(), and to the (weighted) sum of time2 - time if two time variables are defined in survival::Surv().

from: https://larmarange.github.io/broom.helpers/reference/tidy_add_n.html

Currently, the number of observations could be added in gtsummary() with add_n() and the number of events with add_nevent(). There is no function in gtsummary for adding exposure time.

I have quickly drafted some code to compute, in addition to the number of observations, the number of individuals n_ind: larmarange/broom.helpers#251

It has to be noted, for categorical variable, that if there are two observations of the same individual for modality A and one observation for modality B, for now we are applied 2/3 to A and 1/3 to B (such situation occurs with time-varying co-factors). Sampling weights are also taken into account.

I am not sure if the number of individuals should replace the number of observations. They are not exactly the same thing.

However, it would be nice to have therefore a way to display n_obs or n_ind depending on user preference, and/or a way to also display exposure time. Eventually with a more generic version of add_n() allwoing to indicate a glue text with the stats to be displayed?

@ddsjoberg What do you think? Should we try to be more generic? Should we have distinction between the number of observations the number of individuals? or change the default behaviour for cox models?

library(survival)
library(gtsummary)
library(broom.helpers)
#> 
#> Attachement du package : 'broom.helpers'
#> Les objets suivants sont masqués depuis 'package:gtsummary':
#> 
#>     all_continuous, all_contrasts

set.seed(42)
n=30
df <- 
  data.frame(
    id = seq_len(n),
    eos_time = round(runif(n,100,200)),
    eos_event = rbinom(n,1,prob=.2),
    event_01 = round(runif(n,20,40)),
    event_02 = round(runif(n,60,80)),
    grp = factor(rep(c(1,0),each= n/2))
  )

df$eos_time2 <- df$eos_time
df$eos_time2[df$eos_event==0]<-NA

df_tv <-
  tmerge(df[,c("id","grp")],df[,c("id","eos_time")],id=id,tstop=eos_time) |> 
  tmerge(data2=df[,c("id","event_01","event_02")],event_01 = event(event_01),event_01 = event(event_02),id=id) |> 
  tmerge(data2=df[,c("id","eos_time2")],eos_event = event(eos_time2),id=id) |> 
  dplyr::rename(identif = id)

m <- coxph(Surv(eos_time,eos_event)~grp,data=df)
m |> 
  model_get_n()
#> # A tibble: 3 × 5
#>   term        n_obs n_ind n_event exposure
#>   <chr>       <dbl> <dbl>   <dbl>    <dbl>
#> 1 (Intercept)    30    30       7     4843
#> 2 grp1           15    15       4     2420
#> 3 grp0           15    15       3     2423

m_tv <- coxph(Surv(time = tstart,time2 =tstop,eos_event)~grp,data=df_tv,id=identif)
m_tv |> 
  model_get_n()
#> # A tibble: 3 × 5
#>   term        n_obs n_ind n_event exposure
#>   <chr>       <dbl> <dbl>   <dbl>    <dbl>
#> 1 (Intercept)    90    30       7     4843
#> 2 grp1           45    15       4     2420
#> 3 grp0           45    15       3     2423

Created on 2024-04-11 with reprex v2.1.0

@ddsjoberg
Copy link
Owner

@larmarange we had a similar request very early on in the gtsummary life. I think on the surface it it a super reasonable request. But I found it quickly became tricky to handle this for all types of models

  • Time-dependent Cox, I think it's pretty clear what N we really want.
  • But what about a random intercept model? If a patient has one row per tumor, then the number of patients would be the N to report. But the same model where students are nested within schools, people most likely will want N to be the number of students. This model can easily become much more complex with multiple random compenents.
  • Then we have the same situation above, but for every other repeated measures type model, and various packages that implement them.

We identified a number of situations where the N users want to report is different for the same model type, and we opted to keep it simple and just report the number of observation because this is something that was clear. Users can pretty easily calculate tabulations for whatever N is appropriate for their scenario in tbl_summary() and merge that table with the regression summary.

I am a bit apprehensive for being able to solve this correctly in all situations....

@larmarange
Copy link
Collaborator

@ddsjoberg I completely agree that we cannot cover all cases and I do not plan to cover all of them in broom.helpers.

However, considering that some of the code is very similar between add_n.tbl_regression() and add_nevents.tbl_regression(), maybe we could consider to add to add_n.tbl_regression() two arguements: one to let the user to select another column returned by tidy_plus_plus() and one to customize the header of the col.

@ddsjoberg
Copy link
Owner

Those are good points! Sounds good to me. Nice!

@ddsjoberg
Copy link
Owner

I think I can close this as it's handled in broom.helpers. Please re-open if I need to update something in gtsummary

@larmarange
Copy link
Collaborator

There is still the question of upgrading add_n() to select a specific column of the tibble returned by broom.helpers, and refactoring add_nevent as a specific case of add_n().

But considering the current work on v2 of gtsummary, it could maybe be postponed after the release of v2?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants