Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updating a variable with tdc #192

Closed
therneau opened this issue Apr 21, 2022 · 2 comments
Closed

updating a variable with tdc #192

therneau opened this issue Apr 21, 2022 · 2 comments

Comments

@therneau
Copy link
Owner

therneau commented Apr 21, 2022

This issue was raised by Sam Buttery via email (which I normally prefer), but I'm moving it here in order to perhaps get some comments and opinions on the best answer. Starting with his email

First, thank you (and your colleagues) for the very well done survival package. I have an observation that might demonstrate a bug and a simple example that, I hope, shows the different ways a particular task is treated in versions 3.2-7, 3.2-11, and 3.3-1.

In my example I’d like to use tmerge() to create a “long” data set from these observations:
(df <- data.frame(id = 1:3, x = c(100,100,100), thing = c(1.2,1.2,1.2), time = c(0,0,0),
thing2= c(2.1,NA,2.3), time2 = c(2,NA,2),
thing3 = c(NA,NA,3.5), time3 = c(NA,NA,3),
end = c(4,4,4)))
id x thing time thing2 time2 thing3 time3 end
1 1 100 1.2 0 2.1 2 NA NA 4
2 2 100 1.2 0 NA NA NA NA 4
3 3 100 1.2 0 2.3 2 3.5 3 4

I use these commands:

( newdf <- tmerge(df[,1:2], df, id = id, tstop = end))
( newdf <- tmerge(newdf, df, id = id, thing = tdc( time, thing) ) )
( newdf <- tmerge(newdf, df, id = id, thing = tdc( time2, thing2) ) ) # see below
( newdf <- tmerge(newdf, df, id = id, thing = tdc( time3, thing3) ) )

Under the current version, this second-to-last line produces an error that says:
Error in tmerge(newdf, df, id = id, thing = tdc(time2, thing2)) :
tdc update does not match prior variable type: thing # from version 3.3-1

Now, at your github site, https://rdrr.io/github/therneau/survival/src/R/tmerge.R , the code for tmerge() includes these lines:
"# Next line removed 11/2021, I can't recall why was added
#if (!is.null(yinc)) newvar <- NULL #a tdc can't be updated, other than 0/1"

If I reinstate that if() (I call the updated version tmergx() ), I can run that second to last line, producing this:
( newdf <- tmergex(newdf, df, id = id, thing = tdc( time2, thing2) ) )# 3.2-11 does this
id x tstart tstop thing
1 1 100 0 2 NA
2 1 100 2 4 2.1
3 2 100 0 4 NA
4 3 100 0 2 NA
5 3 100 2 4 2.3

So at least it works, but it’s not what I expect. I expect “thing” for the first row to be 1.2, for example. Survival 3.2-7, that same command, using the tmerge from that version, produces this output, which I think is what I want. Do you agree?

( newdf <- tmerge(newdf, df, id = id, thing = tdc( time2, thing2) ) )# v. 3.2-7.
id x tstart tstop thing
1 1 100 0 2 1.2
2 1 100 2 4 2.1
3 2 100 0 4 1.2
4 3 100 0 2 1.2
5 3 100 2 4 2.3

I know you’re busy, but I would love it if you could look into this. I hope this question isn’t a trivial on
Thanks,
Sam Buttrey


I wrote the tmerge command under the assumption that each new tdc(), cumtdc(), even(), cumevent() would create a new variable. The notion that they could update the same variable has come up a few times, and always leads me down a rabbit hole. Here are some of the issues. I'll use "zed" as the name of the target variable of interest.

  1. What if we have zed= tdc(.... and zed already exists in the baseline data set? It could be that this simply a typing or memory error (that is how is normally arises for me), and the user never indended to replace that variable. One correct response would be to ignore the prior value and build a new tdc() zed de novo, while issuing a warning message. On the other hand, maybe the user did know what they were doing, and expect us to use the pre-existing zed value as the starting value for the first interval. Or perhaps they did expect it to replace, and will be annoyed by a warning message.
  2. One of the additions to the tmerge code, over time, was that it now remembers which current variables were created by a tdc. So what should it do when someone attempts to update a tdc?
    a. fold the new values into the old. A major problem is what to do if the data type of the addition != data type of the old. I simply have not had the energy to tackle this, since there are so many special cases. Both 0/1 is easy. What about 0/1 + logical, numeric + factor, two factors with different level sets (okay if new levels are a superset of old?). Dates are painful since ifelse() loses the class, coding for them has been like walking on eggshells. Perhaps an aggressive "sorry, can't do that" error message for unanticipated combinations --- anything you can think of will eventually come up.
    b. What about tied times. New one wins? What if the new value is NA and the old is not?
    c. How should a cumtdc be updated?
    d. Same issues for event and cumevent()

All in all, it's enough coding work that my response, multiple times, has been "I'll paper it over for now and come back later to think it through". But the 'come back later' never seems to materialize.

  1. Per the code quoted above by Sam, I had chosen (at least at one time) the action: update the tdc if both are 0/1, otherwise rewrite it from scratch. This is an admittedly half-assed solution.

So. What do others think that the routine should do? For the data sets I work with the "update a tdc" use case arises very rarely, so I don't have strong priors opinions. And, once comments are in, any volunteers to give the updates a try?
Any change, or even the status quo, deserves a nice long addition to the test directory, checking out a large number of combinations, with comments on what we expect. Then also add this discussion to the vignette and help files.

Churchill said “Democracy is the worst form of government, except for all those other forms that have been tried from time to time.” There are many times I feel exactly that way about tmerge -- using it drives me batty sometimes. But I haven't yet found the better solution to building these data sets.

Terry T.

@therneau
Copy link
Owner Author

I thought about this some more, and I now think updating a tdc() is impossible to do reliably. Assume that the tmerge data set has multiple tdc variables, and that these were not necessarily changed on the same dates; this is often true for my data sets. As a result we have four intervals (0,10), (10,20), (20,30) and (30,40) and tdc variable x with values of 1, 2, 2, 3. You now do an new tdc() to add an x value of 4 at time 15. The new data will have intervals of (0,10), (10,15), (15,20), (20,30) and (30,40). The problem is that you don't know, at this point, whether the x=2 value for time (20,30) was an actual measurement at time 20, or simply a carry over from a measurement at time 10. If the former, the new data should have values of 1, 2, 4, 2, 3 for the 5 intervals; if the latter 1,2,4,4,3 is correct. That is, you don't know the duration of the new value. The rule in tmerge is currently that tdc replaces, it does not attempt to augment.

@therneau
Copy link
Owner Author

I've thought about this some more, and now believe that there is no valid was to update a tdc() variable. Here is the problem.

  1. A variable x1= tdc(day, value) has created time intervals of (0, 60), (60, 120), (120,300) for the variable, values of 1, 5, 8 say.
  2. A variable x2= tdc(day, value) has further split the data at time 100, so the values x1 values are now 1, 5, 5, 8
  3. A second x1=tdc() call wants to add the value of "12" at time 90. Since each step in a tmerge is done independently, the routine at this point has no way of knowing whether that first '5' at day 100 is an actual measurement -- the same value happened to arise twice-- or an inserted repeat. In the first case the values should be 1, 5,12, 5, 8 ; in the second it would be 1, 5, 12, 12, 8, Is (100, 120) a 12 or a 5?

To do this the routine would either need a full transactional history, or would need to mark each individual tdc variable, at each time, as 'real' or 'repeat'. This discussion needs to be added to the tdc vignette.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant