Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error versus warning on interactions without main effects #205

Open
vanAmsterdam opened this issue Jul 11, 2022 · 2 comments
Open

error versus warning on interactions without main effects #205

vanAmsterdam opened this issue Jul 11, 2022 · 2 comments

Comments

@vanAmsterdam
Copy link

Hi,

Let's say there is a sound reason to fit a cox model conditional on some categorical variable (e.g. age group), and an interaction of a course-grained version of this categorical variable (e.g. age over 50) with another variable, like below:

library(survival)
colon$age_cat <- cut(colon$age, breaks = c(0, 50, 70, Inf), include.lowest = T)
colon$age_over50 <- colon$age > 50
f <- coxph(Surv(time,status)~sex+sex:age_over50+age_cat, data=colon)
survfit(f)

Currently, survfit throws an error:

"not able to create a curve for models that contain an interaction without the lower order effect"

However, given that the model conditions on the more fine-grained age_cat I cannot come up with any mathematical reason why this survfit would not be meaningful.

Should this error be replaced by a warning? Or is there a reason that such models should never be allowed?

@therneau
Copy link
Owner

The real issue is three fold: in this situation survfit can create a curve that is nonsense, doing otherwise would require a a level of artificial intelligence in the routine far beyond my skills, and many users think the survival package is infallible. Consider the following small change in your example:
...
colon$age.lt50 <- 1*(colon$age < 50)
fit <- coxph(Surv(time, status) ~ age + age:age.lt50 + age_cat, data=colon)
fit$mean
sex age_cat(50,70] age_cat(70,Inf] sex:age50
0 0 0 0 0
...
survfit(fit) will try to produce a curve for the above combination of covariates, which is someone who is simultaneously less than 50 and not less than 50. It will even look like a curve, but is utter rubbish. I do not want to ever produce rubbish by default.
I may have a mistake in the example, but we can easily work one out. (The fact that we could have a debate over which examples do and do not produce impossible curves helps prove that detecting this is hard.)

A possible solution, which I will think on a bit, is to make the error message go away for someone who uses the newdata argument. I would be more willing to state "caveat emptor" in that case. What do you think? I'm currently on a trip, and won't do anything till I return. The code in noweb/coxsurv*.Rnw that deals with all these cases is dammably complex, so it is mandatory that any change be accompanied by a new test case to verify that nothing go mucked up. Not a job I will attempt on this little laptop screen.

@vanAmsterdam
Copy link
Author

thanks for following up

I see the potential ways in which this may go wrong and I get that 'human-error'-safety is valuable.
How do other routines (lm, glm) approach this? Are there comparable situations for those?

Regarding the fix with supplying newdata, that could be a good middle ground but wouldn't work for my specific case. I encountered the error when imputing data using smcfcs (https://github.com/jwb133/smcfcs), which calls basehaz as part of the routine. Personally I adapted the survival source code locally and changed the error to a warning so I got around it.

Whether or not it is better to protect people from unknowingly doing things that are mathematically meaningless, or to not block researchers who may know the math but are not coding-knowledgeable enough to change the source code of an R package, is something you probably have a better idea of then me.

Another idea: would it be possible to add a global options parameter that basically switches between throwing an error versus a warning? It could be that the default option is to throw an error, but then let the error message say "if you (really) know what you're doing, set option abc to 'warning'"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants