Survival models #71

yunwezhang · 2021-04-20T06:46:43Z

Hi maintainer,

I am wondering whether it is possible to cascade random survival forests (maybe an sksurv model) instead of RF in your deep forest model. As in #48, it seems that the supported model types are classification and regression (or did I miss some part of the tutorial docs?).

Thanks.

xuyxu · 2021-04-20T11:34:02Z

Hi @yunwezhang, after walking through the example on random survival forest in sksurv, I think the biggest problem with using deep forest in survival analysis tasks is how to design good augmented features. In survival analysis, our main concern is the survival prediction function that takes the time step t as input, right? For now, I cannot figure out how to fit this into the cascade structure of deep forest.

Since we are not very familiar with survival analysis, your suggestions would be very welcome ;-)

EDIT: We are happy to work on this feature request if this is achievable.

yunwezhang · 2021-04-20T21:07:32Z

Hi Yixuan,

Yes, you are right about the time steps. A survival model takes a two-column outcome as input (time plus a binary status, where the status indicates whether the observation was censored), but its output is usually a 1-dim vector, either a "risk" or a "probability" (as in binary classification).

As for the augmented feature steps, I assume you are talking about this part in the model structure?

Does this part correspond to this part in the paper?

Because, if I understand correctly, in the cascade forest part the augmented features (in-model feature transformation) obtained from each forest are the predicted vectors, which could equally be obtained from a survival forest (the output survival probability). However, I am not clear about the part in the attached picture. (I think the 2019 paper has it because it is better for image data.)

Thank you for looking into it. I am not sure how hard it is to add the random survival model, and I am happy to chat with you to see how it goes. In summary, the input needs to change to X (n by p) and y (both time and status), and the output is a probability vector (could be survival risk, 1-year survival probability, 2-year survival probability, etc.) 😊
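To make the shapes concrete, here is a minimal sketch of the data layout described above. The toy values, the placeholder linear score, and the field names `event` and `time` are assumptions for illustration (the field names are modelled on the structured-array layout sksurv uses); nothing here is fitted by a real survival model.

```python
import numpy as np

# Hypothetical toy data: n = 5 samples, p = 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Observed follow-up time and event indicator
# (True = event observed, False = censored).
time = np.array([12.0, 30.5, 7.2, 24.0, 18.3])
status = np.array([True, False, True, True, False])

# Pack both outcome columns into one structured array, one record per sample.
# Assumption: field names "event" and "time" follow the sksurv convention.
y = np.empty(len(time), dtype=[("event", bool), ("time", float)])
y["event"] = status
y["time"] = time

# The model maps X (n, p) and y (n,) to one value per sample.
# Placeholder only: a made-up linear score stands in for a fitted model's risk.
risk = X @ np.array([0.5, -0.2, 0.1])
```

So the feature matrix stays `(n, p)`, the response becomes one record of (status, time) per sample, and the prediction is a length-`n` vector.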

xuyxu · 2021-04-21T06:14:23Z

Thanks for your kind explanation, @yunwezhang.

> As for the augmented feature steps, I assume you are talking about this part in the model structure?
> Does this part correspond to this part in the paper?

No, the second figure you posted shows the multi-grained scanning part, which is not included in this package, since tree ensembles are typically not the best choice for structured data such as images or audio. Augmented features refer to part of the input to the hidden cascade layers. For classification, they are the predicted class probabilities; for regression, they are the predicted target values.
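As a rough illustration of what "augmented features" means here, the sketch below concatenates the raw features with the per-forest predictions to build the input of the next cascade layer. The helper name `augment` and the constant arrays are hypothetical stand-ins, not this package's internal code.

```python
import numpy as np

def augment(X, layer_outputs):
    """Concatenate raw features with the outputs of every forest in a layer.

    Each output is either (n, n_classes) class probabilities or a length-n
    vector of predicted targets; 1-D vectors become single columns.
    """
    cols = [np.asarray(out).reshape(len(X), -1) for out in layer_outputs]
    return np.hstack([X] + cols)

X = np.zeros((4, 10))  # 4 samples, 10 raw features

# Classification: two forests, three classes each (dummy probabilities).
proba_a = np.full((4, 3), 1 / 3)
proba_b = np.full((4, 3), 1 / 3)
X_next_clf = augment(X, [proba_a, proba_b])  # 10 + 3 + 3 columns

# Regression: two forests, one predicted value per sample (dummy values).
pred_a = np.zeros(4)
pred_b = np.ones(4)
X_next_reg = augment(X, [pred_a, pred_b])  # 10 + 1 + 1 columns
```

For a survival variant, the same concatenation would apply to whatever per-sample vector the survival forests return.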

Here are three further questions I would like to ask.

1. Is random survival forest the state-of-the-art model for survival analysis?
2. To sum up your explanation, what we need to modify is the input format, right? Apart from X, we also need to pass in an array holding time and status.
3. Can we use the risk score as the augmented features?

yunwezhang · 2021-04-21T07:40:58Z

Hi Yixuan,

Thanks for the fast reply. I am aware that multi-grained scanning is not included, and that is why I asked why you have that part (first figure) in your model structure instead of starting from the cascade forest.

Answers to the further questions:

1. To me, yes, RSF. Some DNN-based survival models are also considered state-of-the-art. (But RSF has had the best overall performance on the several datasets I tried.)
2. Yes, only the input part needs to change (not the feature matrix X_train but the response y_train), because the output of the RSF is 1-dim, the probability.
3. I think both the risk score and the predicted probability can be used as augmented features. (My experience with RSF is in R, but both packages are based on the same paper, so in theory there should be no difference.) From my reading of that package, though, the survival probability is not provided by the predict function.
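If survival probabilities are used, one option is to read each predicted survival curve at a few fixed horizons (for example 1 year and 2 years) and use those values as the augmented features. The sketch below assumes the curves are already available as an array over a sorted grid of event times; the helper `survival_at`, the time unit (months), and the numbers are hypothetical.

```python
import numpy as np

def survival_at(horizons, event_times, surv):
    """Evaluate right-continuous step survival curves at given horizons.

    event_times: sorted 1-D array of length m.
    surv: (n, m) array with surv[i, j] = S_i(event_times[j]).
    Before the first event time the survival probability is taken to be 1.
    """
    # Number of event times that are <= each horizon.
    idx = np.searchsorted(event_times, horizons, side="right")
    # Prepend a column of ones so that idx == 0 maps to S = 1.
    padded = np.hstack([np.ones((surv.shape[0], 1)), surv])
    return padded[:, idx]

event_times = np.array([6.0, 12.0, 18.0, 30.0])  # months (assumed unit)
surv = np.array([
    [0.90, 0.80, 0.60, 0.30],
    [1.00, 0.95, 0.90, 0.70],
])

# 1-year and 2-year survival probabilities, one row per sample.
features = survival_at([12.0, 24.0], event_times, surv)
```

Each sample then contributes one column per horizon, which keeps the augmented feature width fixed regardless of how many distinct event times the training data contain.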

xuyxu · 2021-04-21T07:57:01Z

> Thanks for the fast reply. I am aware that multi-grained scanning is not included, and that is why I asked why you have that part (first figure) in your model structure instead of starting from the cascade forest.

The binner in that figure is used to reduce the number of splitting candidates for the sake of acceleration (not used in the original deep forest model). The entire architecture does correspond to the cascade forest structure.
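For readers unfamiliar with binning, here is a minimal sketch of the general idea, assuming quantile-based cut points. It is not the binner implemented in this package; the function names and the choice of 8 bins are hypothetical.

```python
import numpy as np

def fit_bin_edges(x, n_bins=8):
    """Quantile-based cut points for one feature (at most n_bins - 1)."""
    qs = np.linspace(0, 100, n_bins + 1)[1:-1]
    return np.unique(np.percentile(x, qs))

def to_bins(x, edges):
    """Replace each value by the index of the bin it falls into."""
    return np.searchsorted(edges, x, side="right").astype(np.uint8)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)          # one continuous feature column

edges = fit_bin_edges(x, n_bins=8)
codes = to_bins(x, edges)

# A tree built on `codes` only has to consider len(edges) split thresholds,
# rather than a threshold between every pair of distinct raw values.
```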

Besides, I have opened a feature request in sksurv (link). Deep forest could benefit from using a mixture of RandomSurvivalForest and ExtraSurvivalTrees in cascade layers. Let's wait for the response from the maintainers of sksurv before formally working on this feature request ;-)

yunwezhang · 2021-04-21T08:03:28Z

got it!
yes, let's wait for the reply. To have that extra injection of randomness, it would be better to have ExtraSurvivalTrees.

xuyxu · 2021-04-23T07:12:34Z

Realizing that we can implement ExtraSurvivalTrees by importing sksurv as a soft dependency, I think we could work on this feature request without extra help from that community.
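A soft dependency usually means the package is imported only if it is installed, and a clear error is raised at the point where it is actually needed. A minimal sketch of that pattern follows; the helper names `optional_import` and `require_sksurv` are hypothetical and not part of this package.

```python
import importlib

def optional_import(name):
    """Return the module called `name`, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# Attempted once at import time; stays None if scikit-survival is absent.
sksurv = optional_import("sksurv")

def require_sksurv():
    """Return the sksurv module, failing only when survival support is used."""
    if sksurv is None:
        raise ImportError(
            "Survival estimators need the optional package scikit-survival."
        )
    return sksurv
```

With this layout, classification and regression users never need sksurv installed; only code paths that call `require_sksurv()` do.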

> Thank you for looking into it. I am not sure how hard it is to add the random survival model, and I am happy to chat with you to see how it goes. In summary, the input needs to change to X (n by p) and y (both time and status), and the output is a probability vector (could be survival risk, 1-year survival probability, 2-year survival probability, etc.) 😊

If you are interested in extending deep forest to the field of survival analysis, could you contact me by e-mail (Address), so that we can discuss this further before opening a draft PR on this feature ;-)

xuyxu · 2021-04-29T08:17:46Z

Closed via #14.

xuyxu added the feature request New feature or request label Apr 20, 2021

xuyxu mentioned this issue Apr 21, 2021

Feature Requests #14


xuyxu closed this as completed Apr 29, 2021
