Log-likelihood for discrete Weibull distribution #58

Open
michael-tsel opened this issue May 9, 2019 · 3 comments

Comments

@michael-tsel

michael-tsel commented May 9, 2019

There seems to be an issue with the log-likelihood for the discrete Weibull distribution with censored data (u=0).
According to equation (2.7) in Proposition 2.26 of your great thesis, the likelihood in this case is
L_d = Pr(T_d > t) = Pr(T >= t+1) for t in {0,1,2,...}
However, I do believe that it should be
L_d = Pr(T_d >= t) = Pr(T >= t) for t in {0,1,2,...}
[Sorry, I have found no way to use TeX here]

The argument is as follows. Assume u=0 and tte=0 for some fixed day. It means that the next event might occur at any day after that fixed day, so the probability should be equal to 1. In your case it's strictly lower than 1.
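The two candidate censored likelihoods can be compared numerically. The following is a minimal sketch, not the library's code: it assumes the usual continuous Weibull survival function S(t) = exp(-(t/a)^b), and the names `a` (scale), `b` (shape), `censored_lik_thesis` and `censored_lik_proposed` are hypothetical, introduced only for illustration.

```python
import math


def weibull_survival(t, a, b):
    """Pr(T >= t) for a continuous Weibull with scale a and shape b (assumed form)."""
    return math.exp(-((t / a) ** b))


def censored_lik_thesis(t_d, a, b):
    """Censored likelihood as quoted from the thesis: Pr(T_d > t_d) = Pr(T >= t_d + 1)."""
    return weibull_survival(t_d + 1, a, b)


def censored_lik_proposed(t_d, a, b):
    """Censored likelihood as proposed in this issue: Pr(T_d >= t_d) = Pr(T >= t_d)."""
    return weibull_survival(t_d, a, b)


# Arbitrary illustrative parameters, not taken from the thread.
a, b = 2.0, 1.5
print(censored_lik_proposed(0, a, b))  # a sure event at t_d = 0
print(censored_lik_thesis(0, a, b))    # strictly below 1 at t_d = 0
```

At t_d=0 the proposed form is a sure event for any positive `a` and `b`, while the thesis form is strictly below 1, which is the discrepancy described above.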

@ragulpr
Owner

ragulpr commented Oct 13, 2019

Hi there and sorry for the slow response. Very happy for the contribution and I'm surprised and impressed that someone took their time to think about this issue too, because I sure did.

I was worried that my definition might cause off-by-one errors and confusion, but after trying out the alternative, it caused more confusion for me in the long run when alternating between discrete and continuous time.

> Assume u=0 and tte=0 for some fixed day. It means that the next event might occur at any day after that fixed day, so the probability should be equal to 1. In your case it's strictly lower than 1.

I read your interpretation as us disagreeing on how to index intervals: zero-based or one-based.
I thought long and hard about this and decided to consider discretized time as indexed intervals, with the first interval index being 0. I don't think there's a right or wrong here, just a matter of taste. But here are some arguments:

  • Numpy is zero-indexed (sorry R-using friends)
  • "Today" is t_d=0, rather than "day 1 is today". No true answer here; one could argue that the "1st day" being t=0 is pretty confusing. But in a zero-based indexing framework, having both definitions would be more confusing still.
  • If u=0 and (discrete) t_d=0, and we define (discrete time) t_d=0 as "the event may occur at any time, the 1st day or any time after", that is indeed a sure event (p=1), so such an observation gives zero information. So there's really no reason for having such an observation in the dataset.
  • If u=0 and (discrete) t_d=0, in my framework I interpret this as "discrete time is greater than 0", which is the same thing as saying "continuous time is at least 1".
  • This whole saga begins with me enjoying thinking about time intervals as right-open/càdlàg, i.e. t∈[t_d,t_d+1), maybe because I come from a place where today's batch jobs taking in yesterday's data were scheduled to run at 00:00.
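The zero-based, right-open indexing described in the list above can be sketched as follows. This is only an illustration under the assumption that continuous time is measured in whole "days" starting at 0; the helper names `interval_index` and `interval_bounds` are hypothetical and do not come from the library.

```python
import math


def interval_index(t):
    """Zero-based index t_d of the right-open interval [t_d, t_d + 1) that contains t."""
    return math.floor(t)


def interval_bounds(t_d):
    """Continuous-time bounds (inclusive start, exclusive end) of interval t_d."""
    return (t_d, t_d + 1)


# Anything that happens before the first midnight belongs to "today" (t_d = 0);
# an event exactly at time 1.0 already belongs to the next interval.
for t in (0.0, 0.5, 0.999, 1.0, 2.3):
    t_d = interval_index(t)
    print(t, "->", t_d, interval_bounds(t_d))
```

Under this convention a censored observation with t_d=0 says that nothing happened in [0, 1), so the continuous time must fall in interval 1 or later.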

To me this seemed to make more sense but it's really a matter of taste. Again, thanks for the kind words.

@ragulpr
Owner

ragulpr commented Oct 13, 2019

Also, see my comments on #59

@michael-tsel
Author

So, if I understand you correctly, for non-censored data (y=1) and t_d=0 you calculate the likelihood as the probability that continuous t falls in [0,1),
while for censored data (y=0) and t_d=0 you calculate the likelihood as the probability that continuous t falls in [1,+\infty).
This makes things clear. However, one should read the reference carefully before feeding data into the WTTE-RNN framework.
I feel that the issue can be closed.
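The interpretation summarized above can be checked with a small sketch of a discrete Weibull log-likelihood. It is an assumption-laden illustration rather than the WTTE-RNN implementation: it assumes the cumulative hazard (t/a)^b, uses `u` as the event indicator (1 = observed, 0 = censored, written as y in the comment above), and the function names are hypothetical.

```python
import math


def cum_hazard(t, a, b):
    """Weibull cumulative hazard (t / a) ** b, so that Pr(T >= t) = exp(-cum_hazard)."""
    return (t / a) ** b


def discrete_loglik(t_d, u, a, b):
    """Log-likelihood of one observation under the zero-based interval convention.

    u = 1: event observed in [t_d, t_d + 1)  -> log(Pr(T >= t_d) - Pr(T >= t_d + 1))
    u = 0: censored, event in [t_d + 1, inf) -> log Pr(T >= t_d + 1)
    """
    h0 = cum_hazard(t_d, a, b)
    h1 = cum_hazard(t_d + 1, a, b)
    if u == 1:
        return math.log(math.exp(-h0) - math.exp(-h1))
    return -h1


# Arbitrary illustrative parameters, not taken from the thread.
a, b = 2.0, 1.5
p_event = math.exp(discrete_loglik(0, 1, a, b))     # Pr(T in [0, 1))
p_censored = math.exp(discrete_loglik(0, 0, a, b))  # Pr(T in [1, inf))
print(p_event, p_censored, p_event + p_censored)
```

At t_d=0 the two cases are complementary events, so their probabilities sum to 1 up to floating-point rounding.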
