standard errors for HMM parameters #61

Open
tbeason opened this issue Apr 26, 2022 · 3 comments
Comments

@tbeason

tbeason commented Apr 26, 2022

I see that it is possible to get standard errors for covariates in MHMM models.

I am wondering if it is possible to get standard errors for the transition probabilities and emission probabilities in simple HMM models? Or even some type of confidence interval?

@helske
Owner

helske commented May 2, 2022

In theory, you could get asymptotic standard error estimates from the Hessian used in the numerical optimization (local_step = TRUE). However, this is not currently supported and would need some work, because nloptr, which is used for the numerical optimization, does not return the Hessian. Of course, estimating the model "manually" with optim and the logLik function, setting hessian = TRUE, is possible.
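To illustrate the optim route in general terms, here is a minimal base R sketch. It is a toy, not seqHMM code: it uses a fully observed two-state Markov chain rather than an HMM, and the names (`negloglik`, `counts`, the logit parameterization) are assumptions made for the example. For an actual HMM you would need your own wrapper that maps an unconstrained parameter vector to the model and returns its negative log-likelihood; that wrapper is not shown here.

```r
# Toy example: a fully observed two-state Markov chain, NOT a hidden Markov model.
set.seed(1)
true_P <- matrix(c(0.9, 0.1,
                   0.3, 0.7), nrow = 2, byrow = TRUE)

# Simulate one long chain with states 1 and 2
n <- 5000
x <- integer(n)
x[1] <- 1L
for (i in 2:n) x[i] <- sample(1:2, 1, prob = true_P[x[i - 1], ])

# Transition counts: rows = from-state, columns = to-state
counts <- table(factor(x[-n], levels = 1:2), factor(x[-1], levels = 1:2))

# Negative log-likelihood in unconstrained (logit) parameters:
# theta[1] = logit P(1 -> 2), theta[2] = logit P(2 -> 1)
negloglik <- function(theta) {
  p12 <- plogis(theta[1])
  p21 <- plogis(theta[2])
  -(counts[1, 1] * log(1 - p12) + counts[1, 2] * log(p12) +
    counts[2, 1] * log(p21)     + counts[2, 2] * log(1 - p21))
}

fit <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)

# The Hessian of the negative log-likelihood at the optimum is the observed
# information; its inverse approximates the covariance of the estimates.
se_logit <- sqrt(diag(solve(fit$hessian)))

# Delta method back to the probability scale: d plogis(theta) / d theta = p * (1 - p)
p_hat <- plogis(fit$par)
se_prob <- se_logit * p_hat * (1 - p_hat)

result <- cbind(estimate = p_hat, se = se_prob)
rownames(result) <- c("P(1->2)", "P(2->1)")
print(round(result, 4))
```

Optimizing on the logit scale keeps the probabilities inside (0, 1) without constraints, which is why the delta-method step is needed to report standard errors for the probabilities themselves.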

If you have a reasonable number of sequences, you could compute nonparametric bootstrap estimates though.

@tbeason
Author

tbeason commented May 9, 2022

I do have a lot of sequences: around 50,000 in the smallest case and around 1 million in the largest. How would I go about doing the bootstrap? I am familiar with bootstrap methods in general, but this is my first trip into HMM methods, so I am open to suggestions. My first thought would be to randomly select entire sequences (with replacement), so that each original sequence remains intact but the sample composition becomes random.

Sidenote: The parallelization works well! Estimation is not too slow even with a large number of sequences when using 64 cores. Thanks for that!

@helske
Owner

helske commented May 10, 2022

Yes, your strategy of randomly sampling entire sequences sounds right. To avoid potential issues with multiple (local) optima, and also to speed up the bootstrap, I suggest you use the estimated parameters as initial values in the bootstrap loop. That is, you have your estimated model based on the original data, say mod, and then you define boot_model <- build_hmm(boot_sequences, transition_probs = mod$transition_probs, emission_probs = mod$emission_probs, initial_probs = mod$initial_probs).
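The resampling loop described above could be sketched roughly as follows. This is only an illustration in base R: `bootstrap_sequences`, `toy_fit`, and the simulated `seqs` matrix are hypothetical names, and the toy estimator (pooled transition proportions of an observed chain) merely stands in for refitting an HMM. The commented-out part at the end shows how a seqHMM refit might be plugged in; it is untested, and `seqs_all` is an assumed name for the full single-channel sequence object.

```r
# Resample whole sequences (rows) with replacement and re-estimate each time.
# `fit_fun` is a user-supplied function: it receives the resampled row indices
# and returns a numeric vector of parameter estimates.
bootstrap_sequences <- function(n_seq, fit_fun, B = 200) {
  do.call(rbind, lapply(seq_len(B), function(b) {
    fit_fun(sample.int(n_seq, size = n_seq, replace = TRUE))
  }))
}

# Toy data: 300 observed two-state sequences of length 20 (a stand-in for real data)
set.seed(42)
n_seq <- 300
len <- 20
P <- matrix(c(0.8, 0.2,
              0.4, 0.6), nrow = 2, byrow = TRUE)
seqs <- matrix(1L, nrow = n_seq, ncol = len)
for (i in seq_len(n_seq)) {
  for (j in 2:len) seqs[i, j] <- sample(1:2, 1, prob = P[seqs[i, j - 1], ])
}

# Toy estimator: pooled transition proportions over the selected sequences
toy_fit <- function(idx) {
  s <- seqs[idx, , drop = FALSE]
  tab <- table(factor(s[, -len], levels = 1:2), factor(s[, -1], levels = 1:2))
  c(p12 = tab[1, 2] / sum(tab[1, ]), p21 = tab[2, 1] / sum(tab[2, ]))
}

boot <- bootstrap_sequences(n_seq, toy_fit, B = 200)
se <- apply(boot, 2, sd)                                   # bootstrap standard errors
ci <- apply(boot, 2, quantile, probs = c(0.025, 0.975))    # percentile intervals
print(round(rbind(se = se, ci), 4))

# With seqHMM, `fit_fun` might look something like this (untested sketch for
# single-channel data; `mod` is the model fitted to the full data):
# hmm_fit <- function(idx) {
#   boot_model <- build_hmm(seqs_all[idx, ],
#     transition_probs = mod$transition_probs,
#     emission_probs   = mod$emission_probs,
#     initial_probs    = mod$initial_probs)
#   fitted <- fit_model(boot_model)$model
#   c(fitted$transition_probs, fitted$emission_probs)
# }
```

Each row of `boot` holds the estimates from one resample, so column-wise standard deviations and quantiles give standard errors and percentile intervals for each parameter.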
