How to compute any actual quantity of the coarse-grained kinetics with employing the fuzzy memberships? #279

Open
ShenWenHuibit opened this issue Aug 19, 2023 · 7 comments

Comments

@ShenWenHuibit

ShenWenHuibit commented Aug 19, 2023

bayesian_hmm.gather_stats('transition_model/mfpt', A=[hmm.prior.metastable_sets[1]], B=hmm.prior.metastable_sets[1]).mean

After executing the above code it gives me error:
ValueError: Chosen set contains states that are not included in the active set.

I don't know how to choose active set, please help me, thanks.

@clonker
Member

clonker commented Aug 19, 2023

It most likely means that, due to the sampling, a few of the sampled HMMs live on a different set of states than the prior (or at least not on a subset of the prior's states). I suggest you subselect the samples accordingly. It might even be enough to call bhmm_largest = bhmm.submodel_largest()

@ShenWenHuibit
Author

ShenWenHuibit commented Aug 19, 2023

Thank you very much for your prompt reply!
After running the command bhmm_largest = bhmm.submodel_largest(connectivity_threshold=0.2, dtrajs=dtrajs), I still get the same error. I also noticed that the note on metastable_sets in the manual reads: "This is only recommended for visualization purposes. You cannot compute any actual quantity of the coarse-grained kinetics without employing the fuzzy memberships!"

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[301], line 8
6 for i in range(nstates):
7 for j in range(nstates):
----> 8 mfpt[i, j] = bhmm_largest.evaluate_samples('transition_model/mfpt', A=bhmm_largest.prior.metastable_sets[i], B=bhmm_largest.prior.metastable_sets[j]).mean
9 # mfpt[i, j] = bhmm_largest.gather_stats('transition_model/mfpt', A=bhmm_largest.prior.metastable_sets[i], B=bhmm_largest.prior.metastable_sets[j]).mean
11 inverse_mfpt = np.zeros_like(mfpt)

File ~/whshen/anaconda3/envs/workshop/lib/python3.9/site-packages/deeptime/base.py:238, in BayesianModel.evaluate_samples(self, quantity, delimiter, *args, **kwargs)
218 r""" Obtains a quantity (like an attribute or result of a method or a property) from each of the samples.
219 Returns as list.
220
(...)
235 A list of the quantity evaluated on each of the samples. If can be converted to float ndarray then ndarray.
236 """
237 from deeptime.util.stats import evaluate_samples as _eval
--> 238 return _eval(self.samples, quantity=quantity, delimiter=delimiter, *args, **kwargs)

File ~/whshen/anaconda3/envs/workshop/lib/python3.9/site-packages/deeptime/util/stats.py:189, in evaluate_samples(samples, quantity, delimiter, *args, **kwargs)
187 samples = [call_member(s, q) for s in samples]
188 if quantity is not None:
--> 189 samples = [call_member(s, quantity, *args, **kwargs) for s in samples]
190 try:
191 samples = np.asfarray(samples)
...
602 """
603 if np.max(A) > self.n_states:
--> 604 raise ValueError('Chosen set contains states that are not included in the active set.')

ValueError: Chosen set contains states that are not included in the active set.

I think the problem may be here: I can't directly use bhmm_largest.prior.metastable_sets[i] to get the MFPT.

To add: when I use BayesianMSM to calculate the MFPT on the same sample data, I do get a result, so I don't know what is going wrong with BayesianHMM.
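For readers wondering how the crisp metastable_sets relate to the fuzzy memberships mentioned in the manual note, here is a minimal NumPy sketch. It assumes the memberships are obtained by Bayes' rule from the hidden stationary distribution and the output probabilities; deeptime's actual implementation may differ in detail, and all numbers and variable names below are made up for illustration.

```python
import numpy as np

# Hypothetical numbers for illustration only: 2 hidden states, 3 cluster states.
pi_hidden = np.array([0.5, 0.5])   # stationary distribution of the hidden states
B = np.array([[0.7, 0.2, 0.1],     # B[h, o] = P(observe cluster o | hidden state h)
              [0.1, 0.2, 0.7]])

# Bayes' rule: memberships[o, h] = P(hidden state h | observed cluster o)
joint = pi_hidden[:, None] * B                 # shape (n_hidden, n_clusters)
memberships = (joint / joint.sum(axis=0)).T    # shape (n_clusters, n_hidden)

# Crisp assignment: each cluster goes to its most probable hidden state.
crisp = memberships.argmax(axis=1)
sets = [np.flatnonzero(crisp == h) for h in range(len(pi_hidden))]

print(memberships)  # cluster 1 belongs half to each hidden state
print(sets)         # entries are cluster indices, not hidden-state indices
```

The crisp sets contain cluster indices and throw away the fact that cluster 1 is shared between both hidden states, which is why they are only suitable for visualization.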

@clonker
Member

clonker commented Aug 19, 2023

Yeah, so I looked into it and I should have seen it sooner 😆

The transition model lives in the coarse-grained (hidden-state) space, whereas "metastable_sets" is expressed in fine-grained space, i.e. clustering space! You are asking for an MFPT over clustering-space state indices (of which there are presumably many more than coarse-grained states), and it rightfully complains about that.

Try the following: If you are computing MFPT between coarse-grained set 0 and 1, you can call

bayesian_hmm.gather_stats('transition_model/mfpt', A=[0], B=[1]).mean

This already takes care of the fuzzy state assignment, as you are not operating in fine-grained space.
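To fill a full MFPT matrix as in the loop from the traceback above, the same call can be repeated over hidden-state indices. This is only a sketch: hidden_state_mfpt_matrix and its n_hidden argument are hypothetical names, and the function assumes the posterior object supports gather_stats with the 'transition_model/mfpt' quantity exactly as used in this thread.

```python
import numpy as np

def hidden_state_mfpt_matrix(posterior, n_hidden):
    """Mean MFPT between every pair of hidden (coarse-grained) states.

    `posterior` is assumed to offer gather_stats(...) as used in this thread;
    `n_hidden` is the number of hidden states of the HMM.
    """
    mfpt = np.zeros((n_hidden, n_hidden))
    for i in range(n_hidden):
        for j in range(n_hidden):
            if i == j:
                continue  # the MFPT from a state to itself is left at 0
            # A and B are hidden-state indices, not cluster indices.
            stats = posterior.gather_stats('transition_model/mfpt', A=[i], B=[j])
            mfpt[i, j] = stats.mean
    return mfpt
```

With the objects from this thread it would be called as hidden_state_mfpt_matrix(bayesian_hmm, nstates), where nstates is the number of hidden states used for estimation.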

@clonker
Member

clonker commented Aug 19, 2023

By the way! We gave a workshop on deeptime/pyemma a while ago and also covered HMMs there. You can find the corresponding notebook here. Given a few free minutes I'll integrate it into the documentation here, I think. :)

https://github.com/markovmodel/pyemma-workshop/blob/master/notebooks/06-hmm.ipynb

@ShenWenHuibit
Author

That's right, I understand now. But I have run into another problem and would appreciate your advice: the MFPT I calculated is far too large, which may be a problem with my sample data. I also get the following warning: LinAlgWarning: Ill-conditioned matrix (rcond=1.40001e-17): result may not be accurate. With BayesianMSM I get reasonable-looking results on the same data, which puzzles me.

@clonker
Member

clonker commented Aug 19, 2023

It could be that the transitions are not sampled well enough, but I am guessing here. Have you tried other combinations of metastable states? It could also be a problem with your clustering, projection method, or featurization. MSM and HMM estimation can be tricky. I suggest you methodically check everything and also score it, e.g. with the VAMP-2 score. And yeah, that rcond number does not inspire confidence. You could also look at the coarse-grained transition matrix to see what it looks like, and visualize the population of data frames on a 2D projection of your data (don't forget to check all relevant projections that fall out of the projection method of your choice). Relevance is correlated with the singular value (or eigenvalue) of the projection component.
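As a rough illustration of how a poorly connected coarse-grained transition matrix can produce both a huge MFPT and an ill-conditioned solve, here is a self-contained NumPy sketch. It uses the textbook linear system for hitting times, not deeptime's internal code, and both matrices are made-up numbers.

```python
import numpy as np

def mfpt_to_target(P, target):
    """MFPT (in lag-time steps) from every state to `target`.

    Solves (I - Q) m = 1, where Q is the row-stochastic matrix P restricted
    to the non-target states. Also returns the condition number of (I - Q).
    """
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    A = np.eye(n - 1) - P[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.ones(n - 1))
    return m, np.linalg.cond(A)

# Hypothetical well-connected 3-state matrix.
P_good = np.array([[0.90, 0.05, 0.05],
                   [0.05, 0.90, 0.05],
                   [0.05, 0.05, 0.90]])

# Hypothetical matrix in which state 0 is almost disconnected from the rest.
P_bad = np.array([[1 - 1e-12, 1e-12, 0.0],
                  [1e-12, 0.9 - 1e-12, 0.1],
                  [0.0, 0.1, 0.9]])

m_good, cond_good = mfpt_to_target(P_good, target=2)
m_bad, cond_bad = mfpt_to_target(P_bad, target=2)
print(m_good, cond_good)  # moderate MFPTs, small condition number
print(m_bad, cond_bad)    # enormous MFPT out of state 0, huge condition number
```

If the coarse-grained transition matrix of the prior or of individual samples resembles the second case, the transitions out of that state are probably undersampled.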

@clonker
Member

clonker commented Aug 19, 2023

And if you get reasonable results with the Bayesian MSM, you may have to tweak the estimation parameters a little when it comes to the BHMM. The prior is estimated with lower precision than the 'normal' bhmm. Also, the MFPT of the prior itself would be interesting: is it also ill-conditioned?
