Hi @Ray6283,

It seems you are asking about the EM/VBEM algorithm in general. I would say your description is almost right, except for this part:

> Then using p_ij s, we put f_j to transcripts with highest probability, say t_jm, we do this for all fragments, so for each fragment we have assigned unique transcript. So we can have new eta, call it eta_1 (because we have abundance)

Specifically, the EM algorithm does not do "hard" assignment. That is, at no point during the algorithm is a fragment fully assigned to a specific transcript (unless it maps uniquely there). Rather, the EM algorithm performs "soft" assignment. Consider a fragment $f_j$ that maps to two transcripts, $t_{j1}$ and $t_{j2}$. The EM algorithm will "partially" allocate this fragment to each of the transcripts, in proportion to $P(f_j \in t_{j1}) \propto P(t_{j1} \mid \eta) P(f_j \mid t_{j1})$ and $P(f_j \in t_{j2}) \propto P(t_{j2} \mid \eta) P(f_j \mid t_{j2})$ respectively, with the weights normalized so that they sum to 1 over the transcripts the fragment maps to.

Then, in the "M" phase of the EM algorithm, one calculates the total mass arising from a transcript $t_i$ as the sum $\sum_{f_j \text{ such that } f_j \text{ maps to } t_i} P(f_j \in t_i)$, taken over all fragments that map to $t_i$. Computing these abundances for all $t_i$ (and normalizing them) gives us our next estimate of $\eta$, and then we can go back and re-compute the probabilities $P(f_j \in t_{j1})$, etc. This is repeated until convergence.
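
To make the soft assignment described above concrete, here is a minimal Python sketch of the two phases. It is only an illustration under simplifying assumptions, not salmon's actual implementation: the function name `em_soft_assign`, the `compat` input layout, the uniform starting value of $\eta$, and the stopping rule are all hypothetical, and details such as effective lengths, equivalence classes, and the VBEM prior are left out.

```python
def em_soft_assign(compat, n_transcripts, n_iter=1000, tol=1e-10):
    """Toy EM with soft assignment (illustrative only, not salmon's code).

    compat: one dict per fragment, mapping a transcript index i to the
            conditional probability P(f_j | t_i) for transcripts it maps to.
    Returns the estimated abundance vector eta (sums to 1).
    """
    # Assumption: start from a uniform abundance estimate.
    eta = [1.0 / n_transcripts] * n_transcripts

    for _ in range(n_iter):
        mass = [0.0] * n_transcripts

        for frag in compat:
            # E phase: weight each compatible transcript by
            # P(t_i | eta) * P(f_j | t_i), then normalize per fragment.
            weights = {t: eta[t] * p for t, p in frag.items()}
            total = sum(weights.values())
            for t, w in weights.items():
                # The fragment is split fractionally, never fully assigned
                # (unless it maps to a single transcript).
                mass[t] += w / total

        # M phase: the total mass per transcript, normalized, is the new eta.
        new_eta = [m / len(compat) for m in mass]

        delta = max(abs(a - b) for a, b in zip(eta, new_eta))
        eta = new_eta
        if delta < tol:
            break

    return eta


# Hypothetical toy data: two fragments unique to transcript 0, one unique
# to transcript 1, and one that maps equally well to both.
fragments = [{0: 1.0}, {0: 1.0}, {1: 1.0}, {0: 0.5, 1: 0.5}]
eta = em_soft_assign(fragments, n_transcripts=2)
```

In this toy case the ambiguous fragment is never handed entirely to transcript 0, even though transcript 0 is more abundant. Its mass is split according to the current $\eta$, and the estimate settles near $(2/3, 1/3)$.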
Originally posted by @rob-p in #889 (reply in thread)