Dealing with multiple model instances using EMWithVE #740

Open

Kirluu opened this issue Apr 11, 2018 · 4 comments

Kirluu commented Apr 11, 2018

Hi :)

We are doing our master's thesis at the IT University of Copenhagen, and we have a series of questions that we hope you can help us answer :)

We are working with a setup very similar to the spam-filter application from chapter 3 of the "Practical Probabilistic Programming" book, and our questions concern the efficiency of learning for such a model. In essence, we have several model instances that all reference some "shared" Beta elements for learning, which in effect produces one large net of connected elements. We would like to learn the Beta elements without evaluating this entire network at once, instead training on each individual model instance one at a time. (A sketch of the pattern we mean follows below.)
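For concreteness, here is a minimal sketch of the kind of setup we mean, loosely following the book's chapter 3 pattern. The structure and data are made up for illustration; we believe the API calls match Figaro's, but please correct us if not:

```scala
import com.cra.figaro.language._
import com.cra.figaro.library.atomic.continuous.Beta
import com.cra.figaro.algorithm.learning.EMWithVE

Universe.createNew()

// One learnable parameter, shared by every model instance below.
val spamProbability = Beta(2, 3)

// Each "model instance" is one training example. Applying Flip to a Beta
// parameter yields a parameterized element that EM can learn from, and
// every instance references the same Beta, tangling them all together.
def addInstance(observedSpam: Boolean): Unit = {
  val isSpam = Flip(spamProbability)
  isSpam.observe(observedSpam)
}

List(true, false, true, true).foreach(addInstance)

// Batch EM with VE: inference runs over the whole universe at once,
// i.e. over all instances simultaneously.
val em = EMWithVE(10, spamProbability)
em.start()
println(spamProbability.MAPValue)
em.kill()
```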

Here are some more specific questions:

  • Why does EMWithVE use a completely different setup (ExpectationMaximizationWithFactors) compared to the other inference algorithms when used with EM? What are the optimizations / differences that apply here, and is there some literature you could point us to that would help us understand them?

  • If we attempt to use GeneralizedEM with VE, it seems that all active elements in the universe (and thereby all our connected model instances) are passed as inputs to the inference algorithm. As the number of model instances increases, this quickly becomes infeasible for an algorithm such as VE.
    If we consider the spam filter case from Chapter 3, would it not be possible to run the inference algorithm on each sample separately and then combine the results during the expectation step, rather than calculating the sufficient statistics for all model instances' elements at once?
    We figured that this splitting approach might be feasible with VE (if each individual model instance is not very complex), and it would have the added benefit of being parallelizable (since each sample can be reasoned about separately) if we can use StructuredVE for the task. A sketch of the decomposition we have in mind follows after this list.
    Is there a reason why this approach is not used? Is it not feasible? If it is possible, could you provide some pointers for how we can achieve this goal?
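To make the second question concrete, here is a rough sketch of the decomposition we have in mind, for the simplest case of a Beta parameter over Bernoulli trials. This is our own code, not Figaro's API: the helpers expectedStatistic and mStep are hypothetical names, and we plug the current parameter estimate in as a point value so that VE only ever sees one small per-instance universe at a time:

```scala
import com.cra.figaro.language._
import com.cra.figaro.algorithm.factored.VariableElimination

// Hypothetical decomposed E-step: each instance lives in its own universe,
// so VE only ever sees one small model at a time (and the instances could
// be processed in parallel).
def expectedStatistic(alphaHat: Double, betaHat: Double,
                      evidence: Option[Boolean]): Double = {
  val u = Universe.createNew()
  // Plug in the current parameter estimate as a point value for this E-step.
  val x = Flip(alphaHat / (alphaHat + betaHat))("x", u)
  evidence.foreach(x.observe(_))
  val ve = VariableElimination(x)(u)
  ve.start()
  val pTrue = ve.probability(x, true)
  ve.kill()
  pTrue // this instance's expected count of "true"
}

// M-step: pool the per-instance statistics, then do the conjugate update
// of the Beta pseudo-counts.
def mStep(alpha0: Double, beta0: Double, stats: Seq[Double]): (Double, Double) =
  (alpha0 + stats.sum, beta0 + stats.size - stats.sum)
```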

To give some perspective: we are trying to optimize our training setup for our thesis work, such that seeing the effect of an alteration to the probabilistic model takes as little time as possible, both with regard to training and, of course, evaluation. The way our model instances get tangled into each other through the shared Beta elements seems to hurt the efficiency of most inference algorithms in Figaro that are usable with EM. Is there some other approach that we could use as an alternate setup?

As another note, we believe we can build our model in such a fashion that we have little to no hidden variables (namely zero in the learning case, and only a single one in the evaluation phase), which should help the efficiency of whatever inference algorithm we end up with.
According to the literature (https://ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf), if one has no hidden variables, then one is in the "complete data case", meaning that maximum likelihood estimation is feasible for the problem: simply learning the frequencies from one's dataset rather than requiring EM. Is there some way to access the MLE logic that is used as part of the EM algorithm from somewhere in the source code?
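For reference, in the complete-data case with a Beta prior over a Bernoulli parameter, the M-step collapses to a closed-form frequency count, which is all we would need. A minimal sketch of that computation (our own code, not an excerpt from Figaro's source):

```scala
// Complete-data MAP estimate for a Bernoulli parameter with a Beta(a, b)
// prior: theta = (a - 1 + #true) / (a + b - 2 + #observations).
// With a = b = 1 (uniform prior) this reduces to plain maximum likelihood,
// i.e. the observed frequency.
def mapEstimate(a: Double, b: Double, data: Seq[Boolean]): Double = {
  val successes = data.count(identity)
  (a - 1 + successes) / (a + b - 2 + data.size)
}

// Example: Beta(2, 3) prior and observations (true, false, true, true)
// give (2 - 1 + 3) / (2 + 3 - 2 + 4) = 4/7 ≈ 0.571.
val theta = mapEstimate(2, 3, Seq(true, false, true, true))
```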

Thanks a lot, and best regards,
Christian and Kenneth,
Students at the IT University of Copenhagen, Denmark

apfeffer (Contributor) commented Apr 17, 2018 via email

Kirluu (Author) commented Apr 18, 2018

Thank you for your reply, @apfeffer

Indeed, EMWithVE Online seems to do the trick for us, which is very welcome at this point.

However, we took the liberty of hardcoding the random seeds for the EM setup in our local clone of the Figaro repository, and it appears that EMWithVE and the Online version produce quite different results.
We suspect this is due to exactly the issue that prompted our concern, namely that the models being computed on are different: for EMWithVE we get one huge instance, and for Online we get many smaller instances.

What still confuses us is that the book's example on Online training, which we followed to the letter, uses the same pattern of ModelParameters. This should still deliver the exact same Beta element to each of the instances created in the Online scenario, which must mean there exists some logic in the Online setup that "handles" the case where learning elements are shared across model instances.

  • The question then becomes: why is this not handled similarly in the regular EMWithVE case? Given that the book presents the pattern of setting up many model instances that share Beta elements through ModelParameters, and then learning with EMWithVE, surely this case should be handled there as well?

Maybe it is computationally infeasible to determine which elements are part of which model instance; if so, an explanation would be great. Otherwise, we'd simply like to know more, to better prepare ourselves for potential questions regarding our usage of Figaro and the theory behind it.

Thank you in advance for any additional insights,
Best regards,
Christian and Kenneth

apfeffer (Contributor) commented Apr 18, 2018 via email

Kirluu (Author) commented Apr 18, 2018

Hi again @apfeffer,

We have decided not to directly pursue the differences that we observed in the learned parameter values, as they have no major impact on the results of our project.

However, we'd still like to understand how Online EM is able to "cope" with the Beta (learning) elements being shared among data instances, while regular EMWithVE cannot. For regular EM in Figaro, running time scales poorly as data is added, whereas Online EM indeed scales linearly, as we would expect from EM in general (see the toy sketch below).

Is there some reasoning behind why the handling performed in Online is not possible for the OneTimeProbQuery approach of the regular EM setup?
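Our mental model of why the online variant scales linearly is that each data instance is folded into the parameter estimate once and then discarded, so the per-step cost stays constant. A toy illustration with direct conjugate updating of a Beta-Bernoulli model (our own code, not Figaro's):

```scala
// Each observation updates the Beta pseudo-counts once and is then
// discarded, so the cost per instance is constant and total time is
// linear in the amount of data.
final case class BetaParams(alpha: Double, beta: Double) {
  def update(observation: Boolean): BetaParams =
    if (observation) copy(alpha = alpha + 1) else copy(beta = beta + 1)
  def mean: Double = alpha / (alpha + beta)
}

val posterior = Seq(true, false, true, true)
  .foldLeft(BetaParams(2, 3))(_ update _)
println(posterior.mean) // (2 + 3) / (2 + 3 + 4) = 5/9 ≈ 0.556
```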

Thank you once again,
Regards,
Christian and Kenneth
