Storing Doc Topic Distribution with LDA model #255

Open
mmantyla opened this issue May 1, 2018 · 3 comments

Comments

@mmantyla

mmantyla commented May 1, 2018

This is mostly an annoyance. I think it would be logical for lda_model to also store the resulting doc_topic_distr as one of its public fields.

doc_topic_distr = lda_model$fit_transform(x = dtm, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 25, progressbar = FALSE)

We can see that topic_word_distribution is already there, so having doc_topic_distribution would make sense as well. Or have I misunderstood something?

> lda_model
<WarpLDA>
  Inherits from: <LDA>
  Public:
    clone: function (deep = FALSE)
    components: active binding
    fit_transform: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10,
    get_top_words: function (n = 10, topic_number = 1L:private$n_topics, lambda = 1)
    initialize: function (n_topics = 10L, doc_topic_prior = 50/n_topics, topic_word_prior = 1/n_topics)
    plot: function (lambda.step = 0.1, reorder.topics = FALSE, doc_len = private$doc_len,
    topic_word_distribution: active binding
    transform: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 5,

@dselivanov
Owner

topic_word_distribution can be considered "fixed" once the model is fitted. doc_topic_distr, however, depends on the input data and will be different during inference.

@mmantyla
Author

mmantyla commented May 2, 2018

Sure. In my course, only one run is done, after which the model is saved for further analysis. Several models are fitted on different data sets, but each with a single run. Saving each of them now requires saving two separate objects. With the topicmodels package, saving one model object was enough.
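
One way to get a single saved object per model in the meantime is to put the fitted model and its document-topic matrix into one list and save that with base R's saveRDS(). The sketch below is only an illustration under stated assumptions: save_lda_bundle() and load_lda_bundle() are hypothetical helper names, not part of text2vec, and the toy objects stand in for lda_model and doc_topic_distr so that the snippet runs without text2vec installed. Whether a fitted WarpLDA object survives a saveRDS()/readRDS() round trip intact is assumed here, not verified.

```r
# Hypothetical helpers (not part of text2vec): keep a fitted model and its
# document-topic matrix together in a single .rds file.
save_lda_bundle = function(model, doc_topic_distr, path) {
  bundle = list(model = model, doc_topic_distr = doc_topic_distr)
  saveRDS(bundle, file = path)
  invisible(path)
}

load_lda_bundle = function(path) {
  bundle = readRDS(path)
  # Fail early if the file is not a bundle written by save_lda_bundle()
  stopifnot(all(c("model", "doc_topic_distr") %in% names(bundle)))
  bundle
}

# Stand-ins so the sketch runs on its own; in practice these would be
# lda_model and the matrix returned by lda_model$fit_transform(...).
toy_model = list(n_topics = 2L)
toy_doc_topic_distr = matrix(c(0.7, 0.3, 0.4, 0.6), nrow = 2, byrow = TRUE,
                             dimnames = list(c("doc1", "doc2"), NULL))

path = tempfile(fileext = ".rds")
save_lda_bundle(toy_model, toy_doc_topic_distr, path)
restored = load_lda_bundle(path)
```

After loading, restored$model and restored$doc_topic_distr are available from the one file, so each data set needs only one saved object.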

@dselivanov
Owner

dselivanov commented May 3, 2018

I will make it optional. Right now we store it internally anyway (but this is not desirable because the serialized model is huge).
