The chunksize variable specified in ldaseqmodel is not passed to ldamodel, so the latter defaults to chunksize = 2000 #3472

Open
mspezio opened this issue May 16, 2023 · 0 comments

mspezio commented May 16, 2023

Looking into ldaseqmodel.py, I see that the specified chunksize is not passed to ldamodel:

```python
# Excerpt from ldaseqmodel.py
if corpus is not None and time_slice is not None:
    self.max_doc_len = max(len(line) for line in corpus)

    if initialize == 'gensim':
        # chunksize is not passed here, so LdaModel falls back to its default of 2000
        lda_model = ldamodel.LdaModel(
            corpus, id2word=self.id2word, num_topics=self.num_topics,
            passes=passes, alpha=self.alphas, random_state=random_state,
            dtype=np.float64
        )
```

This may lead to suboptimal topics, because LdaModel's default chunksize of 2000 can be too small for applications with many documents.
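For a sense of scale: gensim's LdaModel processes the corpus in training chunks of `chunksize` documents, so the chunk size fixes how many chunks each pass is split into. The sketch below only does that arithmetic; `chunks_per_pass` is a hypothetical helper (not part of gensim) and the corpus size is a made-up example.

```python
import math


def chunks_per_pass(num_docs: int, chunksize: int) -> int:
    """Hypothetical helper: number of chunks one pass over num_docs documents is split into."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    # The last chunk may be partial, hence the ceiling.
    return math.ceil(num_docs / chunksize)


# Made-up corpus size, for illustration only.
num_docs = 1_000_000
print(chunks_per_pass(num_docs, 2000))    # LdaModel's default chunksize -> 500
print(chunks_per_pass(num_docs, 50_000))  # a larger, user-chosen chunksize -> 20
```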

Could this be fixed in the next release?
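A minimal, hypothetical sketch of what the change might look like. `StubLdaModel`, `init_lda_current` and `init_lda_proposed` are illustrative stand-ins, not gensim code; in ldaseqmodel.py itself the fix would presumably amount to adding `chunksize=chunksize` to the `ldamodel.LdaModel(...)` call quoted above.

```python
class StubLdaModel:
    """Hypothetical stand-in for the initial LDA fit; it only records its arguments."""

    def __init__(self, corpus, num_topics, passes=1, chunksize=2000):
        # 2000 mirrors the LdaModel default reported in this issue.
        self.corpus = corpus
        self.num_topics = num_topics
        self.passes = passes
        self.chunksize = chunksize


def init_lda_current(corpus, num_topics, passes, chunksize):
    # Current behaviour: chunksize is accepted but never forwarded.
    return StubLdaModel(corpus, num_topics, passes=passes)


def init_lda_proposed(corpus, num_topics, passes, chunksize):
    # Proposed behaviour: forward the caller's chunksize explicitly.
    return StubLdaModel(corpus, num_topics, passes=passes, chunksize=chunksize)


corpus = [[(0, 1)], [(1, 2)]]  # toy bag-of-words corpus
print(init_lda_current(corpus, 2, 1, 50_000).chunksize)   # 2000
print(init_lda_proposed(corpus, 2, 1, 50_000).chunksize)  # 50000
```

Passing the value as an explicit keyword argument means the chunksize given to ldaseqmodel would also apply to the initial LdaModel fit instead of silently falling back to 2000.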

Great package, thanks so much for sharing it and all of the work that has gone into it.
