A C++ implementation of the sparse biterm topic model, about 10x faster than the original implementation because it uses a sparse sampler.

kejunxiao/sparse_btm

sparse_btm

A C++ implementation of the sparse biterm topic model (BTM), about 10x faster than the original implementation because it uses a sparse Gibbs sampler.

features:

  • suitable for modeling user click sequences (recommendation systems) or short texts (NLP), because it assumes that N adjacent items belong to the same topic;
  • uses a sparse Gibbs sampler, about 10x faster than the original implementation;

arguments:


Biterm Topic Model (Sparse-Sampler)


Parameters:

  • -input
    path of the docs file; each line looks like "word1 word2 word3 ... \n"
  • -output
    output directory for the model files (topic_biterm_sum, topic_word)
  • -num_topics
    number of topics
  • -alpha
    symmetric doc-topic prior probability, default is 0.05
  • -beta
    symmetric topic-word prior probability, default is 0.01
  • -window_size
    window size for biterms, default is 2
  • -num_iters
    number of iterations, default is 20
  • -save_step
    save the model every save_step iterations, default is -1 (no save)
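To illustrate how `-window_size` could shape the biterms, here is a minimal sketch of biterm extraction from one tokenized line. It assumes that two words form a biterm when they fall inside the same sliding window of `window_size` consecutive words, so the default of 2 pairs only adjacent words. The function name `extract_biterms` is hypothetical, and the real tool may define the window differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Biterm = std::pair<std::string, std::string>;

// Hypothetical helper, not taken from the sparse_btm sources.
// Pairs words[i] with every later word words[j] such that j - i < window_size.
std::vector<Biterm> extract_biterms(const std::vector<std::string>& words,
                                    std::size_t window_size) {
    std::vector<Biterm> biterms;
    for (std::size_t i = 0; i < words.size(); ++i) {
        // One past the last position that shares a window with position i.
        const std::size_t end = std::min(words.size(), i + window_size);
        for (std::size_t j = i + 1; j < end; ++j) {
            biterms.emplace_back(words[i], words[j]);
        }
    }
    return biterms;
}
```

Under this assumption, the line "a b c d" yields 3 biterms with a window size of 2 and 5 biterms with a window size of 3, so larger windows grow the training set and the per-iteration sampling cost.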

usage:

./sparse_btm -input short_text.txt -output model_out/ -num_topics 100 -window_size 3 -num_iters 20 -save_step 10
