corajr/topic-annotator

Topic Annotator

This Scala library performs common preprocessing tasks on a corpus and then runs the corpus through one of several topic models (implemented using Gibbs sampling) to produce annotated output. It is not yet ready for general use, but it should simplify the format wrangling needed to test different topic models on the same corpus.

See the org.chrisjr.topic_annotator.App class or the tests for sample usage.

Preprocessing options:

  • regex tokenization
  • lowercasing
  • TF-IDF filtering
  • stoplists
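
To make the preprocessing options above concrete, here is a minimal, self-contained sketch of how the four steps could be chained in plain Scala. It is not the library's API: the object name `PreprocessSketch`, every function name, the default token pattern, and the TF-IDF threshold are assumptions made for illustration only. For real usage, consult the App class and the tests.

```scala
// Hypothetical sketch: none of these names come from topic-annotator itself.
object PreprocessSketch {
  type Doc = Seq[String]

  // Regex tokenization: keep every match of the pattern
  // (runs of letters by default; the default is an assumption).
  def tokenize(text: String, pattern: String = "\\p{L}+"): Doc =
    pattern.r.findAllIn(text).toList

  // Lowercasing: normalize each token.
  def lowercase(doc: Doc): Doc = doc.map(_.toLowerCase)

  // Stoplist: drop any token that appears in the given set.
  def removeStopwords(doc: Doc, stoplist: Set[String]): Doc =
    doc.filterNot(stoplist.contains)

  // TF-IDF filtering: drop tokens whose score within a document is below
  // minScore, using tf = count / docLength and idf = ln(N / docFreq).
  def tfidfFilter(docs: Seq[Doc], minScore: Double): Seq[Doc] = {
    val n = docs.size.toDouble
    // Number of documents in which each token occurs at least once.
    val docFreq: Map[String, Int] =
      docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    docs.map { doc =>
      val counts = doc.groupBy(identity).map { case (t, occ) => t -> occ.size }
      doc.filter { t =>
        val tf = counts(t).toDouble / doc.size
        val idf = math.log(n / docFreq(t))
        tf * idf >= minScore
      }
    }
  }
}
```

With this sketch, a token that occurs in every document gets an IDF of zero and is removed by any positive threshold, which is why TF-IDF filtering can complement a fixed stoplist.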

Topic models:

About

Performs common preprocessing steps for topic modeling and allows for annotation of the original documents with the results.
