Topic Annotator

This Scala library performs common preprocessing tasks on a corpus and then runs the result through one of several topic models (implemented using Gibbs sampling) to produce annotated output. It is not yet ready for general use, but it should simplify the format wrangling needed to test different topic models on a corpus.

Check out the org.chrisjr.topic_annotator.App class or the various tests for sample usage.

Preprocessing options:

  • regex tokenization
  • lowercasing
  • TF-IDF filtering
  • stoplists
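
To make the options above concrete, here is a minimal, standalone sketch of how the four steps could be chained. It does not use this library's API: the object `PreprocessSketch`, all of its method names, the default token pattern, and the `tf * ln(N / df)` weighting are assumptions chosen for illustration, and the library's own implementation may differ.

```scala
import scala.util.matching.Regex

// Hypothetical sketch; none of these names come from the library itself.
object PreprocessSketch {

  /** Regex tokenization plus lowercasing: keep every match of `pattern`. */
  def tokenize(text: String, pattern: Regex = "\\p{L}+".r): Seq[String] =
    pattern.findAllIn(text).map(_.toLowerCase).toVector

  /** Stoplist filtering: drop any token that appears in `stoplist`. */
  def removeStopwords(tokens: Seq[String], stoplist: Set[String]): Seq[String] =
    tokens.filterNot(stoplist.contains)

  /** Per-document TF-IDF scores, assuming tf * ln(N / df) as the weighting. */
  def tfidf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val n = docs.size.toDouble
    // Document frequency: the number of documents containing each term.
    val df: Map[String, Int] =
      docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    docs.map { doc =>
      doc.groupBy(identity).map { case (t, occ) => t -> occ.size * math.log(n / df(t)) }
    }
  }

  /** TF-IDF filtering: keep only tokens scoring strictly above `threshold`. */
  def filterByTfidf(docs: Seq[Seq[String]], threshold: Double): Seq[Seq[String]] =
    docs.zip(tfidf(docs)).map { case (doc, scores) => doc.filter(t => scores(t) > threshold) }

  /** Full pipeline: tokenize and lowercase, apply the stoplist, then filter by TF-IDF. */
  def preprocess(raw: Seq[String], stoplist: Set[String], threshold: Double): Seq[Seq[String]] =
    filterByTfidf(raw.map(text => removeStopwords(tokenize(text), stoplist)), threshold)
}
```

With this weighting, a term that occurs in every document scores zero, so a threshold of `0.0` removes corpus-wide terms while keeping anything that distinguishes one document from another.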

Topic models: