Version 2.1.0

jbaker-dstl released this 11 Dec 12:18
· 185 commits to master since this release

This version includes the following improvements:

  • New Annotator: MongoStemming uses a gazetteer and stemming to perform
    a pseudo-fuzzy match and find gazetteer terms in different tenses and
    plural forms
  • New Cleaner: MergeAdjacent will merge adjacent entities of the same
    type
  • New Content Extractor: CsvContentExtractor splits CSV fields into
    content and metadata
  • New Collection Reader: LineReader will read a single file into
    multiple documents by line
  • New REST API to get configuration parameters for components (e.g.
    annotators)
  • Significant changes to the way gazetteer annotators work, including
    changing from RadixTrees to MultiMaps and implementing the Aho-Corasick
    algorithm, resulting in performance improvements of the order of
    hundreds of times for large gazetteers
  • Lots of bug fixes and minor improvements
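
For readers unfamiliar with the Aho-Corasick algorithm mentioned above, the following is a minimal, illustrative sketch of how it finds every gazetteer term in a text in a single pass. It is not the project's implementation: the function names `build_automaton` and `find_terms` are hypothetical, matching is case-sensitive, and no word-boundary check is applied.

```python
from collections import deque


def build_automaton(terms):
    """Build a trie with failure links (hypothetical helper, not a project API)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix also in the trie
    out = [[]]    # out[node] -> terms that end at this node

    # Phase 1: insert every term into the trie.
    for term in terms:
        node = 0
        for ch in term:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(term)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit terms that end at the suffix state.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_terms(text, automaton):
    """Return (start, end, term) for every occurrence of a term in text."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we reach the root.
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for term in out[node]:
            matches.append((i - len(term) + 1, i + 1, term))
    return matches


automaton = build_automaton(["London", "Londonderry", "don"])
matches = find_terms("From Londonderry to London", automaton)
```

Because the text is scanned once regardless of how many terms the gazetteer holds, the cost of matching grows with the length of the text and the number of matches rather than with the size of the gazetteer, which is consistent with the gains reported for large gazetteers.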