ethan-homan/FlashTextSpark

 
 

FlashTextSpark

FlashTextSpark introduces SparkKeywordProcessor, a thin Scala wrapper around the FlashTextJava library by jasonsperske, which is itself a Java port of flashtext.py.

The motivation was to run FlashText on Spark to efficiently tag millions of unstructured documents with matches against a large corpus of keywords (also in the millions).
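
For readers unfamiliar with FlashText, the sketch below illustrates the general idea behind it: keywords are stored in a trie, and each document is scanned once, keeping the longest keyword that ends on a word boundary. This is an illustrative assumption-laden sketch only. The class KeywordTrieSketch and its methods addKeyword and extractKeywords are hypothetical names chosen for this example; they are not the API of FlashTextJava or of SparkKeywordProcessor, and details such as case-insensitive matching are simplifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of trie-based keyword extraction in the spirit of FlashText.
// None of these names come from FlashTextJava or FlashTextSpark.
final class KeywordTrieSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String keyword; // non-null when a keyword ends at this node
    }

    private final Node root = new Node();

    // Insert a keyword into the trie, one lower-cased character per level.
    void addKeyword(String keyword) {
        Node node = root;
        for (char c : keyword.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.keyword = keyword;
    }

    // Scan the text left to right and collect the longest keyword at each
    // position, accepting only matches that start and end on word boundaries.
    List<String> extractKeywords(String text) {
        List<String> found = new ArrayList<>();
        String lower = text.toLowerCase(Locale.ROOT);
        int i = 0;
        while (i < lower.length()) {
            // A match may only start at the beginning of a word.
            if (i > 0 && isWordChar(lower.charAt(i - 1))) {
                i++;
                continue;
            }
            Node node = root;
            String longest = null;
            int longestEnd = -1;
            int j = i;
            while (j < lower.length()) {
                node = node.children.get(lower.charAt(j));
                if (node == null) {
                    break;
                }
                j++;
                boolean atBoundary = j == lower.length() || !isWordChar(lower.charAt(j));
                if (node.keyword != null && atBoundary) {
                    longest = node.keyword;
                    longestEnd = j;
                }
            }
            if (longest != null) {
                found.add(longest);
                i = longestEnd; // resume after the matched keyword
            } else {
                i++;
            }
        }
        return found;
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

Because the scan walks the trie instead of testing each keyword separately, the cost per document depends mainly on the document's length rather than on the number of keywords, which is what makes the approach attractive for keyword sets in the millions.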

Building

Clone the repo and, if you are on UNIX, run:

./gradlew build

or, on Windows:

./gradlew.bat build

This bootstraps the project and fetches all of its dependencies; the only prerequisite is that Java 8 is installed.

