ElasticMeow.stemmer.snowball

This is a fork of the original Snowball repository in which the English Snowball stemmer is compiled to JavaScript (asm.js) with Emscripten for use in the browser.

You can see a demo at https://e2fyi.github.io/snowball/.

Bower

For purely front-end work, you can use Bower to fetch the JS file.

bower install --save e2fyi/Snowball

QuickStart

The main JS file is located at /emscripten/build/snowball_en.js. Load this script, then initialize the module by calling Snowball().

<script src="./emscripten/build/snowball_en.js"></script>
<script>
  // Initialize the module.
  Snowball();
</script>

The stemmer is available as the function ElasticMeow.stemmer.snowball. To reduce per-call overhead, it accepts either a space-separated string of words or an array of words, rather than a single word.

The stemmer returns output of the same type as its input (i.e. String[] returns String[], and String returns String).
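
The same-type contract can be pictured with a small wrapper. This is only an illustrative sketch, not the module's actual implementation: `stemSameType` and the toy `toyStemWord` function are hypothetical names introduced here, and the toy function merely strips a trailing "ing" rather than running the Snowball algorithm.

```javascript
// Hypothetical single-word stemmer, for illustration only.
// It strips a trailing "ing" and is NOT the Snowball algorithm.
function toyStemWord(word) {
  return word.replace(/ing$/, '');
}

// Apply a per-word stemmer while preserving the input type:
// arrays come back as arrays, strings come back as strings.
function stemSameType(input, stemWord) {
  if (Array.isArray(input)) {
    return input.map(stemWord);
  }
  // Split on whitespace, drop empty tokens, re-join with single spaces.
  return input.split(/\s+/).filter(Boolean).map(stemWord).join(' ');
}

console.log(stemSameType('going jumping', toyStemWord));      // 'go jump'
console.log(stemSameType(['going', 'jumping'], toyStemWord)); // ['go', 'jump']
```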

Space-separated strings

// Stem a whitespace-separated string of words.
// Returns a whitespace-separated string of stems.
var stemmed = ElasticMeow.stemmer.snowball('happy going swimming');
// outputs 'happi go swim'
console.log(stemmed);

Array of strings

// stem an array of words and return an array of stemmed words.
var stemmed = ElasticMeow.stemmer.snowball(['happy', 'going', 'swimming']);
// outputs ['happi', 'go', 'swim']
console.log(stemmed);
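
Because the stemmer takes a whole batch at once, one way to use it is to stem each distinct word a single time and keep a lookup table. The sketch below is a suggestion rather than part of this repository: `buildStemLookup` and `fakeStem` are hypothetical names, and it assumes the returned array lines up index-by-index with the input array. In the browser you would pass ElasticMeow.stemmer.snowball in place of the stand-in.

```javascript
// Build a word -> stem lookup using a single stemmer call.
// `stem` is assumed to take an array of words and return an array
// of stems in the same order.
function buildStemLookup(words, stem) {
  var unique = Array.from(new Set(words)); // drop duplicate words
  var stems = stem(unique);                // one call for the whole batch
  var lookup = new Map();
  unique.forEach(function (word, i) {
    lookup.set(word, stems[i]);
  });
  return lookup;
}

// Stand-in stemmer so the sketch runs outside the browser.
// It only lowercases words; it is NOT the real stemmer.
function fakeStem(words) {
  return words.map(function (w) { return w.toLowerCase(); });
}

var lookup = buildStemLookup(['Going', 'Swim', 'Going'], fakeStem);
console.log(lookup.size);         // 2
console.log(lookup.get('Going')); // 'going'
```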

Build

Install Emscripten and all of its dependencies.

Then run make to build libstemmer and the other generated source files:

make

Then compile the module:

emcc emscripten/snowball_en.c runtime/api.c runtime/utilities.c \
  libstemmer/libstemmer.c \
  src_c/stem_ISO_8859_1_english.c src_c/stem_UTF_8_english.c \
  --post-js emscripten/snowball.js --memory-init-file 0 \
  -o emscripten/build/snowball_en.js \
  -s MODULARIZE=1 -s EXPORT_NAME="'Snowball'" -O3

or

npm run build

Links

http://snowballstem.org
http://emscripten.org
