Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.

levyfan/sentencepiece-jni

SentencePiece Java Wrapper

A Java wrapper for SentencePiece, implemented with JNI. This module wraps the sentencepiece::SentencePieceProcessor class with the following modifications:

  • The Encode method is exposed as EncodeAsIds and EncodeAsPieces, and the Decode method as DecodeIds and DecodePieces.
  • The SentencePieceText proto is not supported.

SentencePiece Version

v0.1.96

Build and Install SentencePiece

To build and install the Java wrapper from source, run the following command:

% mvn clean install

Using sentencepiece-jni as a dependency

Because the resulting JAR is platform-dependent, this dependency is resolved with the help of the os-maven-plugin. Follow the os-maven-plugin instructions to use the platform-dependent JAR in your project.
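As a rough sketch of what that setup might look like in a consuming project's pom.xml: the os-maven-plugin is registered as a build extension, which makes the `${os.detected.classifier}` property available, and the dependency then selects the JAR for the current platform through its classifier. The `groupId`, `artifactId`, and the `sentencepiece-jni.version` property below are assumptions for illustration, as is the plugin version; check this project's own pom.xml and the os-maven-plugin documentation for the actual values.

```xml
<project>
  <!-- ... -->
  <build>
    <extensions>
      <!-- Detects the OS and architecture and sets ${os.detected.classifier}. -->
      <extension>
        <groupId>kr.motd.maven</groupId>
        <artifactId>os-maven-plugin</artifactId>
        <version>1.7.0</version>
      </extension>
    </extensions>
  </build>

  <dependencies>
    <!-- Assumed coordinates: verify them against this project's pom.xml. -->
    <dependency>
      <groupId>com.github.levyfan</groupId>
      <artifactId>sentencepiece</artifactId>
      <version>${sentencepiece-jni.version}</version>
      <classifier>${os.detected.classifier}</classifier>
    </dependency>
  </dependencies>
</project>
```

With this in place, Maven should pick the artifact whose classifier matches the build machine (for example `linux-x86_64` or `osx-x86_64`), provided a JAR for that platform has been built and installed.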

Note that building from source requires a C++ compiler and CMake to be installed.

Usage

See SentencePieceProcessorTest for usage examples.