danny50610/bpe-tokeniser

bpe-tokeniser

A PHP port of openai/tiktoken, covering most of its functionality.

Supported encodings (listed by the models that use them)

  • gpt-3.5-turbo
  • gpt-4
  • gpt-4o
  • more ...

For the full list of available encodings, see src/EncodingFactory.php.

Installation

composer require danny50610/bpe-tokeniser

Examples

GPT-4 / GPT-3.5-Turbo (cl100k_base)

<?php

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Danny50610\BpeTokeniser\EncodingFactory;

$enc = EncodingFactory::createByEncodingName('cl100k_base');

var_dump($enc->encode("hello world"));
/**
 * output:
 * array(2) {
 *   [0]=>
 *   int(15339)
 *   [1]=>
 *   int(1917)
 * }
 */

var_dump($enc->decode($enc->encode("hello world")));
// output: string(11) "hello world"
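Because encode() returns a plain PHP array of token IDs, counting tokens (for example, to check a prompt against a model's context limit) reduces to count(). The helper below is a minimal sketch: countTokens() is a hypothetical name, not part of the library, and it relies only on the encode() method shown above.

```php
<?php

/**
 * Hypothetical helper (not part of bpe-tokeniser): returns the number of
 * tokens $text encodes to. $enc is any object with an encode(string): array
 * method, e.g. the result of EncodingFactory::createByEncodingName('cl100k_base').
 */
function countTokens(object $enc, string $text): int
{
    // encode() yields one array element per token ID, so its length is the count
    return count($enc->encode($text));
}
```

With the cl100k_base encoder from the example above, countTokens($enc, "hello world") should return 2, matching the two IDs in the var_dump output.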

By model name (gpt-3.5-turbo)

use Danny50610\BpeTokeniser\EncodingFactory;

$enc = EncodingFactory::createByModelName('gpt-3.5-turbo');

var_dump($enc->decode($enc->encode("hello world")));
// output: string(11) "hello world"
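Since decode() accepts an array of token IDs, text can be cut to a token budget by slicing the encoded array and decoding the prefix. This is a minimal sketch: truncateToTokens() is a hypothetical helper, not part of the library, and it assumes decode() accepts any prefix of an encode() result. How a cut that lands inside a multi-byte UTF-8 character is rendered depends on decode() itself.

```php
<?php

/**
 * Hypothetical helper (not part of bpe-tokeniser): shortens $text so that it
 * is covered by at most $maxTokens tokens. $enc is any object exposing
 * encode(string): array and decode(array): string, such as the encoder
 * returned by EncodingFactory::createByModelName('gpt-3.5-turbo').
 */
function truncateToTokens(object $enc, string $text, int $maxTokens): string
{
    if ($maxTokens < 0) {
        throw new InvalidArgumentException('$maxTokens must be non-negative');
    }

    $tokens = $enc->encode($text);

    // Already within the budget: return the input unchanged
    if (count($tokens) <= $maxTokens) {
        return $text;
    }

    // Keep the first $maxTokens token IDs and turn them back into text.
    // A cut may fall inside a multi-byte UTF-8 character.
    return $enc->decode(array_slice($tokens, 0, $maxTokens));
}
```

For example, truncateToTokens($enc, $longText, 100) with the gpt-3.5-turbo encoder above keeps the text covered by its first 100 tokens.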