
token count is inconsistent with OpenAI tokenizer #17

Open
GorvGoyl opened this issue Nov 21, 2023 · 1 comment

Comments

@GorvGoyl

As shown below:

(screenshots: the token counts reported by the two tokenizers for the same text)

text:

<|im_start|>dd<|im_sep|>OpenAI's large language models (sometimes referred to as GPT's) process text using tokens, which are common sequences of characters found in a set of text. The models learn to understand the statistical relationships between these tokens, and excel at producing the next token in a sequence of tokens.<|im_end|><|im_start|>assistant<|im_sep|><|im_end|><|im_start|>assistant<|im_sep|>
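
For what it's worth, this looks like it may just be a difference in how the `<|im_start|>` / `<|im_sep|>` / `<|im_end|>` markers are handled: one tool encodes them as single special tokens, while the other splits them into ordinary sub-word tokens. Here's a minimal tiktoken sketch (Python, assuming the `cl100k_base` encoding; `<|endoftext|>` stands in for the ChatML markers, which may not be registered as special tokens in the public encoding) that reproduces both behaviours:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Hello <|endoftext|> world"

# Treat the marker as ordinary text: it is split into several
# sub-word tokens, giving a higher count.
as_plain_text = enc.encode(text, disallowed_special=())
print(len(as_plain_text))

# Treat the marker as a registered special token: it becomes a
# single token id, giving a lower count.
as_special = enc.encode(text, allowed_special="all")
print(len(as_special))
```

If the two tokenizers disagree on which of these modes to use for the ChatML markers, the totals will differ by exactly this kind of gap.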
@syntaxtrash

Any update on this? The two tokenizers agree once the special characters are removed.

https://platform.openai.com/tokenizer
(screenshot of the count on the OpenAI tokenizer)

https://tiktokenizer.vercel.app/
(screenshot of the count on tiktokenizer)
