Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

这个库在chat模型的token计算上有诸多错误 #3

Open
CodeInDreams opened this issue Dec 11, 2023 · 4 comments
Open

这个库在chat模型的token计算上有诸多错误 #3

CodeInDreams opened this issue Dec 11, 2023 · 4 comments

Comments

@CodeInDreams
Copy link

与openai返回的token对比发现,几乎各chat模型都有计算方式错误或结果偏差,于是我自己从零建模和编写了token计算工具

你的部分token计算代码有严重错误,这里列举部分:

  • vision base64读取
  • vision图片缩放
  • function call、functions
  • tool call、tools
  • 0301/0314与0613版本计算差异
  • 多个参数(model、functions)组合使用对token计算的影响
  • 应使用encodeOrdinary来跳过special tokens

由于精力有限我无法在开源代码上提交修改,本issue只是告知绝大部分token计算都有误,请你自己有精力时研究下

@forestwanglin
Copy link
Owner

forestwanglin commented Dec 11, 2023

感谢,的确不能保证和官方返回一模一样。

可以看到官方的cookbookHow_to_count_tokens_with_tiktoken中也是提到

image

也是一个预估值,这个计算方法主要是用来预估发送的一些limit,所以有一点误差不会影响逻辑。

@forestwanglin
Copy link
Owner

token计算不支持 gpt-3.5-turbo-0301

@forestwanglin
Copy link
Owner

我按照官方Demo的计算规则,更新了计算方法。
已经测试的:

  • gpt-3.5-turbo(gpt-3.5-turbo-0613/gpt-3.5-turbo-16k-0613)
  • gpt-3.5-turbo-1106
  • gpt-4-1106-preview
  • gpt-4(gpt-4-0613/gpt-4-32k-0613)

更新了funciton的计算规则,具体可参考FunctionFormat

@forestwanglin
Copy link
Owner

forestwanglin commented Jan 30, 2024

More issue about token calculation can be found issue #4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants