Token-to-Chinese-character conversion in text2vec #145

Open
cutelitchi opened this issue Dec 26, 2023 · 1 comment
Labels
question Further information is requested

Comments

@cutelitchi

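As I understand it, max_seq_length is the maximum number of tokens the model can process. What is the approximate ratio between tokens and Chinese characters in this model? I read in a blog post that for text2vec, 1 token is roughly 1.5 Chinese characters. Is that correct?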

@cutelitchi cutelitchi added the question Further information is requested label Dec 26, 2023
@shibing624
Owner

It uses BERT's token encoding: 1 token is 1 Chinese character.
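
To illustrate the answer above, here is a minimal sketch of the step in BERT-style tokenization that isolates each CJK character, so that every Chinese character becomes its own token. It is not the text2vec or BERT source code. The function names `is_cjk_char` and `split_cjk` are made up for this example, and punctuation splitting and WordPiece sub-word splitting are deliberately left out, so counts for Latin words, digits, and punctuation may differ from what the real tokenizer produces.

```python
from typing import List


def is_cjk_char(ch: str) -> bool:
    """Return True if ch lies in one of the CJK ideograph Unicode blocks."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF     # Extension A
        or 0x20000 <= cp <= 0x2A6DF   # Extension B
        or 0x2A700 <= cp <= 0x2B73F   # Extension C
        or 0x2B740 <= cp <= 0x2B81F   # Extension D
        or 0x2B820 <= cp <= 0x2CEAF   # Extension E
        or 0xF900 <= cp <= 0xFAFF     # CJK Compatibility Ideographs
        or 0x2F800 <= cp <= 0x2FA1F   # Compatibility Ideographs Supplement
    )


def split_cjk(text: str) -> List[str]:
    """Pad every CJK character with spaces, then split on whitespace.

    Hypothetical helper: only the CJK isolation step is modelled here.
    """
    padded = []
    for ch in text:
        padded.append(f" {ch} " if is_cjk_char(ch) else ch)
    return "".join(padded).split()


print(split_cjk("我爱北京"))       # ['我', '爱', '北', '京']
print(len(split_cjk("我爱北京")))  # 4
print(split_cjk("我爱BERT"))      # ['我', '爱', 'BERT'] (before any sub-word splitting)
```

When budgeting against max_seq_length, keep in mind that BERT-style encoders normally add the special tokens [CLS] and [SEP], which take two positions. For exact counts, tokenize the text with the tokenizer that ships with the model you are using.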
