ik_smart tokenization: problem with short and long words #979

Open
keepmoving1573 opened this issue Oct 14, 2022 · 2 comments

Comments

@keepmoving1573

I added both "极速版2.0" and "极速版" to the custom dictionary.
But when I search for "极速版2.0", it gets split into 极速版 and 2.0, as shown below:
{ "tokens": [ { "token": "极速版", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 }, { "token": "2.0", "start_offset": 3, "end_offset": 6, "type": "ARABIC", "position": 1 } ] }

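When both a long word and a short word are in the dictionary, is there a way to keep the long word from being split when I search for it?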
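
For anyone trying to reproduce this, here is a minimal sketch of the check. It assumes the tokens above came from the Elasticsearch `_analyze` API with the `ik_smart` analyzer; the helper names `build_analyze_request` and `is_kept_whole` are hypothetical and only for illustration. It builds the request body and tests whether a word came back as a single token, using the response quoted above as sample data. Nothing is sent over the network.

```python
import json


def build_analyze_request(text, analyzer="ik_smart"):
    """Return the JSON body to POST to the Elasticsearch /_analyze endpoint."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)


def is_kept_whole(response, word):
    """Return True if the analyzer emitted `word` as exactly one token.

    The comparison ignores case, since the output in this thread
    shows lowercased tokens.
    """
    tokens = [t["token"].lower() for t in response["tokens"]]
    return tokens == [word.lower()]


# Response copied from the comment above.
ISSUE_RESPONSE = {
    "tokens": [
        {"token": "极速版", "start_offset": 0, "end_offset": 3,
         "type": "CN_WORD", "position": 0},
        {"token": "2.0", "start_offset": 3, "end_offset": 6,
         "type": "ARABIC", "position": 1},
    ]
}

if __name__ == "__main__":
    print(build_analyze_request("极速版2.0"))
    print(is_kept_whole(ISSUE_RESPONSE, "极速版2.0"))
```

Running the same check after each dictionary change makes it easy to see whether the long entry is still being split.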

@keepmoving1573
Author

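I found that extension words mixing Chinese and English have the same problem.
I maintain two extension words, "SK模板" and "SK".
Searching for "SK模板" splits it, as shown below: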
{
"tokens": [
{
"token": "sk",
"start_offset": 0,
"end_offset": 2,
"type": "ENGLISH",
"position": 0
},
{
"token": "模板",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 1
}
]
}

With two dictionary entries that both contain English, how can I make sure the long word is not split when I search for it?

@AnitaSherry

I ran into the same problem. Might as well use jieba instead.
