How to resolve the tokenization difference between ik_smart and ik_max_word #992
Comments
/**
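The problem lies in the numeral-quantifier merging step of the compound-lexeme handling. Why doesn't ik_max_word merge numerals and quantifiers as well? Is there a particular reason for that?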
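To make the merging step concrete, here is a minimal Python sketch of what "numeral-quantifier merging" does to a token stream. It is an illustration only, not IK's actual Java implementation: the function name `merge_quantifiers` is hypothetical, the token dictionaries simply mirror the `_analyze` output quoted in this issue, and positions of later tokens are not renumbered.

```python
def merge_quantifiers(tokens):
    """Merge an ARABIC token with a COUNT token that directly follows it.

    Simplified illustration: the merged token takes the type TYPE_CQUAN,
    as seen in the ik_smart output in this issue.
    """
    merged = []
    for tok in tokens:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["type"] == "ARABIC"
            and tok["type"] == "COUNT"
            and prev["end_offset"] == tok["start_offset"]  # tokens must be adjacent
        ):
            # Replace the numeral with the combined numeral + quantifier token.
            merged[-1] = {
                "token": prev["token"] + tok["token"],
                "start_offset": prev["start_offset"],
                "end_offset": tok["end_offset"],
                "type": "TYPE_CQUAN",
                "position": prev["position"],
            }
        else:
            merged.append(dict(tok))
    return merged


# Tokens as reported by ik_max_word for "52周" in this issue.
max_word_tokens = [
    {"token": "52", "start_offset": 0, "end_offset": 2, "type": "ARABIC", "position": 0},
    {"token": "周", "start_offset": 2, "end_offset": 3, "type": "COUNT", "position": 1},
]

smart_like_tokens = merge_quantifiers(max_word_tokens)
print(smart_like_tokens)
```

Applied to the ik_max_word tokens, the sketch yields a single `52周` token spanning offsets 0 to 3, which matches the ik_smart result reported in this issue.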
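@medcl I'm running into the same problem. In theory, the ik_smart tokens should be a subset of the ik_max_word tokens.
I'm hitting this too and was just about to open an issue. Did you manage to solve it? My problem is exactly the same.
ik_smart and ik_max_word use different algorithms, so the ik_smart output is not necessarily a subset of the ik_max_word output.
So that means if I search with the ik_smart analyzer against data indexed with ik_max_word, the search is not guaranteed to find it.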
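The consequence described above can be checked mechanically: a term query can only match if the search-time term exists among the indexed terms. The sketch below uses the token lists reported in this issue; the helper name `missing_terms` is hypothetical and no Elasticsearch call is made.

```python
def missing_terms(query_tokens, indexed_tokens):
    """Return the search-time terms that have no identical indexed term."""
    indexed = set(indexed_tokens)
    return [term for term in query_tokens if term not in indexed]


# Terms produced for the text "52周", as reported in this issue.
indexed_terms = ["52", "周"]  # index time: ik_max_word
query_terms = ["52周"]        # search time: ik_smart

print(missing_terms(query_terms, indexed_terms))  # ['52周']
```

A non-empty result means the ik_smart query term has no counterpart in the ik_max_word index, so a term-level match on it cannot succeed.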
ik_max_word output:
```
GET _analyze
{
  "text": ["52周"],
  "analyzer": "ik_max_word"
}

{
  "tokens" : [
    {
      "token" : "52",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "ARABIC",
      "position" : 0
    },
    {
      "token" : "周",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "COUNT",
      "position" : 1
    }
  ]
}
```
ik_smart output:
```
GET _analyze
{
  "text": ["52周"],
  "analyzer": "ik_smart"
}

{
  "tokens" : [
    {
      "token" : "52周",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "TYPE_CQUAN",
      "position" : 0
    }
  ]
}
```
The problem: ik_max_word does not produce tokens of type TYPE_CQUAN. Is there a solution?
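One possible workaround, offered here as an assumption rather than a fix confirmed by the maintainers, is to avoid mixing the two analyzers on the same field, so that index-time and search-time terms come from the same algorithm. The sketch below builds an index mapping body that sets both the standard `analyzer` and `search_analyzer` mapping parameters to ik_max_word. The field name `content` and the helper `build_mapping` are hypothetical, and the typeless mapping layout assumes Elasticsearch 7 or later.

```python
import json

FIELD_NAME = "content"  # hypothetical field name, not taken from the issue


def build_mapping(field, analyzer="ik_max_word"):
    """Build a mapping body that uses one analyzer for indexing and searching."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": analyzer,         # applied at index time
                    "search_analyzer": analyzer,  # applied at search time
                }
            }
        }
    }


mapping_body = build_mapping(FIELD_NAME)
print(json.dumps(mapping_body, indent=2, ensure_ascii=False))
```

The trade-off is that a query for `52周` is then split into `52` and `周` at search time, which may match more documents than intended; a phrase query can narrow that if needed.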