[BUG] unexpected exact query result with Chinese #4325

Open
authurguo opened this issue Jan 8, 2024 · 4 comments

@authurguo

Describe the bug
An exact-phrase query for a Chinese word returns no results.

To Reproduce

FT.CREATE idx:test ON hash PREFIX 1 "test:" LANGUAGE "chinese" SCHEMA phrase TEXT tags TAG

HSET test:001 phrase "一个眼大" tags "开心" 
HSET test:002 phrase "一个眼大一个眼小" tags "悲伤"

FT.SEARCH idx:test "@phrase:一个眼大" LANGUAGE "chinese" # test:001 test:002
FT.SEARCH idx:test "@phrase:\"一个眼大\"" LANGUAGE "chinese" # NO RESULTS FOUND.

Expected behavior

FT.SEARCH idx:test "@phrase:\"一个眼大\"" LANGUAGE "chinese" 

The command should return test:001 and test:002.

Environment (please complete the following information):

  • docker / amd64
  • redis/redis-stack:7.2.0-v6

Additional context
Default configuration, without any customization.

@adrianoamaral
Collaborator

@authurguo did you try using TAG fields for exact-match queries?

I ran a quick test, reusing your index schema (putting the values in the tags field), and it works for me:

HSET test:003 phrase "一个眼大" tags "一个眼大"
HSET test:004 phrase "一个眼大" tags "一个眼大一个眼小"

FT.SEARCH idx:test "@tags:{一个眼大}" LANGUAGE "Chinese"

1) "1"
2) "test:003"
3) 1) "phrase"
   2) "\xe4\xb8\x80\xe4\xb8\xaa\xe7\x9c\xbc\xe5\xa4\xa7"
   3) "tags"
   4) "\xe4\xb8\x80\xe4\xb8\xaa\xe7\x9c\xbc\xe5\xa4\xa7"

FT.SEARCH idx:test "@tags:{一个眼大*}" LANGUAGE "Chinese"

1) "2"
2) "test:003"
3) 1) "phrase"
   2) "\xe4\xb8\x80\xe4\xb8\xaa\xe7\x9c\xbc\xe5\xa4\xa7"
   3) "tags"
   4) "\xe4\xb8\x80\xe4\xb8\xaa\xe7\x9c\xbc\xe5\xa4\xa7"
4) "test:004"
5) 1) "phrase"
   2) "\xe4\xb8\x80\xe4\xb8\xaa\xe7\x9c\xbc\xe5\xa4\xa7"
   3) "tags"
   4) "\xe4\xb8\x80\xe4\xb8\xaa\xe7\x9c\xbc\xe5\xa4\xa7\xe4\xb8\x80\xe4\xb8\xaa\xe7\x9c\xbc\xe5\xb0\x8f"
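
The two TAG queries above differ only in the trailing `*`. As a rough illustration of why the first returns one document and the second returns two, here is a toy Python model of exact versus prefix tag matching. It is not RediSearch code: the `TAGS` dictionary and the `tag_query` helper are made up for this sketch, the tag values are copied from the HSET commands in this thread, and details such as separators and case folding are ignored.

```python
# Toy model of TAG matching; NOT RediSearch code.
# Tag values are copied from the HSET commands in this thread.
TAGS = {
    "test:001": "开心",
    "test:002": "悲伤",
    "test:003": "一个眼大",
    "test:004": "一个眼大一个眼小",
}


def tag_query(pattern: str) -> list[str]:
    """Return keys whose tag equals `pattern`, or starts with it if it ends in '*'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return sorted(key for key, tag in TAGS.items() if tag.startswith(prefix))
    return sorted(key for key, tag in TAGS.items() if tag == pattern)


print(tag_query("一个眼大"))   # ['test:003']
print(tag_query("一个眼大*"))  # ['test:003', 'test:004']
```

In this model the whole tag value is compared as one unit, which is why no tokenization question comes up for the TAG field.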

@authurguo
Author

Thanks!
I have tried exact matching with TAG, and it works.
I would also like to confirm whether exact matching on a Chinese TEXT field is supposed to work.

@adrianoamaral
Collaborator

I still need to review it, but for TEXT the index is tokenising each character as a separate term/word. Is that correct, considering Chinese vocabulary? It means that each of the characters 一, 个, 眼, 大 is indexed as its own token. Matching this query on TEXT therefore requires an AND/intersection (一 AND 个 AND 眼 AND 大). TAG is the appropriate choice for use cases that need an exact match, as in the example you shared.
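
To make the distinction between an intersection query and an exact-phrase query concrete, here is a small Python sketch. It only models the per-character description given in this comment; it is not RediSearch's tokenizer or query engine, and `DOCS`, `tokenize`, `match_all_terms` and `match_phrase` are hypothetical names introduced for illustration.

```python
# Toy model only: assumes one token per character, as described in this comment.
# Document contents are copied from the HSET commands in the bug report.
DOCS = {
    "test:001": "一个眼大",
    "test:002": "一个眼大一个眼小",
}


def tokenize(text: str) -> list[str]:
    """Split text into single-character tokens (an assumption, not RediSearch's tokenizer)."""
    return list(text)


def match_all_terms(query: str) -> list[str]:
    """AND/intersection: every query token occurs somewhere in the document."""
    terms = set(tokenize(query))
    return sorted(key for key, text in DOCS.items() if terms <= set(tokenize(text)))


def match_phrase(query: str) -> list[str]:
    """Exact phrase: the query tokens occur consecutively and in order."""
    wanted = tokenize(query)
    n = len(wanted)
    hits = []
    for key, text in DOCS.items():
        tokens = tokenize(text)
        if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
            hits.append(key)
    return sorted(hits)


print(match_all_terms("大眼"))    # ['test:001', 'test:002']
print(match_phrase("大眼"))       # []
print(match_phrase("一个眼大"))   # ['test:001', 'test:002']
```

Under this simplified model the phrase query for 一个眼大 matches both documents, which is the result the reporter expected, so the sketch shows the two matching modes but does not reproduce the empty result from the report.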

This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Mar 11, 2024