Special character word segmentation #4367

Open
pastoralz opened this issue Jan 17, 2024 · 3 comments

@pastoralz

Hello.
I am a developer from China, and I ran into a problem while using RediSearch. Suppose I have a code value of 1002183$1$0$0 and I create an index on this code. When I search for 1002183$1$0$0, I do not get the expected result. I looked through the existing issues and found a similar one, where hello/-world is split into multiple words, so I assume the special characters in my code cause it to be split into multiple words as well. I can get a match by replacing the special characters with spaces, but that causes another problem. Suppose the replacement gives 1002183 1 0 0, and another code becomes 1002183 0 1 after replacement: the query then returns both records. It seems impossible to query exactly on values that contain special characters. Is there a solution? I look forward to your reply. Thank you.

@oshadmi
Collaborator

oshadmi commented Jan 17, 2024

@pastoralz Thank you for reporting this.

If you want to avoid splitting the value into multiple words, you can use a TAG attribute and escape the special characters when querying, for example:

HSET doc:1 foo 1002183$1$0$0
FT.CREATE idx_tag schema foo TAG 
FT.SEARCH idx_tag '@foo:{1002183\$1\$0\$0}'
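
If the query string is built in application code, the escaping can be done by a small helper. The following is only a sketch: `escape_tag_value`, `tag_query`, and the `SPECIAL_CHARS` set are hypothetical names chosen for illustration, and the character set is an assumption based on the punctuation that RediSearch treats as separators, so check it against the documentation for your version.

```python
import re

# Assumption: punctuation and whitespace that must be backslash-escaped
# inside a TAG query. Adjust this set for your RediSearch version.
SPECIAL_CHARS = ",.<>{}[]\"':;!@#$%^&*()-+=~|/\\ "
_PATTERN = re.compile("[" + re.escape(SPECIAL_CHARS) + "]")


def escape_tag_value(value: str) -> str:
    """Prefix every special character in value with a backslash."""
    return _PATTERN.sub(lambda m: "\\" + m.group(0), value)


def tag_query(field: str, value: str) -> str:
    """Build an exact-match TAG query such as @foo:{...}."""
    return "@%s:{%s}" % (field, escape_tag_value(value))


print(tag_query("foo", "1002183$1$0$0"))  # prints @foo:{1002183\$1\$0\$0}
```

The printed string is the same query as in the FT.SEARCH example above; it can be passed as the query argument by whichever client library you use.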

If you want to use TEXT, you can escape the special characters, both in the stored value and in the query, so that the value is not split into multiple words, for example:

hset doc:2 foo 1002183\$1\$0\$0
ft.create idx_txt schema foo TEXT 
ft.search idx_txt '@foo:(1002183\$1\$0\$0)'

Or you can let the value be split and then search by multiple words, although I am not sure this is what you are looking for, since it could match more documents, for example:

hset doc:3 foo 1002183$1$0$0
ft.create idx_txt schema foo TEXT 
ft.search idx_txt '@foo:(1002183 1 0)'
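
To see why the split approach can match more documents, here is a toy model in Python. It is not RediSearch's actual tokenizer, only an assumption that a TEXT value is split on non-alphanumeric characters and that a multi-word query matches when all of its words are present. The second code, `1002183$0$1`, is a made-up value used for illustration.

```python
import re


def tokenize(text: str) -> set:
    """Toy tokenizer: split on anything that is not a letter or digit."""
    return {t for t in re.split(r"[^0-9A-Za-z]+", text) if t}


def matches_all_terms(query: str, value: str) -> bool:
    """True when every word of the query appears among the value's tokens."""
    return tokenize(query) <= tokenize(value)


codes = ["1002183$1$0$0", "1002183$0$1"]  # second code is hypothetical

# Word-based matching (TEXT field, value split on '$'): both codes match.
text_hits = [c for c in codes if matches_all_terms("1002183 1 0", c)]
print(text_hits)  # prints ['1002183$1$0$0', '1002183$0$1']

# Whole-value matching (the idea behind a TAG field): only one code matches.
tag_hits = [c for c in codes if c == "1002183$1$0$0"]
print(tag_hits)  # prints ['1002183$1$0$0']
```

Under these assumptions the word-based query cannot tell the two codes apart, which is why the TAG approach is the better fit for exact matching.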

@pastoralz
Author

Thank you for your reply. However, I wonder why saving, modifying, and querying data in ReJSON format is not handled automatically by the framework. That would make it easier to use and let developers focus on business-level development.


This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024