Special character word segmentation #4367

pastoralz · 2024-01-17T06:40:38Z

Hello.
I am a developer from China, and I ran into a problem while using RediSearch. Suppose I have a code value of 1002183$1$0$0 and I create an index on this field. When I search for 1002183$1$0$0, I do not get the expected result.

I looked through existing issues and found a similar problem: hello/-world is split into multiple words. So I assume the special characters in my code also cause it to be split into multiple words. I can get a match by replacing the special characters with spaces, but that creates another problem. Suppose the value becomes 1002183 1 0 0 after replacement, and another code becomes 1002183 0 1; a search then returns both records. It seems impossible to run an exact query on values that contain special characters. Is there a solution? I look forward to your reply. Thank you.

oshadmi · 2024-01-17T08:10:19Z

@pastoralz Thank you for reporting this.

To keep the value from being split into multiple words, you can use a TAG field and escape the special characters when querying. For example:

HSET doc:1 foo 1002183$1$0$0
FT.CREATE idx_tag schema foo TAG 
FT.SEARCH idx_tag '@foo:{1002183\$1\$0\$0}'
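Escaping by hand is easy to get wrong, so a client could build the query string programmatically. The following is a minimal sketch, not part of any Redis client library: `escape_value` and `tag_query` are hypothetical helper names, and the separator list is an assumption based on the punctuation set described in the RediSearch tokenization documentation (plus the backslash itself), so verify it against your version.

```python
# Characters assumed to act as separators in RediSearch (assumption:
# taken from the tokenization docs; the backslash is added so that a
# literal backslash in the value is escaped as well).
SEPARATORS = set(",.<>{}[]\"':;!@#$%^&*()-+=~ \\")


def escape_value(value: str) -> str:
    """Prefix every separator character in value with a backslash."""
    return "".join("\\" + ch if ch in SEPARATORS else ch for ch in value)


def tag_query(field: str, value: str) -> str:
    """Build an exact-match TAG query of the form @field:{escaped value}."""
    return "@" + field + ":{" + escape_value(value) + "}"


# Prints the same query string used in the FT.SEARCH example above.
print(tag_query("foo", "1002183$1$0$0"))
```

The resulting string can be passed as the query argument of FT.SEARCH through whichever client you use.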

If you want to use a TEXT field instead, one option is to escape the special characters in both the stored value and the query, so the value is not split into multiple words. For example:

hset doc:2 foo 1002183\$1\$0\$0
ft.create idx_txt schema foo TEXT 
ft.search idx_txt '@foo:(1002183\$1\$0\$0)'

Alternatively, you can let the value be split and search by its individual words. I am not sure this is what you are looking for, though, since it could match more documents. For example:

hset doc:3 foo 1002183$1$0$0
ft.create idx_txt schema foo TEXT 
ft.search idx_txt '@foo:(1002183 1 0)'
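To illustrate why the split-and-search approach can match more documents, here is a rough Python sketch. It is a simplified stand-in, not RediSearch's actual tokenizer: it assumes the value is split on every non-alphanumeric character and that a document matches when it contains every query term, regardless of order. The second document, doc:4, is hypothetical and only added for the comparison.

```python
import re


def tokenize(text: str) -> set:
    """Split text on any character that is not a letter, digit, or underscore."""
    return {token for token in re.split(r"[^0-9A-Za-z_]+", text) if token}


def matches_all_terms(query: str, document: str) -> bool:
    """Return True when every query term occurs somewhere in the document."""
    return tokenize(query) <= tokenize(document)


# doc:3 is the value from the example above; doc:4 is a hypothetical second code.
docs = {"doc:3": "1002183$1$0$0", "doc:4": "1002183$0$1"}

hits = [key for key, value in docs.items() if matches_all_terms("1002183 1 0", value)]
print(hits)  # both keys are listed, although only doc:3 was intended
```

Under these assumptions, both codes reduce to the same set of terms, which is the ambiguity described in the original question. The TAG approach avoids it because the whole value is indexed as a single unit.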

pastoralz · 2024-01-17T08:57:50Z

Thank you for your reply. I wonder, though, why saving, modifying, and querying data in ReJSON format is not handled automatically by the framework. That would make it easier to use and let developers focus on business-level development.

github-actions · 2024-03-18T01:46:47Z

This issue is stale because it has been open for 60 days with no activity.

pastoralz added the feature label Jan 17, 2024

pastoralz assigned adrianoamaral Jan 17, 2024

github-actions bot added the stale label Mar 18, 2024
