When this query parser finds synonyms, it needs the longest match. #25

Open
jhsuh opened this issue Jun 27, 2013 · 5 comments

jhsuh commented Jun 27, 2013

I added the synonyms for dog as shown below.
When I search for "dog", I want to match "dog" or "man's best friend" or "dog(inc)", and that works perfectly.
When I search for "dog(inc)", I want to match "dog(inc)" or "dog" or "man's best friend" as well.
But the query parser also finds the synonyms for "dog" inside "dog(inc)" (maybe it uses the shortest match), so it searches for ("dog" AND "inc") or ("dog's synonyms" AND "inc").

hmm....
I think the synonym_edismax query parser should use the longest match in the search query.

# tokenizer
 query : StandardTokenizer
 synonym : StandardTokenizer
 dog(inc) -> dog inc
# synonyms.txt
 dog, man's best friend, dog(inc)
# search phrase
 search query : dog ==> OK
http://127.0.0.1:8983/solr/select?qf=Title_t&q=dog&defType=synonym_edismax&synonyms=true&debugQuery=true&q.op=AND&synonyms.constructPhrases=true&synonyms.originalBoost=1.1&synonyms.synonymBoost=0.9
 ==> +((Title_t:dog)^1.1 (((+(Title_t:dog)) (+(Title_t:dog(inc))) (+(Title_t:"man's best friend")))^0.9))

 search query : dog(inc) ==> finds the synonyms of "dog" and ANDs each of them with "inc".
http://127.0.0.1:8983/solr/select?qf=Title_t&q=dog(inc)&defType=synonym_edismax&synonyms=true&debugQuery=true&q.op=AND&synonyms.constructPhrases=true&synonyms.originalBoost=1.1&synonyms.synonymBoost=0.9
 ==> +((((Title_t:dog) (Title_t:inc))~2^1.1) (((+(((Title_t:dog) (Title_t:inc))~2)) (+(((Title_t:"dog inc") (Title_t:inc))~2)) (+(((Title_t:"man's best friend") (Title_t:inc))~2)))^0.9))
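
The longest-match behaviour requested above can be sketched in Python (not the plugin's own Java). This is only an illustration under assumptions: the `tokenize` regex is a rough stand-in for StandardTokenizer, and `build_synonym_index` and `longest_match` are hypothetical names that do not exist in the query parser.

```python
import re

def tokenize(text):
    # Rough approximation of StandardTokenizer (an assumption): lowercase,
    # keep letters/digits and in-word apostrophes, split on everything else.
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())

def build_synonym_index(lines):
    # Map every tokenized synonym entry to its whole synonym group.
    index = {}
    for line in lines:
        group = [tuple(tokenize(entry)) for entry in line.split(",")]
        for entry in group:
            index[entry] = group
    return index

def longest_match(tokens, index):
    # Greedy left-to-right scan that prefers the longest entry at each
    # position and never reports a shorter match inside it.
    max_len = max((len(key) for key in index), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in index:
                matches.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1
    return matches

index = build_synonym_index(["dog, man's best friend, dog(inc)"])
print(tokenize("dog(inc)"))
print(longest_match(tokenize("dog(inc)"), index))
```

With this scan, the query "dog(inc)" yields a single match covering both tokens, so "dog" alone is never expanded and no extra "inc" clause is needed.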
@OkkeKlein

I think this is because of the analyzer. Another example of not wanting the synonyms analyzed like a normal query.

@jhsuh
Author

jhsuh commented Jun 27, 2013

But...
If I use the synonyms "dog, man's best friend, dog inc" and search for "dog inc", the query parser still adds unexpected clauses of the form "dog's synonym AND inc".
So I think this is not only because of the analyzer.
Thank you for your comment~ ^^

http://127.0.0.1:8983/solr/select?qf=Title_t&q=dog%20inc&defType=synonym_edismax&synonyms=true&debugQuery=true&synonyms.constructPhrases=true&synonyms.originalBoost=1.1&synonyms.synonymBoost=0.9&q.op=AND
+((((Title_t:dog) (Title_t:inc))~2^1.1) (((+(((Title_t:"dog inc") (Title_t:inc))~2)) (+(((Title_t:dog) (Title_t:inc))~2)) (+(((Title_t:"man's best friend") (Title_t:inc))~2)) (+(Title_t:"dog inc")) (+(Title_t:dog)) (+(Title_t:"man's best friend")))^0.9))
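
One plausible reading of the parsed query above is that every overlapping synonym match is expanded, not only the longest one. The Python sketch below reproduces the six clauses under that assumption; it is not taken from the parser's source, and `all_matches`, `drop_contained`, and `expand` are hypothetical helpers.

```python
# Synonym group from this comment, already tokenized ("dog inc" is two tokens).
GROUP = [("dog", "inc"), ("dog",), ("man's", "best", "friend")]

def all_matches(tokens, group):
    # Every (start, end) span of the query that equals a synonym entry.
    spans = []
    for start in range(len(tokens)):
        for entry in group:
            end = start + len(entry)
            if tuple(tokens[start:end]) == entry:
                spans.append((start, end))
    return spans

def drop_contained(spans):
    # Keep only spans that are not contained in another, longer span.
    return [s for s in spans
            if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]

def expand(tokens, spans, group):
    # For each span, swap in every synonym as a phrase and keep the
    # leftover query tokens as additional required terms.
    clauses = []
    for start, end in spans:
        for entry in group:
            clauses.append(list(tokens[:start]) + [" ".join(entry)] + list(tokens[end:]))
    return clauses

query = ["dog", "inc"]
spans = all_matches(query, GROUP)
print(expand(query, spans, GROUP))                  # six clauses, three with a trailing "inc"
print(expand(query, drop_contained(spans), GROUP))  # three clauses, none with "inc"
```

Filtering out the contained span for "dog" leaves only the three clauses without a stray "inc" term, which is the result asked for in this issue.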

@OkkeKlein

It looks to me like the second query is parsed differently.

@jhsuh
Author

jhsuh commented Jun 27, 2013

@OkkeKlein
Yes, but I think the "dog's synonym AND inc" clauses are not needed, like the ones below.
"+(((Title_t:"dog inc") (Title_t:inc))~2)) (+(((Title_t:dog) (Title_t:inc))~2)) (+(((Title_t:"man's best friend") (Title_t:inc))~2))".

And I can't always use the WhitespaceTokenizer or KeywordTokenizer for the query and the synonyms.

@nolanlawson
Member

Sorry, but I'm struggling to understand the issue here. Could you write a unit test to demonstrate what's not functioning here? Just make a branch and modify the examples/example_synonym_file.txt and add a test under test/. Thanks in advance!
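
Until such a unit test exists, the debug requests quoted in this thread can be rebuilt with a small Python helper. This is only a reproduction aid, not a test for the plugin: the host, field name, and parameters are copied from the URLs above, and `debug_url` is a hypothetical name.

```python
from urllib.parse import urlencode

# Solr endpoint as used in the URLs quoted in this thread.
BASE_URL = "http://127.0.0.1:8983/solr/select"

def debug_url(query, base_url=BASE_URL):
    # Same parameters as the reported requests; only q changes.
    params = {
        "qf": "Title_t",
        "q": query,
        "defType": "synonym_edismax",
        "synonyms": "true",
        "debugQuery": "true",
        "q.op": "AND",
        "synonyms.constructPhrases": "true",
        "synonyms.originalBoost": "1.1",
        "synonyms.synonymBoost": "0.9",
    }
    return base_url + "?" + urlencode(params)

for q in ("dog", "dog(inc)", "dog inc"):
    print(debug_url(q))
```

Note that `urlencode` percent-encodes the parentheses and writes the space as `+`, so the printed URLs differ slightly in spelling from the ones pasted above while carrying the same parameters.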
