-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Content Search: use AND operator by default and don't apply mask for phrases #6979
Merged
Merged
Changes from all commits
Commits
Show all changes
24 commits
Select commit
Hold shift + click to select a range
0b2ce57
Content Search: use AND operator by default and don't apply mask for …
yurabakhtin fcbe691
Fix test
yurabakhtin 4f92d84
Fix SearchQueryTest
yurabakhtin a74a574
Merge branch 'develop' into enh/230-content-search-default-operator
yurabakhtin c233527
Merge branch 'develop' into enh/230-content-search-default-operator
yurabakhtin 06c6fe0
Fix creating of FULLTEXT index for mysql 8+
yurabakhtin c75eee8
Merge branch 'develop' into enh/230-content-search-default-operator
yurabakhtin 92e4881
Fix mysql search driver for short words
yurabakhtin 4fc9d98
Improve keyword highlighting
yurabakhtin 30379f1
Use fixed value instead of `mysql.ft_min_word_len`
yurabakhtin ce85d0c
Exclude highlighting on console request
yurabakhtin 308e400
Escape special char `@` on mysql search driver
yurabakhtin ff2705f
Merge branch 'develop' into enh/230-content-search-default-operator
yurabakhtin 7aed7d6
Escape special chars on mysql search driver
yurabakhtin 3673239
Escape char `/` on mysql search driver
yurabakhtin b37078d
Escape quote chars on mysql search driver
yurabakhtin bda6c61
Improve word highlighting with quote or apostrophe
yurabakhtin a6bcf94
Merge branch 'develop' into enh/230-content-search-default-operator
yurabakhtin 9cec0c6
Add code comment for search query and driver
yurabakhtin 9faa03a
Merge branch 'develop' into enh/230-content-search-default-operator
yurabakhtin b850754
Refactor search query
yurabakhtin 5bdb059
Fix test for ZendLucence Driver
yurabakhtin 2d165ae
Fix ZendLucence Driver for phrases
yurabakhtin f21605e
Merge branch 'develop' into enh/230-content-search-default-operator
luke- File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Is there a reason why you removed the wildcards?
Basically, the search for “Apple” should also find “Applepie”.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luke- I have moved the wildcards from drivers(mysql, zend, solr) to
SearchQuery
code here, because the chars"
are removed on theSearchQuery
side, so we cannot know what original term was really when we are on the driver side.This is how it worked before the change:
Apple
=>Apple*
(findApplepie
)"Apple"
=>Apple*
(findApplepie
)After the change:
Apple
=>Apple*
(findApplepie
)"Apple"
=>Apple
(find onlyApple
, noApplepie
)As I understand it should work as it is described in the first table here.
Tests for quoted cases are here:
I have done the change for the request "Phrases should actually work. Can you check what is causing the problem here and add tests if necessary?", please correct me if I misunderstood it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin My intention was for “SearchQuery” to be as neutral a representation of the search query as possible. Without any syntax like wildcards or terms in quotation marks.
e.g. Asterisk or Quotation marks is more of a driver-specific syntax. (Even if the similar for MySQL and Lucene)
But I understand that, for example, with
SearchRequest::andTerms
we cannot distinguish whether it was originally quoted or not.Do you have any ideas?
We could introduce new attributes such as
andParts
(with Wildcard) andandTerms
(with Quotationmarks).We could define that as soon as a space is contained in e.g.
andTerms
, it is seen as a term, otherwise it is provided with a wildcard. (But then“Apple”
always becomesApple*
.?
Or did I get something wrong here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luke-
Ok, I have understood your intention.
For solve it we could implement new array as
andParts
or extend the existing array with new array item to mark each term like this:so on driver side we will know what term should be masked with
*
.But I am thinking why should we keep an additional array
andParts
or use the complex structure if we can use a last char of each term as flag. I.e. if the last char is*
then we can decide such term should be processed with specific way on a driver side.For current drivers(mysql, zend, solr) we can keep the code as is, but for future drivers the last char
*
can be used as flag like this, if the driver doesn't support terms ended with*
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Okay, I got your point.
Do I understand it correctly that now e.g. in
SearchQuery::andTerms
, can contain entries that are either quoted and contain an asterisk. Or is only the asterisk included?Your approach is probably the right one. Since in fact all current drivers support this and it also corresponds to the query language of
SearchQuery
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luke- Yes, but a bit correction about "quoted":
Examples how a requested keyword is stored in terms:
Apple
=>Apple*
"Apple"
=>Apple
Foo bar
=>Foo bar*
"Foo bar"
=>Foo bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Can you please add this example somewhere in the
SearchQuery
PHPDoc?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luke- 9cec0c6#diff-70f82fb3d09d4959fb9f44a526f8455144c112df88ec7299a2efe28a76171169R28-R64