New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Content Search: use AND operator by default and don't apply mask for phrases #6979
base: develop
Are you sure you want to change the base?
Conversation
@@ -129,10 +129,10 @@ private function createMysqlFullTextQuery(SearchQuery $query, array $matchFields | |||
$againstQuery = ''; | |||
|
|||
foreach ($query->andTerms as $keyword) { | |||
$againstQuery .= '+' . rtrim($keyword, '*') . '* '; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Is there a reason why you removed the wildcards?
Basically, the search for “Apple” should also find “Applepie”.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luke- I have moved the wildcards from drivers(mysql, zend, solr) to SearchQuery
code here, because the chars "
are removed on the SearchQuery
side, so we cannot know what original term was really when we are on the driver side.
This is how it worked before the change:
Apple
=>Apple*
(findApplepie
)"Apple"
=>Apple*
(findApplepie
)
After the change:
Apple
=>Apple*
(findApplepie
)"Apple"
=>Apple
(find onlyApple
, noApplepie
)
As I understand it should work as it is described in the first table here.
Tests for quoted cases are here:
- https://github.com/humhub/humhub/pull/6979/files#diff-097b244c3690843dabcdabfa0289d8a360bb013a8e05289e2e27d68301547962R51-R52
- https://github.com/humhub/humhub/pull/6979/files#diff-6d94b066e5467f549cd0cc6b834e574e7f82da4d61d4581aeaec7da9b3f9d572R19-R22
- https://github.com/humhub/humhub/pull/6979/files#diff-6d94b066e5467f549cd0cc6b834e574e7f82da4d61d4581aeaec7da9b3f9d572R108
I have done the change for the request "Phrases should actually work. Can you check what is causing the problem here and add tests if necessary?", please correct me if I misunderstood it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin My intention was for “SearchQuery” to be as neutral a representation of the search query as possible. Without any syntax like wildcards or terms in quotation marks.
e.g. Asterisk or Quotation marks is more of a driver-specific syntax. (Even if the similar for MySQL and Lucene)
But I understand that, for example, with SearchRequest::andTerms
we cannot distinguish whether it was originally quoted or not.
Do you have any ideas?
-
We could introduce new attributes such as
andParts
(with Wildcard) andandTerms
(with Quotationmarks). -
We could define that as soon as a space is contained in e.g.
andTerms
, it is seen as a term, otherwise it is provided with a wildcard. (But then“Apple”
always becomesApple*
. -
?
Or did I get something wrong here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was for “SearchQuery” to be as neutral a representation of the search query as possible. Without any syntax like wildcards or terms in quotation marks.
e.g. Asterisk or Quotation marks is more of a driver-specific syntax. (Even if the similar for MySQL and Lucene)
Ok, I have understood your intention.
For solve it we could implement new array as andParts
or extend the existing array with new array item to mark each term like this:
SearchQuery->andTerms = [
['Apple', 'mask' => false],
['Pine', 'mask' => true],
];
so on driver side we will know what term should be masked with *
.
But I am thinking why should we keep an additional array andParts
or use the complex structure if we can use a last char of each term as flag. I.e. if the last char is *
then we can decide such term should be processed with specific way on a driver side.
For current drivers(mysql, zend, solr) we can keep the code as is, but for future drivers the last char *
can be used as flag like this, if the driver doesn't support terms ended with *
:
foreach ($request->getSearchQuery()->andTerms as $term) {
$keywordQuery->addSubquery(str_ends_with($term, '*') ? $this->prepareMaskedTerm($term) : $term);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Okay, I got your point.
Do I understand it correctly that now e.g. in SearchQuery::andTerms
, can contain entries that are either quoted and contain an asterisk. Or is only the asterisk included?
Your approach is probably the right one. Since in fact all current drivers support this and it also corresponds to the query language of SearchQuery
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand it correctly that now e.g. in
SearchQuery::andTerms
, can contain entries that are either quoted and contain an asterisk. Or is only the asterisk included?
@luke- Yes, but a bit correction about "quoted":
- entries that are either simple word without quotes or contain an asterisk
Examples how a requested keyword is stored in terms:
Apple
=>Apple*
"Apple"
=>Apple
Foo bar
=>Foo bar*
"Foo bar"
=>Foo bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Can you please add this example somewhere in the SearchQuery
PHPDoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Can you please add this example somewhere in the
SearchQuery
PHPDoc?
@luke- 9cec0c6#diff-70f82fb3d09d4959fb9f44a526f8455144c112df88ec7299a2efe28a76171169R28-R64
@yurabakhtin Very sorry, comment #6552 (comment) and the followings are in the wrong issue... However, thanks for the fix with the FYI, there is still an error when typing the |
@marc-farre Thanks, fixed in the commit 7aed7d6. I have also tested all special chars from my keyboard and found the same problem with chars |
@yurabakhtin Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Thanks looks good. Can you please check my two comments?
Are customization tests also necessary for the Solr class?
@@ -145,6 +158,11 @@ private function createMysqlFullTextQuery(SearchQuery $query, array $matchFields | |||
); | |||
} | |||
|
|||
protected function prepareKeyword(string $keyword): string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly?
Automatically enclose keywords with special characters in quotation marks.
If yes, can you please create a small comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -129,10 +129,10 @@ private function createMysqlFullTextQuery(SearchQuery $query, array $matchFields | |||
$againstQuery = ''; | |||
|
|||
foreach ($query->andTerms as $keyword) { | |||
$againstQuery .= '+' . rtrim($keyword, '*') . '* '; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Can you please add this example somewhere in the SearchQuery
PHPDoc?
What kind of change does this PR introduce? (check at least one)
The PR fulfills these requirements:
develop
branch, not themaster
branch if no hotfixOther information:
https://github.com/humhub/humhub-internal/issues/230