Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content Search: use AND operator by default and don't apply mask for phrases #6979

Open
wants to merge 23 commits into
base: develop
Choose a base branch
from

Conversation

yurabakhtin
Copy link
Contributor

What kind of change does this PR introduce? (check at least one)

  • Bugfix
  • Feature

The PR fulfills these requirements:

  • It's submitted to the develop branch, not the master branch if no hotfix
  • All tests are passing
  • Changelog was modified

Other information:
https://github.com/humhub/humhub-internal/issues/230

@yurabakhtin yurabakhtin requested a review from luke- April 30, 2024 10:39
@yurabakhtin yurabakhtin marked this pull request as draft April 30, 2024 10:46
@yurabakhtin yurabakhtin marked this pull request as ready for review April 30, 2024 11:17
@@ -129,10 +129,10 @@ private function createMysqlFullTextQuery(SearchQuery $query, array $matchFields
$againstQuery = '';

foreach ($query->andTerms as $keyword) {
$againstQuery .= '+' . rtrim($keyword, '*') . '* ';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Is there a reason why you removed the wildcards?

Basically, the search for “Apple” should also find “Applepie”.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luke- I have moved the wildcards from drivers(mysql, zend, solr) to SearchQuery code here, because the chars " are removed on the SearchQuery side, so we cannot know what original term was really when we are on the driver side.

This is how it worked before the change:

  • Apple => Apple* (find Applepie)
  • "Apple" => Apple* (find Applepie)

After the change:

  • Apple => Apple* (find Applepie)
  • "Apple" => Apple (find only Apple, no Applepie)

As I understand it should work as it is described in the first table here.

Tests for quoted cases are here:

I have done the change for the request "Phrases should actually work. Can you check what is causing the problem here and add tests if necessary?", please correct me if I misunderstood it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin My intention was for “SearchQuery” to be as neutral a representation of the search query as possible. Without any syntax like wildcards or terms in quotation marks.

e.g. Asterisk or Quotation marks is more of a driver-specific syntax. (Even if the similar for MySQL and Lucene)

But I understand that, for example, with SearchRequest::andTerms we cannot distinguish whether it was originally quoted or not.

Do you have any ideas?

  • We could introduce new attributes such as andParts (with Wildcard) and andTerms (with Quotationmarks).

  • We could define that as soon as a space is contained in e.g. andTerms, it is seen as a term, otherwise it is provided with a wildcard. (But then “Apple” always becomes Apple*.

  • ?

Or did I get something wrong here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luke-

My intention was for “SearchQuery” to be as neutral a representation of the search query as possible. Without any syntax like wildcards or terms in quotation marks.

e.g. Asterisk or Quotation marks is more of a driver-specific syntax. (Even if the similar for MySQL and Lucene)

Ok, I have understood your intention.

For solve it we could implement new array as andParts or extend the existing array with new array item to mark each term like this:

SearchQuery->andTerms = [
    ['Apple', 'mask' => false],
    ['Pine', 'mask' => true],
];

so on driver side we will know what term should be masked with *.

But I am thinking why should we keep an additional array andParts or use the complex structure if we can use a last char of each term as flag. I.e. if the last char is * then we can decide such term should be processed with specific way on a driver side.
For current drivers(mysql, zend, solr) we can keep the code as is, but for future drivers the last char * can be used as flag like this, if the driver doesn't support terms ended with *:

foreach ($request->getSearchQuery()->andTerms as $term) {
    $keywordQuery->addSubquery(str_ends_with($term, '*') ? $this->prepareMaskedTerm($term) : $term);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Okay, I got your point.

Do I understand it correctly that now e.g. in SearchQuery::andTerms, can contain entries that are either quoted and contain an asterisk. Or is only the asterisk included?

Your approach is probably the right one. Since in fact all current drivers support this and it also corresponds to the query language of SearchQuery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand it correctly that now e.g. in SearchQuery::andTerms, can contain entries that are either quoted and contain an asterisk. Or is only the asterisk included?

@luke- Yes, but a bit correction about "quoted":

  • entries that are either simple word without quotes or contain an asterisk

Examples how a requested keyword is stored in terms:

  • Apple => Apple*
  • "Apple" => Apple
  • Foo bar => Foo bar*
  • "Foo bar" => Foo bar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Can you please add this example somewhere in the SearchQuery PHPDoc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Can you please add this example somewhere in the SearchQuery PHPDoc?

@luke- 9cec0c6#diff-70f82fb3d09d4959fb9f44a526f8455144c112df88ec7299a2efe28a76171169R28-R64

@marc-farre
Copy link
Collaborator

@yurabakhtin Very sorry, comment #6552 (comment) and the followings are in the wrong issue...

However, thanks for the fix with the @ char in the search field with your commit 308e400

FYI, there is still an error when typing the < and >:

image

@yurabakhtin
Copy link
Contributor Author

However, thanks for the fix with the @ char in the search field with your commit 308e400

FYI, there is still an error when typing the < and >:

@marc-farre Thanks, fixed in the commit 7aed7d6.

I have also tested all special chars from my keyboard and found the same problem with chars ~%()$ and also if char - or + is typed several time at the keyword beginning like ----keyword or +++keyword.
Let me know if you will found others, thanks.

@marc-farre
Copy link
Collaborator

@yurabakhtin Thanks.
@luke- On my side, everything's fine now. The meta-search engine is really powerful now!!

Copy link
Contributor

@luke- luke- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Thanks looks good. Can you please check my two comments?

Are customization tests also necessary for the Solr class?

@@ -145,6 +158,11 @@ private function createMysqlFullTextQuery(SearchQuery $query, array $matchFields
);
}

protected function prepareKeyword(string $keyword): string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly?

Automatically enclose keywords with special characters in quotation marks.

If yes, can you please create a small comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luke- Yes, correct, I have added a comment here.

@@ -129,10 +129,10 @@ private function createMysqlFullTextQuery(SearchQuery $query, array $matchFields
$againstQuery = '';

foreach ($query->andTerms as $keyword) {
$againstQuery .= '+' . rtrim($keyword, '*') . '* ';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Can you please add this example somewhere in the SearchQuery PHPDoc?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants