SKU search in lowercase not working #3172

Open
gerrits-ecommerce opened this issue Feb 7, 2024 · 9 comments
@gerrits-ecommerce

Preconditions

Magento Version: 2.4.6-p3

ElasticSuite Version: 2.11.4.1

Environment: Production and development

Steps to reproduce

  1. Go to https://mijnalius.nl/
  2. Search for "ae02644" (no results)
  3. Search for "AE02644" (correct result)

Information

Products in the shop have SKUs like "AE02644". As you can see, the SKU is stored in uppercase characters.
When searching with Elasticsuite, I would expect the results to be the same regardless of whether lowercase or uppercase characters are used.

When I look at the code, I see the SKU field using the "standard" analyzer, which has a "lowercase" filter.
<filter ref="lowercase" />
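The point about the lowercase filter can be illustrated with a deliberately simplified model of an analyzer chain: a tokenizer that splits on non-word characters, followed by a lowercase filter. This is an assumption for illustration only, not Elasticsuite's actual "standard" analyzer definition (which has more filters); the function name `analyze_simplified` is hypothetical.

```python
import re
from typing import List


def analyze_simplified(text: str) -> List[str]:
    """Hypothetical, simplified analyzer: tokenizer + lowercase filter.

    Not the real Elasticsuite "standard" analyzer; it only models the
    tokenize-then-lowercase steps discussed in this issue.
    """
    # Tokenizer step: split the input into runs of word characters.
    tokens = re.findall(r"\w+", text)
    # Lowercase filter step: normalize every token to lowercase.
    return [token.lower() for token in tokens]


# Both spellings of the SKU end up as the same single token.
print(analyze_simplified("AE02644"))
print(analyze_simplified("ae02644"))
```

As long as the query is analyzed the same way as the indexed field, case alone should not change which products match, which is why the reported behavior was unexpected.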

Could you please take a look and let us know whether we are missing something, or how we can make this work? Thanks!

Expected result

Get the same search result regardless of whether you search with upper- or lower-case letters.
In the screenshot below, you can see the result when using uppercase letters.

[Screenshot: search results for the uppercase query "AE02644"]

Actual result

In the screenshot below, you can see the result when using lowercase letters.

[Screenshot: no results for the lowercase query "ae02644"]

@rbayet
Collaborator

rbayet commented Feb 27, 2024

Hello @gerrits-ecommerce,

I can't reproduce your issue, whether the analyzer for the 'SKU' attribute is 'standard' or 'reference' (the default, and the preferred one for SKU-like attributes).
As you point out, the lowercase filter is present in both analyzers, so there is no logical reason for that behavior.

Could you provide us with the results of analyzing both "AE02644" and "ae02644" with both the "standard" and "reference" analyzers, from the Elasticsuite > System > Analysis screen?

For instance, this is what I get on a default Luma 2.4.6-p2:

  • AE02644 + standard
    [Screenshot: analysis result]

  • ae02644 + standard
    [Screenshot: analysis result]

  • AE02644 + reference
    [Screenshot: analysis result]

  • ae02644 + reference
    [Screenshot: analysis result]
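Outside the admin screen, roughly the same four checks can be made with Elasticsearch's Analyze API (`_analyze`), which takes an `analyzer` name and a `text`. Elasticsuite's "standard" and "reference" analyzers are assumed here to be custom analyzers defined in the index settings, so the request targets the product index rather than the cluster root. The host and the index alias `magento2_default_catalog_product` are hypothetical placeholders; the sketch only builds the requests and does not send them.

```python
import json
from typing import List, Tuple

# Hypothetical placeholders: adjust to your own cluster and index alias.
ES_HOST = "http://localhost:9200"
INDEX = "magento2_default_catalog_product"


def build_analyze_request(analyzer: str, text: str) -> Tuple[str, str, str]:
    """Return (method, url, json_body) for an index-level _analyze call."""
    url = f"{ES_HOST}/{INDEX}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text})
    return "POST", url, body


# The four combinations: both spellings x both analyzers.
requests_to_run: List[Tuple[str, str, str]] = [
    build_analyze_request(analyzer, text)
    for text in ("AE02644", "ae02644")
    for analyzer in ("standard", "reference")
]

for method, url, body in requests_to_run:
    print(method, url, body)
```

Each line can be sent with any HTTP client, e.g. `curl -X POST -H 'Content-Type: application/json' -d '<body>' '<url>'`; the `tokens` array in each response is what to compare between the two spellings.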

Regards,

@gerrits-ecommerce
Author

Hello @rbayet

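Thank you for the response. I got the following results on the Analysis screen:

AE02201 + standard

[Screenshot: analysis result]

ae02201 + standard

[Screenshot: analysis result]

AE02201 + reference

[Screenshot: analysis result]

ae02201 + reference

[Screenshot: analysis result]

On the frontend, this is the result:

Search with AE02201

[Screenshot: frontend search results]

Search with ae02201

[Screenshot: frontend search results]

What makes it even stranger is that some products do seem to work with the lowercase SKU. For example:

[Screenshot: a product found with a lowercase SKU search]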

@rbayet
Collaborator

rbayet commented Mar 6, 2024

Hello @gerrits-ecommerce,

Thanks for the update.

OK, so analyzer-wise, there is no reason for a lowercased "AE02201" not to match.

So what remains:

  1. possibly a cache issue of some sort
  2. or an issue in the Elasticsuite pre-analysis query (let's call it a "term vectors issue")
  3. or a case where we grab a popular search query from the Magento search terms that leads to 0 results

Concerning 2., does a fulltext search for "ae02201" produce results?
If not, that would support this possibility.

Concerning 3., can you look at the Marketing > Search Terms screen in the Magento admin and check whether, by any chance, you have popular search queries containing "ae02201" that are enabled for suggestion?
[Screenshot: Marketing > Search Terms screen]

Here is my reasoning:

  • the catalog product autocomplete relies on the popular search terms registered in the Magento database to "complete" the user's potentially incomplete search:
    • if we find some, we will search for those instead of what the user typed
    • if not, we will search for exactly what the user typed
  • so if, by any chance, the only matching popular search terms are of the form "ae02201 [SOMETHING ELSE]", and no product can match both "ae02201" and "[SOMETHING ELSE]" anymore (because your indexed data changed and your "minimum should match" configuration is 100%), that could also explain the issue. It's a bit far-fetched, but it could be the reason.
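The reasoning above can be sketched as a small simulation. Everything here is hypothetical: the product data, the stale popular term, the prefix-matching rule and the whitespace tokenization stand in for Magento's search terms table and Elasticsuite's real query building. Only the two-step logic (substitute matching suggestable popular terms if any, otherwise use the raw input, then require 100% of the terms) comes from the explanation above.

```python
from typing import Dict, List

# Hypothetical indexed products: SKU -> analyzed (lowercased) tokens.
PRODUCTS: Dict[str, List[str]] = {"AE02201": ["ae02201", "blue", "widget"]}


def queries_for(user_text: str, popular_terms: List[str]) -> List[str]:
    """Queries the autocomplete would run for the user input (simplified)."""
    typed = user_text.lower()
    # Assumed rule: a suggestable popular term "completes" the input if it starts with it.
    completions = [term for term in popular_terms if term.lower().startswith(typed)]
    # Search for the popular terms if any matched, otherwise for exactly what was typed.
    return completions if completions else [typed]


def search(query: str) -> List[str]:
    """SKUs containing every query term (minimum should match = 100%)."""
    terms = query.lower().split()
    return [sku for sku, tokens in PRODUCTS.items() if all(t in tokens for t in terms)]


def autocomplete(user_text: str, popular_terms: List[str]) -> List[str]:
    results: List[str] = []
    for query in queries_for(user_text, popular_terms):
        results.extend(search(query))
    return results


stale_terms = ["ae02201 something"]  # hypothetical stale popular search term
print(autocomplete("ae02201", stale_terms))  # []: the substituted query cannot fully match
print(autocomplete("ae02201", []))           # ['AE02201']: the raw input is used
```

Flagging popular search terms as not suggestable corresponds to the empty-list case, where the raw input is searched as typed.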

If this happens to be the source of the issue and you recently moved to Elasticsuite from another search engine, I would encourage you to flag all popular search terms as not suggestable (mass update search_query.display_in_terms to 0).

Regards,

@gerrits-ecommerce
Author

Hello @rbayet

Thanks for the response. I dug into your comment and found the following:

  1. It does not seem to be a cache issue; we have turned off the caching, but the problem still exists.
  2. I do not know what exactly you mean by "full text search", but when I search on the lower case variant and hit enter, I get no results.

[Screenshot: no results for the lowercase variant]

  3. I did some testing on the search terms and found that a search term was present for the uppercase variant. I deleted this search term and did a full search for the lowercase variant. The search term then reappeared with 0 results:

[Screenshot: search term listed with 0 results]

After that, I searched for the uppercase variant and found a result; the number of results for the search term then changed to 1.

[Screenshot: search term listed with 1 result]

After this happened, I could find the product with the lowercase variant on the frontend.

[Screenshot: product found with the lowercase variant]

This explains why some products work and others don't. However, it does not fix the problem that searching for the lowercase variant of the SKU returns no results, and I don't think this is the appropriate behavior.

@rbayet
Collaborator

rbayet commented Mar 12, 2024

2. I do not know what exactly you mean by "full text search", but when I search on the lower case variant and hit enter, I get no results.

By "fulltext search", I mean typing your search in the search input bar and hitting enter, as opposed to the "autocomplete search".

Concerning your tests:

  • yes, Magento may "merge" frontend uses of the lowercase and uppercase versions of the same search term
  • so, the lowercase "ae02201" does not provide any results on its own
    • either in the autocomplete search
    • or in the fulltext search
  • but if the uppercase version "AE02201" has been searched for previously, since that search works
    • then searching for "ae02201" will work in autocomplete, since we pull the popular search "AE02201" (**)

(**) I don't know if I mentioned that, but the autocomplete product search takes the user search query and looks for matching (suggestable) popular search terms:

  • if some are found, we search for those instead of the (possibly incomplete) user search query
  • if none are found, we search for exactly what the user typed

Honestly, I'm a bit stumped: from what you showed about the analyzers, there is no reason for "ae02201" not to be analyzed in exactly the same way as "AE02201".
Do you by any chance have a thesaurus (synonym, expansion) configuration on any of those terms?
Is there any remnant of a previous module that was handling the search before you installed Elasticsuite?

I'll try to reproduce using your exact versions. Some last questions: what flavour of "Elasticsearch" server are you using (Elasticsearch or OpenSearch)? Which version is it running?

Regards,
Richard

@rbayet
Collaborator

rbayet commented Mar 13, 2024

Not reproduced so far on an EE 2.4.6-p3 + Elasticsuite 2.11.4.1 setup with:

  • sku using the "reference" analyzer and containing "AE02201"
  • sku using the "standard" analyzer and containing "AE02201"

@gerrits-ecommerce
Author

Hello @rbayet,

I took a look at the Thesaurus (screenshot below) but do not believe this to be the cause of the problem.

[Screenshot: Thesaurus configuration]

Regarding your other question, we are running Elasticsearch 7.6.0 on our server. The client has always used Elasticsuite, so there are unlikely to be any traces of other search engines.

@gerrits-ecommerce
Author

@rbayet We dove into it further on our end and found that the issue was the Minimum Should Match setting, which was set to 3 for this customer.

As we understand it, the reference analyzer splits the SKU in such a way that the minimum_should_match logic kicks in, and since our SKU is fairly short, it doesn't reach the required 3 matches. The capitalized version of the SKU, meanwhile, does get picked up by one of the other analyzers (the SKU field itself, it seems).

Changing the setting back to 1 seems to fix this problem for us.
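The effect of an integer minimum should match can be sketched by counting how many analyzed query terms a document contains. Two things are assumptions here: the token split of the lowercase SKU (a word-delimiter style split into "ae" and "02201" is used purely as an example; the real reference analyzer output may differ), and that an integer larger than the number of query terms is not capped, which is what the reported symptom suggests. The percentage branch rounds down, like Elasticsearch's documented `minimum_should_match` percentages.

```python
from typing import List, Union


def required_terms(query_terms: List[str], msm: Union[int, str]) -> int:
    """Number of analyzed query terms a document must contain (simplified msm)."""
    count = len(query_terms)
    if isinstance(msm, str) and msm.endswith("%"):
        # Percentage of the analyzed query terms, rounded down.
        return count * int(msm[:-1]) // 100
    # Fixed integer: used as-is, even when it exceeds the number of terms.
    return int(msm)


def matches(query_terms: List[str], doc_terms: List[str], msm: Union[int, str]) -> bool:
    """True if the document contains enough of the analyzed query terms."""
    found = sum(1 for term in query_terms if term in doc_terms)
    return found >= required_terms(query_terms, msm)


# Hypothetical token lists: the real reference analyzer output may differ.
query = ["ae", "02201"]            # lowercase query split into two parts
doc = ["ae02201", "ae", "02201"]   # indexed SKU tokens

print(matches(query, doc, 3))       # False: 3 terms required, only 2 exist
print(matches(query, doc, 1))       # True
print(matches(query, doc, "100%"))  # True: 2 of 2 terms required
```

A percentage scales with the number of analyzed terms, whereas a fixed integer does not, so short, split-up SKU queries are the first to be affected.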

I don't know whether that setting should influence the SKU search results; it was certainly unexpected behavior for us. If this is intentional, I'm happy for this issue to be closed. If it is unintentional, I guess we could keep this issue open, or create a new one that's a little more to the point.

Either way, thanks for taking the time to help us debug this issue!

@rbayet
Collaborator

rbayet commented Mar 21, 2024

Hello @gerrits-ecommerce,

There might be an annoying underlying issue with regard to the way the word_delimiter filter works in the standard vs. reference analyzers, but that's usually covered by the experimental settings.

But setting a fixed integer value for the minimum should match is indeed something whose implications we have never looked at.
Obviously, setting it to "1" means at least one term must match, which gives you bigger result lists on multi-word queries: searching for "little red riding hood" will produce a result list with products containing only 1, 2, 3 or all four of the terms, which can be a bit risky if one of those partial matches gets heavily boosted by optimizers.
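The risk described above can be illustrated with a toy ranking model. The catalog, the boost factor and the scoring formula (matched terms multiplied by an optimizer boost) are hypothetical and much simpler than real Elasticsearch relevance scoring; the point is only that a minimum should match of "1" lets partial matches into the result list, where a strong boost can push them above the full match.

```python
from typing import Dict, List, Tuple

QUERY = ["little", "red", "riding", "hood"]

# Hypothetical catalog: product name -> (indexed terms, optimizer boost factor).
CATALOG: Dict[str, Tuple[List[str], float]] = {
    "Little Red Riding Hood book": (["little", "red", "riding", "hood", "book"], 1.0),
    "Red t-shirt": (["red", "t", "shirt"], 5.0),
    "Riding boots": (["riding", "boots"], 1.0),
    "Blue mug": (["blue", "mug"], 1.0),
}


def ranked_results(required: int) -> List[str]:
    """Products containing at least `required` query terms, best toy score first."""
    scored = []
    for name, (terms, boost) in CATALOG.items():
        matched = sum(1 for term in QUERY if term in terms)
        if matched >= required:
            # Toy score: matched term count multiplied by the optimizer boost.
            scored.append((matched * boost, name))
    return [name for _, name in sorted(scored, reverse=True)]


print(ranked_results(len(QUERY)))  # msm 100%: ['Little Red Riding Hood book']
print(ranked_results(1))           # msm "1": ['Red t-shirt', 'Little Red Riding Hood book', 'Riding boots']
```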

Basically, the default value is 100%, which means "all [query] elements must be found" (and those elements are determined by the analyzer applied to the query terms, which 99.99% of the time is the same analyzer as the targeted field's).
Sometimes people trying to get "better SKU matching" (handling user typos) used to set the msm to 99%, but the SKU-matching experimental settings mentioned in this issue are usually a better approach now.

Maybe the customer made a typo and intended to set the minimum should match to "3<X%", like "3<60%"?
This reads as: "if the [analyzed] query has up to 3 elements, enforce a 100% minimum should match; above 3 elements, enforce only X%".
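The "3<60%" form can be made concrete with a small calculator for the `minimum_should_match` formats described in the Elasticsearch documentation (integer, negative integer, percentage, negative percentage, and "N<spec" conditionals, possibly several separated by spaces). This is a sketch written from the documented semantics, not Elasticsearch's own code; the function name `required_matches` is hypothetical, and percentages are rounded down as documented.

```python
import re


def required_matches(clause_count: int, spec: str) -> int:
    """Required matching clauses for a minimum_should_match spec (sketch)."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional form, e.g. "3<60%" or "2<-25% 9<-3".
        spec = re.sub(r"\s*<\s*", "<", spec)
        result = clause_count  # at or below the first bound: all clauses required
        for part in spec.split():
            upper_bound, sub_spec = part.split("<", 1)
            if clause_count <= int(upper_bound):
                return result
            result = required_matches(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        percent = int(spec[:-1])
        portion = abs(clause_count * percent) // 100  # rounded down
        result = clause_count - portion if percent < 0 else portion
    else:
        value = int(spec)
        result = clause_count + value if value < 0 else value
    return max(result, 0)


for terms in (1, 2, 3, 4, 5):
    print(terms, "terms ->", required_matches(terms, "3<60%"), "required")
```

With "3<60%", queries of up to 3 analyzed terms need every term, while a 4-term query needs only 2 (60% of 4, rounded down). The same rounding is why "99%" tolerates one missing term on a 4-term query: it requires 3.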

I'll check locally what happens with a fixed "3" minimum should match and shorter queries using our Explain premium module (yes, that's a shameless plug).
