Mark the synonym token filter as updateable and provide a better example #25

Open
damienalexandre opened this issue Feb 28, 2020 · 1 comment

@damienalexandre
Member

Reading https://www.elastic.co/blog/boosting-the-power-of-elasticsearch-with-synonyms, we quickly see that Emoji Search can benefit from the new POST /synonym_test/_reload_search_analyzers API.

Index-time synonyms have several disadvantages:

  • The index might get bigger, because all synonyms must be indexed.
  • Search scoring, which relies on term statistics, might suffer because synonyms are also counted, and the statistics for less common words become skewed.
  • Synonym rules can’t be changed for existing documents without reindexing.

...

Using synonyms in search-time analyzers on the other hand doesn’t have many of the above mentioned problems:

  • The index size is unaffected.
  • The term statistics in the corpus stay the same.
  • Changes in the synonym rules don’t require reindexing of documents.

And:

Starting with Elasticsearch 7.3, this reopening of indices in order to see changes in synonym files is no longer needed.
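To make the reload step concrete, here is a minimal Python sketch of the call quoted above. The index name synonym_test comes from the blog example; the base URL http://localhost:9200 and the helper names are assumptions for illustration. The request is only built here, not sent, so the snippet runs without a cluster.

```python
import urllib.request


def build_reload_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that reloads search analyzers."""
    url = f"{base_url.rstrip('/')}/{index}/_reload_search_analyzers"
    return urllib.request.Request(url, method="POST")


def reload_search_analyzers(base_url: str, index: str) -> bytes:
    """Send the reload request; needs a reachable Elasticsearch 7.3+ node."""
    with urllib.request.urlopen(build_reload_request(base_url, index)) as response:
        return response.read()


# Assumed local node; adjust the base URL for a real deployment.
request = build_reload_request("http://localhost:9200", "synonym_test")
print(request.get_method(), request.full_url)
```

In production, reload_search_analyzers would be called after the synonym file has been updated on every node, since each node reads its own copy of the file.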

We must:

  • provide a new "search time" config
  • write better documentation for updating synonyms in production
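As a starting point for the "search time" config, the following Python sketch builds an index-creation body in which the synonym filter is marked updateable and referenced only by the search analyzer. The filter name emoji_synonyms, the analyzer names, the content field, the standard tokenizer and the synonyms path are all hypothetical placeholders, not the project's actual configuration.

```python
import json


def build_search_time_synonym_body(synonyms_path: str) -> dict:
    """Return an index-creation body that applies synonyms only at search time."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name. "updateable" lets the filter be
                    # reloaded, and restricts it to search analyzers.
                    "emoji_synonyms": {
                        "type": "synonym",
                        "synonyms_path": synonyms_path,
                        "updateable": True,
                    }
                },
                "analyzer": {
                    # Index-time analyzer: no synonym filter.
                    "emoji_index": {
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    },
                    # Search-time analyzer: same chain plus the synonyms.
                    "emoji_search": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "emoji_synonyms"],
                    },
                },
            }
        },
        "mappings": {
            "properties": {
                # Hypothetical field name.
                "content": {
                    "type": "text",
                    "analyzer": "emoji_index",
                    "search_analyzer": "emoji_search",
                }
            }
        },
    }


body = build_search_time_synonym_body("analysis/emoji_synonyms.txt")
print(json.dumps(body, indent=2))
```

Keeping the synonym filter out of the index-time analyzer is what lets the synonym file change without reindexing.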
@damienalexandre
Member Author

Using the synonym_graph token filter instead of the plain synonym filter, and applying it at search time, could also work better.

https://www.adelean.com/blog/20210421_synonym_graph_in_elasticsearch/
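A rough Python sketch of what switching to the graph variant might look like: only the filter type changes, while the file path and the updateable flag stay the same. The helper name and the synonyms path are hypothetical, and whether the graph variant actually improves Emoji Search results is untested here.

```python
ALLOWED_TYPES = ("synonym", "synonym_graph")


def synonym_filter(filter_type: str, synonyms_path: str) -> dict:
    """Build an updateable synonym filter definition of the given type."""
    if filter_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported filter type: {filter_type!r}")
    return {
        "type": filter_type,
        "synonyms_path": synonyms_path,
        # Reloadable filters are meant for search analyzers only.
        "updateable": True,
    }


# Hypothetical path, relative to the Elasticsearch config directory.
graph_filter = synonym_filter("synonym_graph", "analysis/emoji_synonyms.txt")
print(graph_filter)
```

The resulting definition would be referenced from the search analyzer only, since the graph filter is intended for search-time use.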
