Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Make HNSW algorithm parameters mapping parameters #282

Open
jmazanec15 opened this issue Dec 16, 2020 · 2 comments
Open

Make HNSW algorithm parameters mapping parameters #282

jmazanec15 opened this issue Dec 16, 2020 · 2 comments
Labels
Enhancements Improvement on existing component

Comments

@jmazanec15
Copy link
Member

For index creation, HNSW graphs take a couple different parameters: M, efConstruction, dimension, spaceType, etc. Currently, M, efConstruction, and spaceType are only configurable via index settings. Dimension is configurable via mapping parameters.

There are a few limitations/issues with having M, efConstruction, and spaceType only be configurable via index settings:

  1. We have seen some race conditions where, during mapper build, the index settings have not yet been set for the index, causing M, efConstruction and spaceType to fall back to defaults for that field. For spaceType, this can be a serious bug because it will default to l2 space as opposed to cosinesimil space in some cases.
  2. It prevents users from having multiple fields in the same index with different values for M, efConstruction, and spaceType. For instance, it is not possible to have a cosinesimil field and l2 field in the same index.

This is a feature request to make M, efConstruction and spaceType mapping parameters and to start to deprecate M, efConstruction and spaceType as index settings. In order to maintain backwards compatibility, it makes sense in KNNVectorFieldMapper to first check if the mapping parameters have been set, and, if not, fall back to reading index settings (as is currently done).

Please leave comments on this issue with thoughts/questions/concerns.

@jmazanec15 jmazanec15 added the Enhancements Improvement on existing component label Dec 16, 2020
@ofirnk
Copy link

ofirnk commented Dec 17, 2020

Thanks for looking into it!
(btw, I'd label it as a bug, not enhancement)

Is there some workaround until this is fixed?
Some memory gate? Splitting the index creation to couple of steps perhaps?

@jmazanec15
Copy link
Member Author

Hi @ofirnk , we are still looking into root cause of https://discuss.opendistrocommunity.dev/t/cosine-similarity-formula/4390/12. We haven't confirmed that this is causing that issue, yet. We are currently working to reproduce the issue. Once we are able to, we will create a separate issue that contains bug and workaround.

On "We have seen some race conditions where, during mapper build, the index settings have not yet been set for the index, causing M, efConstruction and spaceType to fall back to defaults for that field. For spaceType, this can be a serious bug because it will default to l2 space as opposed to cosinesimil space in some cases." Sorry, my wording here was not clear. We saw this happen in #239 and it may be the case for https://discuss.opendistrocommunity.dev/t/cosine-similarity-formula/4390/12, but have not yet confirmed.

Regardless, it would be an improvement to set M, efConstruction, spaceType as mapping parameters, as opposed to index settings. That is why I labeled as enhancement instead of bug.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Enhancements Improvement on existing component
Projects
None yet
Development

No branches or pull requests

2 participants