This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

[RFC] K-NN #1

Closed
vamshin opened this issue Aug 13, 2019 · 8 comments

@vamshin
Member

vamshin commented Aug 13, 2019

The purpose of this issue is to capture feedback and comments regarding the project's request for comments.

@vamshin vamshin changed the title from "1" to "[RFC] Index Management" Aug 13, 2019
@vamshin vamshin changed the title from "[RFC] Index Management" to "[RFC] K-NN" Aug 13, 2019
@aalbahem

aalbahem commented Aug 14, 2019

Can this be refactored so that the Lucene part and the Elasticsearch part become two separate projects? I was thinking of porting this to Solr as a plugin.

@vamshin
Member Author

vamshin commented Sep 17, 2019

Hi @aalbahem,

Thanks for the suggestion. As of now we are just focusing on Elasticsearch plugin development.

@JackRyanson

Hi @vamshin, it's great you guys are working on this. My hopes were high for this feature to land in Elasticsearch, but once they decided to make it part of X-Pack, I really understood the value of what Open Distro is doing here.

I have a few general questions (sorry if this should be obvious). How is this different from the feature in Elastic, from your point of view?

Could you not simply reuse what was Apache-licensed before this commit https://github.com/elastic/elasticsearch/pull/43280/files and then extend from there? (There is value in staying somewhat aligned.)

Will you provide an extension to Painless so that custom scoring can also take vectors into account?

@vamshin
Member Author

vamshin commented Sep 26, 2019

Hi @JackRyanson,

Open Distro k-NN works seamlessly and does not require any additional scripting for scoring. It currently uses Euclidean distance to measure similarity, and we plan to support other similarity measures. Yes, we also plan to provide an extension to Painless for custom scoring. Let us know if you have any other questions.
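
For readers unfamiliar with the similarity measure mentioned above, here is a minimal sketch of Euclidean distance and an exact, brute-force nearest-neighbor lookup. It only illustrates the distance measure; it is not the plugin's implementation, and the function names `euclidean_distance` and `nearest_neighbors` are made up for this example.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the straight-line (L2) distance between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[float, int]]:
    """Return the k closest vectors as (distance, index) pairs, nearest first.

    This is an exhaustive scan over all vectors, shown only to make the
    notion of "closest by Euclidean distance" concrete.
    """
    scored = [(euclidean_distance(query, v), i) for i, v in enumerate(vectors)]
    scored.sort()
    return scored[:k]


if __name__ == "__main__":
    docs = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
    print(nearest_neighbors([0.0, 0.0], docs, k=2))
```

A smaller distance means a closer match, so results are sorted in ascending order of distance.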

@antonisloizou

Hello :) To echo the above, it's great you guys are working on this! I've been trying to use the plugin, but I keep running into this issue when loading the libKNNIndexV1_7_3_6.so library.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00007f75dad5546a, pid=13530, tid=13660
#
# JRE version: OpenJDK Runtime Environment (12.0.2+10) (build 12.0.2+10)
# Java VM: OpenJDK 64-Bit Server VM (12.0.2+10, mixed mode, sharing, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
# Problematic frame:
# C  [libKNNIndexV1_7_3_6.so+0xcb46a]  _GLOBAL__sub_I_distcomp_sparse_scalar_fast.cc+0x2a
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /[...]/opendistroforelasticsearch-1.2.0/core.13530)
#
# An error report file with more information is saved as:
# logs/hs_err_pid13530.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

@vamshin
Member Author

vamshin commented Oct 2, 2019

Hi @antonisloizou,

From an initial look, this appears to be a compatibility issue with the .so library. Can you help us understand the environment where you are installing the plugin, such as the OS and CPU family? I am creating issue #4 to follow up on this in a separate thread.
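
SIGILL means the process executed an instruction the CPU does not recognize, so the OS and CPU details requested above are the relevant data. A minimal sketch for gathering them follows, assuming a Linux host where `/proc/cpuinfo` is available; the helper name `collect_environment` and the choice of fields are illustrative, not something the maintainers asked for in this exact form.

```python
import platform
from pathlib import Path
from typing import Dict, List, Union


def collect_environment() -> Dict[str, Union[str, List[str]]]:
    """Gather basic OS and CPU details useful for a native-library bug report."""
    info: Dict[str, Union[str, List[str]]] = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "cpu_model": "",
        "cpu_flags": [],
    }
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux; absent on other systems
    if cpuinfo.is_file():
        for line in cpuinfo.read_text(errors="replace").splitlines():
            key, _, value = line.partition(":")
            key = key.strip()
            # Only the first core's entries are recorded.
            if key == "model name" and not info["cpu_model"]:
                info["cpu_model"] = value.strip()
            elif key == "flags" and not info["cpu_flags"]:
                info["cpu_flags"] = sorted(value.split())
    return info


if __name__ == "__main__":
    for name, value in collect_environment().items():
        print(f"{name}: {value}")
```

The `cpu_flags` list shows which instruction-set extensions (for example SSE or AVX variants) the processor reports, which can be compared against what the native library was built for.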

@vamshin vamshin closed this as completed Jan 23, 2020
@sichenzhao

Following RFC.md, I got this error: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"unknown value for [index.codec] must be one of [default, best_compression] but was: KNNCodec"}],"type":"illegal_argument_exception","reason":"unknown value for [index.codec] must be one of [default, best_compression] but was: KNNCodec"},"status":400}. May I ask what I did wrong?

@vamshin
Member Author

vamshin commented Feb 9, 2020

@sichenzhao,

We recently updated the k-NN index creation procedure but have not yet updated RFC.md.
Sorry for the inconvenience.

Instead of index.codec, you need to set "index.knn": true. For the knn_vector data type, you also have to specify the dimension. Please refer to the example below.

Example:

PUT /myindex
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2
      }
    }
  }
}
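
The same request body can be built programmatically when the field name or dimension varies. This is a minimal sketch using only the settings and mapping keys shown in the example above; the helper name `knn_index_body` is hypothetical, and sending the body to a cluster is left out because the client setup depends on your environment.

```python
import json
from typing import Any, Dict


def knn_index_body(field_name: str, dimension: int) -> Dict[str, Any]:
    """Build the index-creation body with "index.knn" enabled and one knn_vector field."""
    if dimension < 1:
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field_name: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


if __name__ == "__main__":
    # Prints JSON equivalent to the body of the PUT /myindex request above.
    print(json.dumps(knn_index_body("my_vector", 2), indent=2))
```

The printed JSON can be pasted as the body of the PUT request for whatever index name you choose.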
