-
Hi Joan, thanks for the question. It's an interesting idea. There is no way to do this right now, but I do think it's possible to add. Here's a high-level outline of how:
- We would have to add the …
- And thread it down all the way to this method: …
- And use it to constrain the …

A couple of considerations: …
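For illustration only, here is a minimal sketch of what that change could look like at the Lucene level, assuming one field per hash table. The names (`buildHashQuery`, `queryL`, the `hash_i` field scheme) are hypothetical, not this project's actual code; the point is just that a query-time parameter bounds the loop over hash tables instead of always using all L of them.

```java
// Hypothetical sketch: a query-time parameter limiting how many of the L
// indexed hash tables are queried. Names and field scheme are illustrative.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

final class LshQueryBuilder {

    /**
     * Builds a disjunction over only the first queryL hash tables.
     * Requires 1 <= queryL <= hashWords.length (the L used at index time).
     */
    static Query buildHashQuery(long[] hashWords, int queryL) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        // Bounded by the query-time parameter, not by hashWords.length == L.
        for (int i = 0; i < queryL; i++) {
            // Assumes one Lucene field per table, named "hash_0" .. "hash_{L-1}".
            Term term = new Term("hash_" + i, Long.toHexString(hashWords[i]));
            builder.add(new TermQuery(term), BooleanClause.Occur.SHOULD);
        }
        return builder.build();
    }
}
```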
-
Hi,
search performance (precision, recall, and time) is highly dependent on the values of K and L.
Once indexed, at search time the system needs to compute and search for L words of K bits each.
Increasing L improves recall, but search time also increases.
On large datasets it is difficult to know, a priori, what the effects of different K and L values will be, since the theoretical values assume a random distribution. Search times also depend on the size of the indices, memory, other fields, etc.
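As background for the "L words of K bits" point, here is a minimal self-contained sketch of one common LSH scheme (random hyperplanes); the hash family this project actually uses may differ. Each of the L tables packs K sign bits into one word, which is why query cost grows linearly with L while a larger K makes each word more selective.

```java
// Illustrative sketch of "L words of K bits": random-hyperplane LSH where
// each of the L tables produces one K-bit signature per vector (K <= 64).
import java.util.Random;

final class HyperplaneLsh {
    private final float[][][] planes; // [L][K][dim] random hyperplanes

    HyperplaneLsh(int L, int K, int dim, long seed) {
        Random rng = new Random(seed);
        planes = new float[L][K][dim];
        for (int l = 0; l < L; l++)
            for (int k = 0; k < K; k++)
                for (int d = 0; d < dim; d++)
                    planes[l][k][d] = (float) rng.nextGaussian();
    }

    /** Returns L signatures; signature l packs K sign bits into one long. */
    long[] hash(float[] vec) {
        long[] words = new long[planes.length];
        for (int l = 0; l < planes.length; l++) {
            long word = 0L;
            for (int k = 0; k < planes[l].length; k++) {
                double dot = 0.0;
                for (int d = 0; d < vec.length; d++) dot += planes[l][k][d] * vec[d];
                if (dot >= 0) word |= 1L << k; // one sign bit per hyperplane
            }
            words[l] = word;
        }
        return words;
    }
}
```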
From the implementation, I understand that internally there are as many Lucene fields as the value of L.
So, in order to speed up testing, I would like to index with a high value of L (e.g. 300) but then be able to search with a lower value of L, that is, using only the first L' indices, where L' <= L. That way I could compare results and build a table of execution time and precision for different L values without having to reindex millions of documents, which would help in deciding on the best L to use (for a given K).
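A sketch of the experiment this would enable, assuming a hypothetical `searchWithTables` method that accepts the lower L' at query time (per the reply above, this capability does not exist today):

```java
// Hypothetical experiment: index built once with L = 300, then recall and
// latency measured while only the first lPrime tables are queried.
import java.util.List;
import java.util.Set;

interface LshSearcher {
    /** Top-k document ids, querying only the first lPrime hash tables. */
    List<Integer> searchWithTables(float[] query, int k, int lPrime);
}

final class SweepL {
    static void sweep(LshSearcher searcher, float[] query, Set<Integer> groundTruth, int k) {
        for (int lPrime = 50; lPrime <= 300; lPrime += 50) { // 300 = indexed L
            long start = System.nanoTime();
            List<Integer> results = searcher.searchWithTables(query, k, lPrime);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long hits = results.stream().filter(groundTruth::contains).count();
            double recall = (double) hits / groundTruth.size();
            System.out.printf("L'=%d  recall=%.3f  time=%dms%n", lPrime, recall, elapsedMs);
        }
    }
}
```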
It could also be useful in other scenarios: for example, in some searches where we only want very high-similarity results, or where we don't care much about recall, we could reduce L at query time.
So, my question/discussion is:
Is there any (easy) way to use a lower L at query time?