TAG negation pre-filtering in Redis vector search seems not to work correctly #4256

Open
eitanzim opened this issue Dec 24, 2023 · 5 comments

@eitanzim

Hey,
I'm encountering the following problem.
We store each item with the following hset call:

hset(key,mapping={"embedding": emb, "item_id": str(id), "partition":partition})

Then defining the index schema like this:

def init_hnsw(self, **kwargs):
    self.redis.ft(self.index_name).create_index([
        VectorField("embedding", "HNSW", {
            "TYPE": "FLOAT32", "DIM": self.dim, "DISTANCE_METRIC": self.metric,
            "INITIAL_CAP": self.max_elements, "M": self.M,
            "EF_CONSTRUCTION": self.ef_construction,
        }),
        TextField("item_id"),
        TagField("partition"),
    ])

Then I try to perform a KNN vector search with a negation on the partition tag, i.e. pre-filtering so that the results are the items most similar to the query among those that do not have the provided partition.
My search function is:

def search(self, data, k=1, partition=None):
    query_vector = np.array(data).astype(np.float32).tobytes()
    p = "(-@partition:{" + partition + "})" if partition is not None else "*"
    q = (
        Query(f"{p}=>[KNN {k} @embedding $vec_param AS vector_score]")
        .sort_by("vector_score")
        .paging(0, k)
        .return_fields("vector_score", "item_id")
        .dialect(2)
    )
    params_dict = {"vec_param": query_vector}
    results = self.redis.ft(self.index_name).search(q, query_params=params_dict)
    scores, ids = [], []
    for item in results.docs:
        scores.append(float(item.vector_score))
        ids.append(item.item_id)
    return scores, ids

An example query string is:
(-@Partition:{shoes})=>[KNN 10 @Embedding $vec_param AS vector_score]

The above query runs without errors, but many of the returned items carry the very partition tag that was negated.
My questions are:

  1. I couldn't find whether tag negation pre-filtering is supported in Redis vector search. Is it? (Removing the - sign returns results from the same partition only, so the query structure seems to be correct.)
  2. If it is, what is the correct way to do it? It seems like the above should work based on the TAG query filtering documentation.
  3. If it is not, is there an elegant workaround?
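
For reference, the filter string from the search function above can be built and inspected in isolation, which makes it easy to confirm the negation and the field-name casing before the query reaches the server. This is only a sketch: `escape_tag` and `build_knn_query` are hypothetical helper names, and the escaping rule (backslash before anything that is not a letter, digit, or underscore) is my reading of the TAG documentation, not something confirmed in this thread.

```python
import re


def escape_tag(value: str) -> str:
    # Assumption: tag values need a backslash before punctuation and spaces.
    return re.sub(r"([^\w])", r"\\\1", value)


def build_knn_query(k, partition=None, tag_field="partition", vector_field="embedding"):
    # "*" matches every document; otherwise negate the given tag value.
    if partition is None:
        prefilter = "*"
    else:
        prefilter = "(-@%s:{%s})" % (tag_field, escape_tag(partition))
    return f"{prefilter}=>[KNN {k} @{vector_field} $vec_param AS vector_score]"


print(build_knn_query(10, "shoes"))
# (-@partition:{shoes})=>[KNN 10 @embedding $vec_param AS vector_score]
```

The printed string uses the lowercase field names from the index schema, which is what the search function actually sends.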

Thanks

@GuyAv46
Collaborator

GuyAv46 commented Dec 24, 2023

Thanks for reaching out!
If I understand correctly, you're trying to find the KNN that don't match the partition tag, but the reply contains results that match the tag?
If so, it sounds like a bug. You should be able to filter by any pre-filter and get the KNN that match the filter.
Can you double-check that you are passing the correct field and tag names, keeping in mind that tag matching is case-insensitive? In your example, you shouldn't get back items with the tag Shoes.
I wasn't able to reproduce the issue locally. Can you share which version you are using and whether you store multiple tags in the partition field?

@eitanzim
Author

Hey :)
Yup, exactly. I am trying to get all the results that are not shoes, but I still get shoes in the results using the above query.
The partition tag is a single-item string (e.g. shoes, dress, shirt, etc.).
I use the latest redis-server-stack docker image + redis==5.0.1 in Python.

I tried to debug the problem on my side, and something weird seems to happen:

  1. For a small K (e.g. KNN 10) the returned results contain the negated partition, but just increasing K above some threshold (e.g. KNN 30) makes the negated partition suddenly disappear from the returned results.
  2. I tried to reproduce the above behavior by indexing dummy vectors, and it seems to happen there as well.

I think this might be related to the HNSW algorithm.
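
Until the root cause is known, one possible stopgap is to over-fetch (request a larger K than needed) and drop the negated partition on the client side. The sketch below is an assumption-laden illustration, not a confirmed fix: `filter_out_partition` is a hypothetical helper, it expects hits as `(score, item_id, partition)` tuples already sorted by ascending score, and it compares tags case-insensitively on the assumption that the TAG field uses its default case-insensitive matching.

```python
def filter_out_partition(hits, excluded, k):
    """Drop hits whose partition equals `excluded`, then keep the first k.

    hits: iterable of (score, item_id, partition), sorted by ascending score.
    """
    excluded = excluded.lower()
    kept = [(score, item_id) for score, item_id, part in hits
            if part.lower() != excluded]
    kept = kept[:k]
    scores = [score for score, _ in kept]
    ids = [item_id for _, item_id in kept]
    return scores, ids


def count_leaks(hits, excluded):
    # Diagnostic: how many returned hits still carry the negated tag.
    excluded = excluded.lower()
    return sum(1 for _, _, part in hits if part.lower() == excluded)


hits = [(0.1, "a", "shoes"), (0.2, "b", "dress"),
        (0.3, "c", "Shoes"), (0.4, "d", "shirt")]
print(count_leaks(hits, "shoes"))              # 2
print(filter_out_partition(hits, "shoes", 2))  # ([0.2, 0.4], ['b', 'd'])
```

To feed this from the search function, `partition` would have to be added to `return_fields`, and the KNN query would need a K larger than the number of results wanted. This is post-filtering, so it can return fewer than k items when most of the over-fetched hits belong to the excluded partition.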

I'll try to reproduce the problem and send a code snippet for you to debug.

@GuyAv46
Collaborator

GuyAv46 commented Dec 25, 2023

Can you please share on what OS you’re running the server?

@eitanzim
Author

macOS


This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Feb 24, 2024