[Bug]: AzureAISearchVectorStore: more than one metadata filter using FilterCondition.OR fails #13513
Comments
I had hoped that this update (#13215) had fixed the issue, but it seems it hasn't.
The error you're encountering when applying more than one metadata filter using … To address this, you'll need to ensure that your filter string correctly implements the …

def construct_or_filter(field_name, values):
    return f"({field_name} eq '{values[0]}' or {field_name} eq '{values[1]}')"

filter_string = construct_or_filter('country', ['United Kingdom', 'Ireland'])

This filter string should then be integrated into your query mechanism, ensuring that the fields you're filtering by are marked as filterable in your Azure AI Search index schema. The pull request you mentioned does not seem to address the handling of …
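The helper above interpolates values directly into the OData string and handles exactly two values. A slightly more defensive sketch (an assumption, not code from this thread or from llama-index) doubles embedded single quotes, as OData string literals require, and builds the `or` chain from any number of values. Azure AI Search also offers the `search.in` function for matching one field against a list; because its default delimiters include spaces, a value like "United Kingdom" needs an explicit delimiter such as `|`. The `odata_quote` and `build_search_in` helpers are hypothetical names.

```python
from typing import List


def odata_quote(value: str) -> str:
    """Return an OData string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def construct_or_filter(field_name: str, values: List[str]) -> str:
    """Build "(f eq 'a' or f eq 'b' ...)" for any number of values."""
    if not values:
        raise ValueError("values must not be empty")
    clauses = [f"{field_name} eq {odata_quote(v)}" for v in values]
    return "(" + " or ".join(clauses) + ")"


def build_search_in(field_name: str, values: List[str], delimiter: str = "|") -> str:
    """Build a search.in(...) filter; the delimiter must not occur in any value."""
    if any(delimiter in v for v in values):
        raise ValueError("delimiter appears in a value")
    joined = delimiter.join(values)
    return f"search.in({field_name}, {odata_quote(joined)}, {odata_quote(delimiter)})"


print(construct_or_filter("country", ["United Kingdom", "Ireland"]))
print(build_search_in("country", ["United Kingdom", "Ireland"]))
```

Either string would be sent as the `filter` of an Azure AI Search query; both require `country` to be a filterable field in the index.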
How can I use this filter_string in the retrieval? This code fails:

def construct_or_filter(field_name, values):
…
# Ask this question and filter the UK and Ireland
print("IRELAND AND UK FILTERS")

Traceback:
Hi @courtneyjean, I think the problem was caused by that PR. You can try the latest version, v0.10.37, which I believe has fixed the problem (via #13435).
Hi @RussellLuo, I'm already using v0.10.37 :(
I can't run your code because I don't have Azure credentials on hand. To see what's happening, you could add some debug logging with print, or set a breakpoint on this line: Line 600 in 4c2a61c
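One way to follow this suggestion without a debugger is to turn on the Azure SDK's own logging, which can show the request the vector store actually sends, including the OData `filter`. This is a minimal sketch assuming the standard `azure` logger hierarchy used by azure-core; passing `logging_enable=True` to a client constructor is what turns on request/response body tracing, assuming the flag reaches the client that issues the query.

```python
import logging
import sys

# azure-core based libraries log under the "azure" logger hierarchy.
azure_logger = logging.getLogger("azure")
azure_logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
azure_logger.addHandler(handler)

# Then create the client with body tracing enabled, for example:
# index_client = SearchIndexClient(
#     endpoint=search_service_endpoint,
#     credential=credential,
#     logging_enable=True,
# )
```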
@courtneyjean v0.10.37 is the version of the llama-index/llama-index-core package, but you'll also want to make sure you have the latest version of the Azure AI Search vector store package.
Hi @logan-markewich: here is what my pip list shows. I believe I have the latest version: llama-index 0.10.37
llama-index-vector-stores-azureaisearch 0.1.6 is the correct version, but it seems this version has not been published on PyPI.
Good catch, @RussellLuo; the automatic publishing must have failed. I've just published it manually. @courtneyjean, can you try updating one more time?
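Since the outcome depended on which package versions were actually installed, one way to confirm them from inside the same Python environment (rather than a separate `pip list`) is `importlib.metadata` from the standard library. The `installed_version` helper is a hypothetical convenience wrapper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for dist in ("llama-index-core", "llama-index-vector-stores-azureaisearch"):
    print(dist, installed_version(dist) or "not installed")
```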
Thanks both. I've tried it, and this is an improvement, as it no longer throws an error. Unfortunately, I'm still not getting the behaviour I expected. Here is the new output from the code above.

When I apply 'NO METADATA FILTERS', the retriever returns two documents. A single filter on Ireland works well, but applying two filters:

filters=[MetadataFilter(key='country', value='United Kingdom', operator=<FilterOperator.EQ: '=='>), MetadataFilter(key='country', value='Ireland', operator=<FilterOperator.EQ: '=='>)] condition=<FilterCondition.OR: 'or'>

returns only documents related to the second filter. I also tried similarity_top_k=2 to try to achieve the desired result, but it had no impact.

Here is the output I am currently getting: NO METADATA FILTERS RETRIEVE IRELAND DOCs ONLY IRELAND AND UK FILTERS
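For reference, two equality filters combined with `FilterCondition.OR` should correspond to an OData expression in which both clauses survive. The sketch below is an illustration only and does not reproduce llama-index's internal translation code: it uses plain `(field, value)` tuples in place of `MetadataFilter` objects, a hypothetical `to_odata` function, and supports only equality. A filter string containing only the last clause would be consistent with the symptom described above.

```python
from typing import List, Tuple


def odata_quote(value: str) -> str:
    """OData string literal with embedded single quotes doubled."""
    return "'" + value.replace("'", "''") + "'"


def to_odata(filters: List[Tuple[str, str]], condition: str = "and") -> str:
    """Join (field, value) equality filters with 'and' or 'or'.

    Each clause is parenthesised so the combined expression can be
    nested inside a larger filter without precedence surprises.
    """
    if condition not in ("and", "or"):
        raise ValueError(f"unsupported condition: {condition!r}")
    clauses = [f"({field} eq {odata_quote(value)})" for field, value in filters]
    return f" {condition} ".join(clauses)


uk_or_ireland = to_odata(
    [("country", "United Kingdom"), ("country", "Ireland")], condition="or"
)
print(uk_or_ireland)
```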
Here is some code for the same process applied to a llama_index vector store, and the resulting output. In this code the metadata filter uses FilterCondition.OR and returns two documents (metadata tags country='United Kingdom' OR country='Ireland'). This code demonstrates the anticipated behaviour and output of the AzureAISearchVectorStore code above, and shows that there remains an issue with how the metadata filter is applied in that code.

# Set up some example documents with some metadata
from llama_index.core import Document
documents = [
…
# Set up a normal vector index
from llama_index.core import VectorStoreIndex
# Build index
index = VectorStoreIndex.from_documents(documents)
basic_retriever = index.as_retriever()
# Create some metadata filters
UK_filter = MetadataFilter(key='country', operator=FilterOperator.EQ, value='United Kingdom')
# Ask this question and filter just for Ireland
retriever_ireland = index.as_retriever(filters=MetadataFilters(filters=[Ireland_filter]))
# Ask this question and filter the UK and Ireland
filter_names = [UK_filter, Ireland_filter]

Output is as expected: RETRIEVE IRELAND DOCs ONLY IRELAND AND UK FILTERS
Bug Description
I've created a set of documents in an AzureAISearchVectorStore, with a 'country' metadata key. I'm trying to create a filter on documents where 'country' equals 'United Kingdom' OR 'Ireland', but it's throwing an error.
Version
0.10.37
Steps to Reproduce
import os
import tiktoken
import llama_index
from llama_index.core import PromptHelper
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import set_global_service_context
from llama_index.core import StorageContext,load_index_from_storage
from llama_index.core import Settings
from llama_index.vector_stores.azureaisearch import AzureAISearchVectorStore
from llama_index.vector_stores.azureaisearch import (
    IndexManagement,
    MetadataIndexFieldType,
)
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer
#import pandas as pd
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
    FilterCondition,
)
azure_endpoint = "xx"
api_version = "2024-02-15-preview"
api_key="xxxx"
search_service_api_key = "xxxx"
search_service_endpoint = "xx"
search_service_api_version = "2023-11-01"
credential = AzureKeyCredential(search_service_api_key)
model = "gpt-4"
deployment_name = "GPT4-Turbo"
embed_model = "text-embedding-ada-002"
embed_deployment_name = "ada002embedding"
temperature = 0
chunk_size = 1024
chunk_overlap = 20
maxWorkers = 5
sleepTimeBeforeRetry = 30
Settings.llm = AzureOpenAI(
    model=model,
    deployment_name=deployment_name,
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    temperature=temperature,
)
Settings.embed_model = AzureOpenAIEmbedding(
    model=embed_model,
    deployment_name=embed_deployment_name,
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)
# Set up some example documents with some metadata
from llama_index.core import Document
documents = [
    Document(
        text="The United Kingdom, made up of England, Scotland, Wales and Northern Ireland, is an island nation in northwestern Europe. England – birthplace of Shakespeare and The Beatles – is home to the capital, London, a globally influential centre of finance and culture.",
        metadata={"country": "United Kingdom"},
    ),
    Document(
        text="The Republic of Ireland occupies most of the island of Ireland, off the coast of England and Wales. Its capital, Dublin, is the birthplace of writers like Oscar Wilde, and home of Guinness beer.",
        metadata={"country": "Ireland"},
    ),
    Document(
        text="Japan is an island country in East Asia. It is in the northwest Pacific Ocean and is bordered on the west by the Sea of Japan, extending from the Sea of Okhotsk in the north toward the East China Sea, Philippine Sea, and Taiwan in the south.",
        metadata={"country": "Japan"},
    ),
]
# Create an AzureAISearch vector index
vector_index_name = 'testci3'
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    index_name=vector_index_name,
    credential=credential,
)
metadata_fields = {'country': 'country'}
AzureAISearch_vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    filterable_metadata_field_keys=metadata_fields,
    index_name=vector_index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
)
storage_context = StorageContext.from_defaults(vector_store=AzureAISearch_vector_store)
azs_index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
# Demonstration of error when you apply more than one filter
azs_retriever = azs_index.as_retriever()
print("--------------------------------------------------------------")
response = azs_retriever.retrieve('What locations are celebrated for being birthplaces of famous writers?')
print("NO METADATA FILTERS")
print(response)
print("--------------------------------------------------------------")
# Create some metadata filters
UK_filter = MetadataFilter(key='country', operator=FilterOperator.EQ, value='United Kingdom')
Ireland_filter = MetadataFilter(key='country', operator=FilterOperator.EQ, value='Ireland')
# Ask this question and filter just for Ireland
azs_retriever_ireland = azs_index.as_retriever(filters=MetadataFilters(filters=[Ireland_filter]))
print("RETRIEVE IRELAND DOCs ONLY")
print(azs_retriever_ireland.retrieve('What locations are celebrated for being birthplaces of famous writers?'))
print("--------------------------------------------------------------")
# Ask this question and filter the UK and Ireland
filter_names = [UK_filter, Ireland_filter]
filters = MetadataFilters(filters=filter_names, condition=FilterCondition.OR)
print("IRELAND AND UK FILTERS")
print(filters)
print("RETRIEVE IRELAND & UK DOCs ONLY")
azs_two_filters_retriever = azs_index.as_retriever(filters=filters)
print(azs_two_filters_retriever.retrieve('What locations are celebrated for being birthplaces of famous writers?'))
print("--------------------------------------------------------------")
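To make the OR-filter result easier to check than reading the printed nodes, the retrieved results can be reduced to the set of `country` values. This sketch assumes each result exposes its metadata as `result.node.metadata`, as llama-index's `NodeWithScore` does; the `retrieved_countries` helper is hypothetical, and the demo uses `types.SimpleNamespace` stand-ins rather than real nodes.

```python
from types import SimpleNamespace
from typing import Iterable, Set


def retrieved_countries(results: Iterable, key: str = "country") -> Set[str]:
    """Collect the distinct metadata values for `key` across retrieved results."""
    return {r.node.metadata[key] for r in results if key in r.node.metadata}


# Stand-in objects shaped like NodeWithScore (result.node.metadata).
fake_results = [
    SimpleNamespace(node=SimpleNamespace(metadata={"country": "Ireland"})),
    SimpleNamespace(node=SimpleNamespace(metadata={"country": "United Kingdom"})),
]
expected = {"United Kingdom", "Ireland"}
print(retrieved_countries(fake_results) == expected)  # True

# With the real retriever from the script above, one would check:
# nodes = azs_two_filters_retriever.retrieve(
#     "What locations are celebrated for being birthplaces of famous writers?"
# )
# assert retrieved_countries(nodes) == expected
```

With the behaviour reported in this issue, the same check against the Azure retriever would presumably yield only `{"Ireland"}`.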
Relevant Logs/Tracebacks