[Question]: Is there any way to initialize my index data from the Elastic Search database? #13593
Comments
To initialize your index with data from an Elasticsearch database, you'll need to explicitly load the data into the index. If you're trying to work with data already in a vector store and facing issues with an empty docstore, consider directly interacting with the vector store API, or using the index's retrieval methods to fetch nodes and add them back. For example, you can retrieve all nodes stored in the vector store and add them to your index like this:

```python
retriever = index.as_retriever(similarity_top_k=10000)
source_nodes = retriever.retrieve("fake")
nodes = [x.node for x in source_nodes]
```

This approach allows you to populate your index with the existing data from your vector store, such as an Elasticsearch database.
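For reference, each raw Elasticsearch hit in this setup carries the serialized node as JSON under `metadata._node_content`, with the source document's hash on relationship key `"1"` (as seen in the code later in this thread). A minimal, stdlib-only sketch of unpacking such a hit; the `unpack_hit` helper and `sample_hit` data are hypothetical, not llama_index API:

```python
import json


def unpack_hit(hit: dict) -> dict:
    """Extract node text and source-document hash from a raw Elasticsearch hit.

    Assumes the llama_index storage convention discussed in this thread:
    the node is serialized as JSON under metadata._node_content, and
    relationship key "1" (SOURCE) carries the originating document's hash.
    """
    node = json.loads(hit["_source"]["metadata"]["_node_content"])
    return {
        "text": node.get("text", ""),
        "source_hash": node["relationships"]["1"]["hash"],
    }


# Hypothetical hit, shaped like the responses discussed in this thread
sample_hit = {
    "_source": {
        "metadata": {
            "_node_content": json.dumps({
                "text": "hello world",
                "relationships": {"1": {"hash": "abc123"}},
            })
        }
    }
}

print(unpack_hit(sample_hit))
```

Once unpacked this way, the recovered text and hash can be used to rebuild nodes or to compare against the current source documents.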
@mglslg With all vector db integrations, the docstore is always empty, because all the nodes are stored in the vector db. This is done to simplify storage. If you need a docstore, you can either manually maintain one outside of the index using the class itself, or set
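One way to read "manually maintain one outside of the index" is to keep your own mapping of document ID to content hash and persist it yourself, independent of the vector db. A stdlib-only sketch under that assumption; `SimpleHashStore` and its JSON file format are hypothetical, not part of llama_index:

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


class SimpleHashStore:
    """A tiny external 'docstore': persists doc_id -> content hash as JSON."""

    def __init__(self, path: str):
        self.path = Path(path)
        # Load previously persisted hashes, if any
        self.hashes = json.loads(self.path.read_text()) if self.path.exists() else {}

    def changed(self, doc_id: str, text: str) -> bool:
        """True if this doc is new or its content hash differs from the cache."""
        return self.hashes.get(doc_id) != hashlib.sha256(text.encode()).hexdigest()

    def update(self, doc_id: str, text: str) -> None:
        """Record the current hash for a document and persist the mapping."""
        self.hashes[doc_id] = hashlib.sha256(text.encode()).hexdigest()
        self.path.write_text(json.dumps(self.hashes))


store = SimpleHashStore(os.path.join(tempfile.mkdtemp(), "doc_hashes.json"))
store.update("doc-1", "v1 text")
print(store.changed("doc-1", "v1 text"))  # False: hash matches the cache
print(store.changed("doc-1", "v2 text"))  # True: content changed
print(store.changed("doc-2", "anything"))  # True: never seen before
```

Only documents for which `changed` returns True would then need to be re-inserted into the vector store, which is the same refresh pattern discussed below.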
@logan-markewich Thank you for your answer! I am a little confused about this docstore object. It seems the docstore acts like a cache: when I call `refresh_ref_docs`, I found from the code that it only matches against hash values cached in the docstore. Could you please help explain its design concept? I did not find any explanation of it in the official documentation.

I initially thought that the `refresh_ref_docs` method would automatically read data from Elasticsearch and match the hash values, but later discovered that it only matches the cached hash values in the docstore. In the end, I had to manually check the hash values in Elasticsearch myself. My code finally looks like this:

```python
import json
from typing import List

from llama_index.core import Document


def get_changed_docs(es_index_name: str, doc_list: List[Document]) -> List[Document]:
    es_client = get_es_client()
    changed_doc_list = []
    for doc in doc_list:
        query = {
            "query": {
                "match": {
                    "metadata.doc_id": doc.get_doc_id()
                }
            }
        }
        result = es_client.search(index=es_index_name, body=query)
        if not result['hits']['hits']:
            # Document not in Elasticsearch yet, so it needs indexing
            changed_doc_list.append(doc)
            continue
        hits = result['hits']['hits']
        for hit in hits:
            node_content = hit['_source']['metadata']['_node_content']
            node_obj = json.loads(node_content)
            # Relationship key "1" (SOURCE) holds the originating document's hash
            if node_obj['relationships']['1']['hash'] != doc.hash:
                changed_doc_list.append(doc)
    return changed_doc_list


need_refresh_docs = get_changed_docs(es_index_name, mongo_documents)
index.refresh_ref_docs(need_refresh_docs)
```

Are there any better implementations of this in the llamaindex framework? @dosubot
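To exercise the hash-matching logic above in isolation, here is a stripped-down, stdlib-only rework with a fake search client and a minimal document type. `FakeESClient` and `FakeDoc` are hypothetical stand-ins for the real `es_client` and llama_index `Document`, so the change-detection behavior can be tested without a running Elasticsearch:

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class FakeDoc:
    doc_id: str
    hash: str


class FakeESClient:
    """Returns canned hits keyed by metadata.doc_id, mimicking es_client.search."""

    def __init__(self, hits_by_doc_id):
        self.hits_by_doc_id = hits_by_doc_id

    def search(self, index, body):
        doc_id = body["query"]["match"]["metadata.doc_id"]
        return {"hits": {"hits": self.hits_by_doc_id.get(doc_id, [])}}


def get_changed_docs(es_client, es_index_name: str,
                     doc_list: List[FakeDoc]) -> List[FakeDoc]:
    changed = []
    for doc in doc_list:
        body = {"query": {"match": {"metadata.doc_id": doc.doc_id}}}
        hits = es_client.search(index=es_index_name, body=body)["hits"]["hits"]
        if not hits:
            changed.append(doc)  # never indexed -> needs refresh
            continue
        for hit in hits:
            node = json.loads(hit["_source"]["metadata"]["_node_content"])
            if node["relationships"]["1"]["hash"] != doc.hash:
                changed.append(doc)  # stored hash is stale -> needs refresh
    return changed


def make_hit(source_hash):
    return {"_source": {"metadata": {"_node_content": json.dumps(
        {"relationships": {"1": {"hash": source_hash}}})}}}


client = FakeESClient({
    "a": [make_hit("h1")],   # hash matches -> unchanged
    "b": [make_hit("old")],  # hash differs -> changed
})                           # "c" has no hits -> new
docs = [FakeDoc("a", "h1"), FakeDoc("b", "new"), FakeDoc("c", "h3")]
print([d.doc_id for d in get_changed_docs(client, "idx", docs)])  # ['b', 'c']
```

Separating the comparison logic from the client this way also makes it easy to swap in the real `elasticsearch` client later, since only the `search` call shape matters.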
Question Validation

Question

How can I get an index initialized with a docstore through `VectorStoreIndex.from_vector_store`? When I create an index using the `VectorStoreIndex.from_vector_store` method, I find that its docstore is actually empty, which means my in-memory index is not initialized with vectors and there are no nodes. Is this a bug? Or is there any other way to initialize my index data from the Elasticsearch database?