[Question]: Storing Duplicates in vector Db #13595
Comments
To resolve duplicate entries in your Qdrant DB when using … here is a strategy to ensure you only add unique nodes:

This approach involves checking each node's unique identifier (assuming each …). Here is a simplified code snippet to illustrate this approach:

```python
from typing import List

from llama_index.core.schema import BaseNode
from llama_index.core.vector_stores.types import BasePydanticVectorStore


def add_unique_nodes(vector_store: BasePydanticVectorStore, nodes: List[BaseNode]) -> None:
    node_ids = [node.node_id for node in nodes]  # IDs of the candidate nodes
    # Query nodes already stored under these IDs (the store must support get_nodes)
    existing_nodes = vector_store.get_nodes(node_ids=node_ids)
    existing_node_ids = {node.node_id for node in existing_nodes}
    new_nodes = [node for node in nodes if node.node_id not in existing_node_ids]
    if new_nodes:
        vector_store.add(new_nodes)  # add only new nodes; they must already have embeddings
```

By incorporating this check, you can prevent duplicate entries from being added to your Qdrant database. This ensures that your …
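The ID check above only helps if the same content gets the same `node_id` on every run; if IDs are generated randomly each time you build nodes, every re-run looks "new" and gets inserted again. Below is a minimal stdlib-only sketch, assuming you derive node IDs deterministically from a document ID and chunk position. `stable_node_id`, `add_unique` and the dict standing in for the Qdrant collection are hypothetical names for illustration, not LlamaIndex or Qdrant APIs.

```python
import uuid
from typing import Dict, List, Tuple


def stable_node_id(doc_id: str, chunk_index: int) -> str:
    """Hypothetical helper: the same (doc_id, chunk_index) always yields the same UUID string."""
    # uuid5 is name-based (SHA-1 of namespace + name), so it is deterministic, unlike uuid4.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}#{chunk_index}"))


def add_unique(store: Dict[str, str], chunks: List[Tuple[str, int, str]]) -> int:
    """Insert (doc_id, chunk_index, text) chunks whose IDs are not yet in `store`.

    `store` is an in-memory dict standing in for a vector collection.
    Returns the number of chunks actually inserted.
    """
    added = 0
    for doc_id, chunk_index, text in chunks:
        node_id = stable_node_id(doc_id, chunk_index)
        if node_id not in store:  # skip IDs that already exist
            store[node_id] = text
            added += 1
    return added


store: Dict[str, str] = {}
chunks = [("report.pdf", 0, "intro"), ("report.pdf", 1, "results")]
print(add_unique(store, chunks))  # 2: first run inserts both chunks
print(add_unique(store, chunks))  # 0: re-running the same ingestion inserts nothing
print(len(store))  # 2
```

Using a UUID string (rather than e.g. `"report.pdf#0"`) as the ID matters for Qdrant, whose point IDs must be UUIDs or unsigned integers.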
You should use an ingestion pipeline with a docstore and a vector store attached. Assuming each input document has a consistent doc ID, the pipeline can look up the existing hash of that document (if any) in the docstore and decide whether to upsert, skip, or insert it. Here's one example (ignore the cache, though).
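To make the upsert/skip/insert decision concrete, here is a minimal stdlib sketch of the hash-based logic, assuming every document has a stable doc ID. It is not LlamaIndex's implementation: `HashDocstore`, `ingest`, the paragraph-based "chunking" and the dict used as a vector store are hypothetical stand-ins. In LlamaIndex itself, `IngestionPipeline` exposes this behaviour through its `docstore`, `vector_store` and `docstore_strategy` parameters.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class HashDocstore:
    """Hypothetical docstore: remembers the last content hash and node IDs per doc ID."""

    def __init__(self) -> None:
        self.hashes: Dict[str, str] = {}
        self.node_ids: Dict[str, List[str]] = {}


def ingest(docs: Dict[str, str], docstore: HashDocstore, vector_store: Dict[str, str]) -> Dict[str, str]:
    """Ingest {doc_id: text}; return the action taken per doc ('insert', 'skip' or 'upsert')."""
    actions: Dict[str, str] = {}
    for doc_id, text in docs.items():
        new_hash = content_hash(text)
        old_hash = docstore.hashes.get(doc_id)
        if old_hash == new_hash:
            actions[doc_id] = "skip"  # unchanged: nothing to re-embed
            continue
        if old_hash is not None:
            # Changed: delete this doc's stale nodes before adding the new ones
            for node_id in docstore.node_ids[doc_id]:
                vector_store.pop(node_id, None)
            actions[doc_id] = "upsert"
        else:
            actions[doc_id] = "insert"
        # Naive chunking: one node per paragraph, keyed by doc ID and position
        chunks = [p for p in text.split("\n\n") if p]
        ids = [f"{doc_id}:{i}" for i in range(len(chunks))]
        vector_store.update(zip(ids, chunks))
        docstore.hashes[doc_id] = new_hash
        docstore.node_ids[doc_id] = ids
    return actions


docstore = HashDocstore()
vectors: Dict[str, str] = {}
print(ingest({"a": "one\n\ntwo", "b": "three"}, docstore, vectors))  # {'a': 'insert', 'b': 'insert'}
print(ingest({"a": "one\n\ntwo", "b": "three, edited"}, docstore, vectors))  # {'a': 'skip', 'b': 'upsert'}
print(len(vectors))  # 3
```

Deleting a changed document's old nodes before re-adding is what prevents stale chunks from lingering when an edit produces fewer chunks than before.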
Thanks @logan-markewich, would this be your recommended implementation if I wanted to use a Chroma vector_store and a locally stored docstore? Embed and store in Chroma vector_store and local docstore:
Load from Chroma vector_store and local docstore:
@130jd Not quite: you should pass in the vector store again when loading. Tbh, I would load both the vector store and the docstore outside of the pipeline and just pass them in. But that's just me.
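Following the suggestion to load both stores outside the pipeline and pass them in, here is a minimal stdlib sketch, assuming both stores are persisted locally. JSON files in a temporary directory stand in for the on-disk docstore and the Chroma collection; `load_json`, `save_json` and `run_pipeline` are hypothetical helpers, not LlamaIndex or Chroma APIs. In LlamaIndex, the docstore half would typically use `SimpleDocumentStore.persist(...)` and `SimpleDocumentStore.from_persist_path(...)`.

```python
import hashlib
import json
import os
import tempfile
from typing import Dict, List


def load_json(path: str) -> Dict[str, str]:
    """Load a persisted store, or start empty if it does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_json(path: str, data: Dict[str, str]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def run_pipeline(docs: Dict[str, str], docstore: Dict[str, str], vector_store: Dict[str, str]) -> List[str]:
    """Hypothetical pipeline: both stores are passed in; returns the doc IDs (re)written."""
    written = []
    for doc_id, text in docs.items():
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if docstore.get(doc_id) == h:
            continue  # already ingested with identical content
        docstore[doc_id] = h
        vector_store[doc_id] = text  # stand-in for embedding + upsert by ID
        written.append(doc_id)
    return written


workdir = tempfile.mkdtemp()
doc_path = os.path.join(workdir, "docstore.json")
vec_path = os.path.join(workdir, "vectors.json")
docs = {"a": "alpha", "b": "beta"}

for run in range(2):
    # Load BOTH stores outside the pipeline, then pass them in
    docstore, vector_store = load_json(doc_path), load_json(vec_path)
    print(run, run_pipeline(docs, docstore, vector_store), len(vector_store))
    save_json(doc_path, docstore)
    save_json(vec_path, vector_store)
# prints "0 ['a', 'b'] 2" and then "1 [] 2"
```

In this sketch, if only the docstore were reloaded and a fresh, empty vector store passed in, the second run would still skip both documents (their hashes match) and leave the vector store empty; reloading both keeps them consistent.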
Question
I have created nodes, and the nodes are unique. However, when I try to create a VectorStoreIndex, it creates duplicate entries in my vector DB. I am using Qdrant, and when I checked the database, there were duplicate entries.
This is my Qdrant DB initialization:
and after creating the nodes, I am using the StorageContext:
Can anyone please tell me how to resolve this issue?