Multi-collection in a RAG system #32876
-
Hey there! I'm building a RAG system that uses different kinds of documents. Initially, I created a collection from a knowledge base of JSON files, but when I then tried to insert individual PDF files into the same collection, I got an error while storing the chunks in the vector DB. The error is presumably because the collection was created with an initial schema based on the JSON files, so adding the PDF chunks fails. Reading the documentation, I found that I can create separate collections, but then how can I search the main JSON collection and the PDF collection at the same time to get an accurate response from my RAG system? Is there a way to accomplish that? I appreciate your thoughts on this.
Replies: 2 comments 4 replies
-
Looks like you are using langchain.
Each langchain.Document has two members: page_content and metadata. page_content is the text content of your document, and metadata is a dict that holds the document's extra properties.
The first time you call from_documents(), it initializes a Milvus middleware that creates a new collection in Milvus. By default, it infers the collection schema from the first metadata you pass in. If a later document's metadata differs from the first, you get the error "The data don't match with schema fields". But if you specify the parameter "metadata_field", it defines a single JSON field that stores all the metadata in JSON format. With the JSON metadata field, you can pass documents with different metadata when you call from_documents().
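To make the mismatch concrete, here is a minimal plain-Python sketch. It does not call LangChain or Milvus; `infer_fields`, `matches_schema`, and `pack_metadata` are hypothetical helpers, and the metadata keys are made up. It only mimics the behavior described above: a schema inferred from the first document's metadata keys rejects documents with other keys, while a single JSON field accepts any metadata.

```python
import json

# Hypothetical metadata for one JSON knowledge-base chunk and one PDF chunk.
json_meta = {"source": "kb.json", "seq_num": 1}
pdf_meta = {"source": "manual.pdf", "page": 3, "total_pages": 12}


def infer_fields(first_metadata: dict) -> set:
    """Mimic the default: one schema field per key of the first metadata."""
    return set(first_metadata)


def matches_schema(fields: set, metadata: dict) -> bool:
    """A row fits only if its metadata keys equal the inferred fields."""
    return set(metadata) == fields


def pack_metadata(metadata: dict, field_name: str = "metadata") -> dict:
    """Mimic a single JSON metadata field: all keys go into one JSON value."""
    return {field_name: json.dumps(metadata, sort_keys=True)}


fields = infer_fields(json_meta)
json_fits = matches_schema(fields, json_meta)  # same keys as the schema
pdf_fits = matches_schema(fields, pdf_meta)    # different keys, so it is rejected

# With one JSON field, every row has the same shape regardless of its keys.
rows = [pack_metadata(m) for m in (json_meta, pdf_meta)]
same_shape = all(set(row) == {"metadata"} for row in rows)

print(json_fits, pdf_fits, same_shape)
```

In LangChain the corresponding switch is the "metadata_field" parameter mentioned above; the field name you give it is your choice.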
-
For a RAG system, there is no need to create many collections. Typically, one collection is enough.
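As a rough illustration of why one collection can serve both sources, the sketch below keeps JSON and PDF chunks in a single in-memory list and ranks them together by cosine similarity. The vectors, the `source_type` key, and the `search` helper are all made up for illustration; a real setup would use your embedding model and the vector store's own search.

```python
import math

# One "collection": chunks from both sources, tagged by a hypothetical metadata key.
collection = [
    {"text": "reset steps from the knowledge base",
     "vector": [0.9, 0.1, 0.0], "metadata": {"source_type": "json"}},
    {"text": "warranty terms from a PDF",
     "vector": [0.1, 0.9, 0.0], "metadata": {"source_type": "pdf"}},
    {"text": "reset procedure from a PDF manual",
     "vector": [0.8, 0.2, 0.1], "metadata": {"source_type": "pdf"}},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_vector, k=2, source_type=None):
    """Rank all chunks together; optionally restrict to one source type."""
    candidates = [
        c for c in collection
        if source_type is None or c["metadata"]["source_type"] == source_type
    ]
    ranked = sorted(candidates,
                    key=lambda c: cosine(query_vector, c["vector"]),
                    reverse=True)
    return ranked[:k]


hits = search([1.0, 0.0, 0.0])                          # searches JSON and PDF chunks at once
pdf_hits = search([1.0, 0.0, 0.0], source_type="pdf")   # same collection, PDF chunks only
print([h["metadata"]["source_type"] for h in hits])
```

A single query here returns the best chunks from both sources, and the metadata filter is only needed when you want to restrict results to one kind of document.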
from_documents() is a method of langchain.VectorStore. It accepts a list of langchain.Document objects and an embedding function.