How to retrieve specific files? #13550

Open
susht3 opened this issue May 17, 2024 · 3 comments
Labels
question Further information is requested

Comments

@susht3

susht3 commented May 17, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I have multiple files. Can I retrieve the nodes of one specific document, or all nodes? Something like this:
nodes = recursive_retriever.retrieve(query, "xx_name")
where xx_name is my file name.

susht3 added the question label on May 17, 2024

dosubot bot commented May 17, 2024

To retrieve specific document nodes or all nodes from multiple files by specifying a file name, you can use a custom retriever that extends the BaseRetriever class from the LlamaIndex library. This approach allows for both synchronous and asynchronous retrieval, enabling efficient data processing and flexibility in handling large datasets or real-time requirements.

Here's a basic example to guide you:

from typing import List

from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.schema import QueryBundle, NodeWithScore

class MyRetriever(BaseRetriever):
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Your synchronous retrieval logic here
        pass

    async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Your asynchronous retrieval logic here
        pass

# Initialize your retriever
my_retriever = MyRetriever()

# For synchronous retrieval
nodes = my_retriever.retrieve("Your query here")

# For asynchronous retrieval (call this from inside an async function)
nodes = await my_retriever.aretrieve("Your query here")

This setup allows you to customize the retrieval process, enabling you to specify file names or other parameters to fetch the desired nodes. Whether you choose synchronous or asynchronous methods depends on your application's specific needs, such as performance considerations or the requirement for real-time data processing.
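
For example, here is a minimal sketch of that idea: a custom retriever that wraps an existing retriever and keeps only the nodes from one file. The class name FileNameRetriever and the "file_name" metadata key are illustrative assumptions (SimpleDirectoryReader adds "file_name" by default, but your metadata may differ):

from typing import List

from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.schema import QueryBundle, NodeWithScore

class FileNameRetriever(BaseRetriever):
    """Wrap another retriever and keep only nodes from a given file."""

    def __init__(self, base_retriever: BaseRetriever, file_name: str):
        self._base_retriever = base_retriever
        self._file_name = file_name
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        results = self._base_retriever.retrieve(query_bundle)
        # Keep only nodes whose "file_name" metadata matches the target file.
        return [
            n for n in results
            if n.node.metadata.get("file_name") == self._file_name
        ]

# Usage (assuming an existing VectorStoreIndex called `index`):
# nodes = FileNameRetriever(index.as_retriever(), "xx_name").retrieve("Your query here")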

For more detailed information and examples, you might want to explore the LlamaIndex repository and its documentation.


@logan-markewich
Collaborator

Probably use metadata filtering, assuming the file name/path is in the metadata of the nodes you inserted.
Here's an example with Pinecone:
https://docs.llamaindex.ai/en/stable/examples/vector_stores/pinecone_metadata_filter/?h=metadata+filter
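
As a rough sketch of that pattern using LlamaIndex's built-in metadata filters (the "file_name" key and the `index` variable are assumptions; they depend on how your documents were ingested):

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Keep only nodes whose "file_name" metadata equals the target file.
filters = MetadataFilters(filters=[ExactMatchFilter(key="file_name", value="xx_name")])

retriever = index.as_retriever(filters=filters)
nodes = retriever.retrieve("Your query here")

Note that the underlying vector store has to support metadata filtering for the filter to be applied at query time, as in the linked Pinecone example.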

@susht3
Author

susht3 commented May 22, 2024

Thank you, it looks simple and efficient, but it uses PINECONE_API_KEY. How do I get this key? Is it free?
