[Question]: How are citations generated in the response of CitationQueryEngine? #13536

Open
1 task done
mw19930312 opened this issue May 16, 2024 · 6 comments
Labels
question Further information is requested

Comments

@mw19930312

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I understand that in a CitationQueryEngine, we retrieve a couple of reference nodes from a vector index. Then, these nodes are passed into a synthesizer to generate a final response.

mw19930312 added the question label May 16, 2024

dosubot bot commented May 16, 2024

Hey there @mw19930312! Great to see you diving into the intricacies of the CitationQueryEngine again. 😄 I remember our last brainstorming session here; always a pleasure to assist a fellow knowledge seeker. I'm on it - gonna unravel the citation generation mystery for you in just a jiffy!

@logan-markewich
Collaborator

The nodes are labeled as source 1, source 2, etc., and then the LLM is prompted to cite which sources it used.
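To make the idea concrete, here is a minimal sketch of labeling retrieved chunks and asking the model to cite them by number. It is not LlamaIndex's actual code: the prompt wording and the helper names `label_sources` and `build_citation_prompt` are assumptions made up for illustration.

```python
from typing import List

# Hypothetical prompt wording, NOT the template shipped with LlamaIndex.
CITATION_PROMPT = (
    "Answer the question using only the numbered sources below. "
    "Cite every source you use by its number, e.g. [1].\n"
    "------\n{context}\n------\n"
    "Query: {query}\nAnswer: "
)


def label_sources(chunks: List[str]) -> str:
    """Prefix each retrieved chunk with a 1-based 'Source N:' label."""
    return "\n".join(
        f"Source {i}:\n{chunk}\n" for i, chunk in enumerate(chunks, start=1)
    )


def build_citation_prompt(chunks: List[str], query: str) -> str:
    """Combine the labeled sources and the user query into one prompt string."""
    return CITATION_PROMPT.format(context=label_sources(chunks), query=query)
```

The resulting string is what would be sent to the LLM; the citations in the answer are only as reliable as the model's adherence to the instruction.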

@mw19930312
Author

Would you mind pointing me to where the LLM is prompted? I'm wondering how to pass the citation URLs, if there are any. Also, if I understand correctly, the nodes are formatted as follows through a metadata conversion template, right?

[source 1]
context:
metadata:
xxx: xxx
url: url
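One way to picture a layout like this is a small helper that prepends metadata as `key: value` lines to the chunk text before labeling it. This is only a sketch under assumptions: `render_source` is a hypothetical function, and the exact layout the library produces depends on its own metadata templates and metadata mode, which are not reproduced here.

```python
from typing import Dict


def render_source(label: int, text: str, metadata: Dict[str, str]) -> str:
    """Render one source block: label line, metadata lines, blank line, text."""
    # Each metadata entry becomes a "key: value" line (e.g. "url: https://...").
    meta_lines = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    body = f"{meta_lines}\n\n{text}" if meta_lines else text
    return f"Source {label}:\n{body}\n"
```

If the URL is part of the text the LLM sees, the model can in principle quote it, but nothing forces it to; keeping a separate label-to-URL mapping on the caller's side is the safer design.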

@mw19930312
Author

@logan-markewich ^^


dosubot bot commented May 16, 2024

In the CitationQueryEngine, the process of prompting the LLM for citations involves a few key steps:

  1. Retrieving and Formatting Nodes: Initially, nodes are retrieved from a vector index and are then formatted as citation sources. Each node is assigned a label (e.g., source 1, source 2), which includes the context and any relevant metadata, such as URLs. This is achieved through the retrieval and formatting mechanisms within the engine, where metadata from the nodes is used to create a detailed context for the LLM.

  2. Synthesizing the Response: The formatted nodes are passed to the LLM along with a citation prompt template. This template instructs the LLM on how to incorporate the citations into its response, guiding it to reference the sources by their assigned labels. The process ensures that the final response generated by the LLM includes citations to the provided sources, making it clear which parts of the response are supported by which sources.

  3. Metadata and URLs: The inclusion of URLs and other metadata in the citation nodes is controlled by the metadata_mode parameter. This means that when nodes are converted into citation sources, their metadata, including URLs, can be formatted and included based on the settings specified in the metadata_mode. This allows for a flexible approach to how detailed the citations are, including the possibility of directly citing URLs within the response.

This approach ensures that the final response from the LLM is not only informative but also accurately cites the sources it used, providing a clear and traceable path back to the original information.
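As a rough illustration of tracing a response back to URLs, the sketch below scans the answer text for bracketed source numbers and looks each one up in a mapping built by the caller. It assumes the model cites in the form `[1]`, `[2]`; the function `cited_urls` and the `source_urls` mapping are hypothetical and not part of the library.

```python
import re
from typing import Dict, List


def cited_urls(response_text: str, source_urls: Dict[int, str]) -> List[str]:
    """Return URLs of cited sources, in order of first citation, without repeats."""
    urls: List[str] = []
    for match in re.finditer(r"\[(\d+)\]", response_text):
        number = int(match.group(1))
        url = source_urls.get(number)
        # Skip numbers with no known source (the model may cite a label that does not exist).
        if url is not None and url not in urls:
            urls.append(url)
    return urls
```

In practice the mapping would be built from the metadata of the source nodes returned alongside the response, keyed by the same numbers used in the labels.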

@logan-markewich
Collaborator

@mw19930312

prompt:

code entry point:


2 participants