[Bug]: SubQuestionQueryEngine generates a python script instead of markdown with questions causing OutputParserException #13498
Comments
I think the prompt is fairly clear? But you can definitely customize the prompt too. Open-source LLMs are much less reliable for structured outputs. In this case I might use Ollama with JSON mode turned on.
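For reference, a minimal sketch of that suggestion, assuming the `llama-index-llms-ollama` integration and its `json_mode` flag (which asks the Ollama server to constrain output to valid JSON):

```python
from llama_index.llms.ollama import Ollama

# JSON mode constrains the model's output to valid JSON, which tends to
# make structured outputs from open-source models more reliable.
llm = Ollama(
    model="llama3:70b-instruct",
    request_timeout=120.0,
    json_mode=True,
)
```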
Not clear enough for open-source models :) Is there a way to override the prompt (question_gen_prompt)? I want to try making it clearer.
Yeah, adding this to the end of the prompt resolves the issue: "Make sure you generate an actual json as an output, not a script generating the json."
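For anyone else hitting this, here is a hedged sketch of how that override could be wired up, assuming the llama-index 0.10.x names `LLMQuestionGenerator` and `DEFAULT_SUB_QUESTION_PROMPT_TMPL`, and with `query_engine_tools` standing in for your existing tools:

```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.question_gen import LLMQuestionGenerator
from llama_index.core.question_gen.prompts import DEFAULT_SUB_QUESTION_PROMPT_TMPL

# Append the stricter instruction to the default sub-question prompt.
strict_tmpl = (
    DEFAULT_SUB_QUESTION_PROMPT_TMPL
    + "\nMake sure you generate an actual json as an output, "
    "not a script generating the json."
)

question_gen = LLMQuestionGenerator.from_defaults(prompt_template_str=strict_tmpl)

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,  # your existing tools
    question_gen=question_gen,
)
```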
That's pretty specific though, to solve a single edge case 😅 At the top it already says JSON markdown. Maybe it's a thing where some LLMs pay attention to the end of the prompt more than the start.
There are not that many models; you can write a prompt that satisfies all of them, especially popular ones like llama3 :) Adding stricter instructions for "silly" models won't hurt. I realize that small models may often generate invalid JSON (there are a few bugs reported here); they are hopeless, there is nothing you can do. But for models like this one, which merely misunderstand the prompt, we can add more instructions to the prompt, and it will not hurt other models.
To address the issue, you can define a custom question_gen_prompt. Here's a simplified template based on your needs:

````python
PREFIX = """Given a user question, and a list of tools, output a list of relevant sub-questions in JSON markdown that can help answer the full user question:"""

EXAMPLES = f"""
# Example 1
<Tools>
```json
{{
    "uber_10k": "Provides information about Uber financials for year 2021",
    "lyft_10k": "Provides information about Lyft financials for year 2021"
}}
```
"""

SUFFIX = """Now, given the following tools and user question, generate the relevant sub-questions in JSON markdown format:"""

# Combine them to form the complete prompt
complete_prompt = PREFIX + EXAMPLES + SUFFIX
````
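If you go this route, a hypothetical way to plug the combined template in (same assumed llama-index 0.10.x names as in the sketch above; note the final template string must still contain the `{tools_str}` and `{query_str}` placeholders that the generator substitutes at query time, as the library's default prompt does):

```python
from llama_index.core.question_gen import LLMQuestionGenerator

# Hypothetical wiring: hand the combined template to the question
# generator. The string passed here must still contain the {tools_str}
# and {query_str} placeholders that get filled in per query.
question_gen = LLMQuestionGenerator.from_defaults(prompt_template_str=complete_prompt)
```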
Bug Description
I'm using SubQuestionQueryEngine with multiple tools, but it crashes with an OutputParserException.
After debugging, I found that when the model (llama3:70b-instruct) is asked to generate sub-questions from the tool metadata, it produces a Python script that would generate the questions rather than the questions themselves, which fails the parser.
I believe you need to make the question_gen_prompt clearer, so that the LLM understands we want the result itself, not a script that generates the result.
Version
0.10.37
Steps to Reproduce
It is inconsistent. For some queries/tools/metadata it works, for some it doesn't. Even for the same query it sometimes works and sometimes doesn't.
Relevant Logs/Tracebacks
No response