[Question]: llama3 + NLSQLTableQueryEngine, markdown symbol on generated SQL statement #13505

Open
1 task done
comcrazy opened this issue May 15, 2024 · 3 comments
Labels
question Further information is requested

Comments

@comcrazy

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Hi all,

I am facing an issue with NLSQLTableQueryEngine; the LLM I am using is Llama3.

The generated SQL sometimes contains Markdown symbols, as shown below:

{'__pydantic_initialised__': True,
 'metadata': {'66660c54-b27c-4a64-90ae-dc1a0c469c62': {},
              'sql_query': '**\n'
                           '```sql\n'
                           'SELECT f.title, f.release_year\n'
                           'FROM film f\n'
                           'ORDER BY f.release_year DESC\n'
                           'LIMIT 10;\n'
                           '```\n'
                           '\n'
                           '**'},

When I ask the same question several times, it eventually works as expected, and the generated SQL is the same:

{'__pydantic_initialised__': True,
 'metadata': {'8c004365-a5cd-4ab1-8205-6368fbda9cf0': {},
              'col_keys': ['title', 'release_year'],
              'result': [('Grosse Wonderful', 2006),
                         ('Airport Pollock', 2006),
                         ('Bright Encounters', 2006),
                         ('Academy Dinosaur', 2006),
                         ('Ace Goldfinger', 2006),
                         ('Adaptation Holes', 2006),
                         ('Affair Prejudice', 2006),
                         ('African Egg', 2006),
                         ('Agent Truman', 2006),
                         ('Chamber Italian', 2006)],
              'sql_query': 'SELECT f.title, f.release_year FROM film f ORDER '
                           'BY f.release_year LIMIT 10;'},

Can anyone give me a hint about what I am missing?

Thank you.

@comcrazy comcrazy added the question Further information is requested label May 15, 2024

dosubot bot commented May 15, 2024

The issue with markdown symbols appearing in the SQL statements generated by the NLSQLTableQueryEngine when using Llama3 is likely due to the engine not stripping away or properly handling markdown formatting in the language model's response. To resolve this, you can modify the response parsing method in the NLSQLTableQueryEngine to remove or escape markdown symbols before converting the response into an SQL query.

Here's a quick fix you can apply:

import re

def _parse_response_to_sql(self, response: str) -> str:
    # Remove Markdown code fences (with an optional "sql" tag) and bold markers.
    # Single "*" and "_" are left alone because they are valid SQL (SELECT *, release_year).
    response = re.sub(r"```(?:sql)?|\*\*", "", response, flags=re.IGNORECASE)

    # Optional: remove everything after the "SQLResult:" marker if present
    sql_result_start = response.find("SQLResult:")
    if sql_result_start != -1:
        response = response[:sql_result_start]
    return response.strip()

This code snippet uses a regular expression to remove Markdown code fences and bold markers from the response while leaving SQL's own * and _ characters intact. Adjust the regex as needed based on the symbols you encounter. Apply this modification in your local copy of LlamaIndex, and be sure to test thoroughly to avoid introducing new issues.
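
To see why the choice of pattern matters, here is a minimal standalone sketch that can be run outside the library. The function name strip_markdown_wrappers is hypothetical (it is not part of LlamaIndex), and the sample string is the sql_query value from the question. It assumes the model wraps its output only in triple-backtick fences and ** markers.

```python
import re

# A fence (```), plus a language tag only when the tag is followed by a line break,
# or a bold marker (**). Single "*" and "_" never match.
_MARKDOWN_WRAPPER = re.compile(r"```(?:[A-Za-z]+(?=[ \t]*\n))?|\*\*")


def strip_markdown_wrappers(text: str) -> str:
    """Remove code fences and bold markers; hypothetical helper for illustration."""
    return _MARKDOWN_WRAPPER.sub("", text).strip()


# The problematic sql_query value reported in the question.
raw = (
    "**\n"
    "```sql\n"
    "SELECT f.title, f.release_year\n"
    "FROM film f\n"
    "ORDER BY f.release_year DESC\n"
    "LIMIT 10;\n"
    "```\n"
    "\n"
    "**"
)

print(strip_markdown_wrappers(raw))
```

Because the pattern only targets fences and doubled asterisks, identifiers such as release_year and a SELECT * clause pass through unchanged, which a character class that deletes every * and _ would not guarantee.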

For further details, you might want to check the source code of the NLSQLTableQueryEngine or similar components in the framework you're using.

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

@comcrazy
Author

Hi @dosu,

NLSQLTableQueryEngine uses NLSQLRetriever. However, NLSQLRetriever doesn't have a _parse_response_to_sql function.

Can you describe in more detail which file or class I need to update with the code you suggested?

Thank you.


dosubot bot commented May 15, 2024

To address the Markdown symbols appearing in the SQL statements generated by the NLSQLTableQueryEngine, modify the method that parses the SQL query out of the LLM response. Since the NLSQLRetriever does not contain a _parse_response_to_sql function, you'll need to make the change directly in the NLSQLTableQueryEngine class, in its method for parsing the SQL query from the response.

Here's an updated version of the _parse_response_to_sql method that includes markdown symbol removal:

import re

def _parse_response_to_sql(self, response: str) -> str:
    """Parse response to SQL."""
    # Find and remove SQLResult part
    sql_result_start = response.find("SQLResult:")
    if sql_result_start != -1:
        response = response[:sql_result_start]
    # Remove Markdown fences (including an optional "sql" language tag) and bold markers
    response = re.sub(r"```(?:sql)?", "", response, flags=re.IGNORECASE).replace("**", "")
    return response.strip()

This modification strips the Markdown symbols ** and ``` (together with the sql language tag that follows an opening fence) from the response, ensuring that the SQL query is free of these symbols before it is executed or processed further. Adjust this code as necessary to handle any other Markdown symbols you might encounter.

Implement this change in the NLSQLTableQueryEngine class within your project. This should resolve the issue with markdown symbols appearing in the generated SQL statements.
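
An alternative to deleting symbols is to extract the body of the fenced block when one is present. The sketch below is a standalone illustration, not LlamaIndex code: extract_sql is a hypothetical name, and it assumes the model emits at most one triple-backtick block containing the query.

```python
import re

# Opening fence, optional language tag up to the line break, body, closing fence.
_FENCED_BLOCK = re.compile(r"```[^\n`]*\n(.*?)```", re.DOTALL)


def extract_sql(response: str) -> str:
    """Return the SQL inside the first fenced block, or the cleaned plain text."""
    match = _FENCED_BLOCK.search(response)
    if match:
        # Fenced output: keep only the code inside the fence.
        response = match.group(1)
    else:
        # Plain output: drop stray bold markers.
        response = response.replace("**", "")
    # Discard anything from the "SQLResult:" marker onward.
    marker = response.find("SQLResult:")
    if marker != -1:
        response = response[:marker]
    return response.strip()


fenced = "**\n```sql\nSELECT f.title, f.release_year\nFROM film f\nLIMIT 10;\n```\n\n**"
plain = "SELECT f.title, f.release_year FROM film f ORDER BY f.release_year LIMIT 10;"

print(extract_sql(fenced))
print(extract_sql(plain))
```

Extraction also discards any commentary the model writes around the fence, which plain symbol removal would leave inside the query string.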

