Clarification questions and explain should reply in the same language as the query #1146

gDanzel · 2024-05-05T04:12:17Z

🚀 The feature

Clarification questions and explain should reply in the same language as the query.

Motivation, pitch


The query can already be written in any language, and PandasAI understands it and works correctly (I have only tried OpenAI and Google Gemini). However, the clarification questions and the explanation are returned only in English.

It would be 'smarter' for PandasAI to reply in the same language as the query, at least for clarification questions and explanations. This would be very handy for non-English-speaking end users.

Alternatives

I tried adding one simple line to clarification_questions_prompt.tmpl, as shown below, and it seems to work well:

- Reply in the same language as the query is.

{% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}

<conversation>
{{context.memory.get_conversation()}}
</conversation>

Find the clarification questions that could be asked to a senior data scientist would ask about the query "{{query}}"?
- Only ask for questions related to the query if the query is not clear or ambiguous and that cannot be deduced from the context.
- Return up to 3 questions.
- Reply in the same language as the query is.

Example:
['Question 1', 'Question 2']

Return a JSON list of the clarification questions strings.

Json:
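The template asks the model for a JSON list of up to three question strings. A minimal sketch of turning such a reply into a Python list, assuming the reply contains one JSON array possibly surrounded by extra text; parse_clarification_questions is a hypothetical helper for illustration, not PandasAI's own parser. Non-ASCII text such as French accents passes through json.loads unchanged:

```python
import json
import re


def parse_clarification_questions(reply: str, limit: int = 3) -> list[str]:
    """Turn a model reply into a list of at most `limit` question strings.

    Hypothetical helper for illustration; not PandasAI's own parser.
    """
    # Take the span from the first "[" to the last "]", in case the model
    # wraps the JSON list in extra text.
    match = re.search(r"\[.*\]", reply, re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Keep only string entries and honour the "Return up to 3 questions" rule.
    questions = [item for item in items if isinstance(item, str)]
    return questions[:limit]


# A reply shaped like the one in the log further down (two of its questions).
reply = """[
    "Voulez-vous les valeurs du PIB en ordre croissant ou décroissant?",
    "Les valeurs du PIB sont-elles en dollars ou une autre devise?"
]"""
print(parse_clarification_questions(reply))
```

Taking the outermost brackets is a deliberately simple choice; a reply that contains several separate lists would need a stricter parser.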

For explain, I added the same line to explain.tmpl, but it does not work: the explanation is still returned in English.
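This issue does not establish why the rule has no effect in explain.tmpl. One untested idea, assuming the explain prompt does not put the original query in front of the model clearly enough for "the query" to be resolved, is to name the target language explicitly. A minimal sketch; with_reply_language is a hypothetical helper (not a PandasAI function), and the caller would have to supply the language name:

```python
def with_reply_language(prompt: str, language: str) -> str:
    """Append an explicit reply-language rule to a prompt string.

    Hypothetical helper: PandasAI does not provide this function.
    """
    rule = f"- Reply in {language}."
    # Do not stack the same rule twice if the prompt already contains it.
    if rule in prompt:
        return prompt
    return f"{prompt.rstrip()}\n{rule}\n"


# Placeholder prompt text, not the real contents of explain.tmpl.
explain_prompt = "Explain the code to a non-technical user."
print(with_reply_language(explain_prompt, "French"))
```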

Additional context

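Here is the code and its output as an example. The query is French for "Find the top three countries and the GDP values" (written "DGP" in the query). The clarification questions come back in French, but the explanation is still in English.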

import pandasai.pandas as pd
from pandasai import Agent
from pandasai.helpers import get_openai_callback
from pandasai.llm import OpenAI, GoogleGemini  # GoogleGemini was also tried

from data.sample_dataframe import dataframe  # local sample data: country, gdp, happiness_index

llm = OpenAI()

agent = Agent([pd.DataFrame(dataframe)], config={"llm": llm, "enforce_privacy": True, "verbose": True})
# French: "Find the top three countries and the GDP values"
query = 'Trouver les trois premiers pays et valeurs de la DGP'
with get_openai_callback() as cb:  # collects token usage and cost of the OpenAI calls
    response = agent.chat(query)
    # print(agent.clarification_questions("Get the top 3 GDP countries."))
    print(agent.clarification_questions(query))  # French after the template change
    print(agent.explain())  # still English
    print(response)
    print(cb)

Output:

2024-05-05 12:03:31 [INFO] Question: Trouver les trois premiers pays et valeurs de la DGP
2024-05-05 12:03:32 [INFO] Running PandasAI with openai LLM...
2024-05-05 12:03:32 [INFO] Prompt ID: 09c63727-1102-4f0b-bdd2-24f6f35fe67e
2024-05-05 12:03:32 [INFO] Executing Pipeline: GenerateChatPipeline
2024-05-05 12:03:32 [INFO] Executing Step 0: ValidatePipelineInput
2024-05-05 12:03:32 [INFO] Executing Step 1: CacheLookup
2024-05-05 12:03:32 [INFO] Executing Step 2: PromptGeneration
2024-05-05 12:03:36 [INFO] Using prompt:
dfs[0]:10x3
country,gdp,happiness_index
Australia,1243028240,5.12
Canada,2533122854,6.66
Germany,6465372439,7.16

Update this initial code:

# TODO: import the required dependencies
import pandas as pd

# Write code here

# Declare result var: 
type (possible values "string", "number", "dataframe", "plot"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" }

QUERY

Trouver les trois premiers pays et valeurs de la DGP

Variable dfs: list[pd.DataFrame] is already declared.

At the end, declare "result" variable as a dictionary of type and value.

If you are asked to plot a chart, use "matplotlib" for charts, save as png.

Generate python code and return full updated code:
2024-05-05 12:03:36 [INFO] Executing Step 3: CodeGenerator
2024-05-05 12:03:40 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:40 [INFO] Prompt used:

dfs[0]:10x3
country,gdp,happiness_index
Australia,1243028240,5.12
Canada,2533122854,6.66
Germany,6465372439,7.16

Update this initial code:

# TODO: import the required dependencies
import pandas as pd

# Write code here

# Declare result var: 
type (possible values "string", "number", "dataframe", "plot"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" }

QUERY

Trouver les trois premiers pays et valeurs de la DGP

Variable dfs: list[pd.DataFrame] is already declared.

At the end, declare "result" variable as a dictionary of type and value.

If you are asked to plot a chart, use "matplotlib" for charts, save as png.

Generate python code and return full updated code:

2024-05-05 12:03:40 [INFO] Code generated:
```
# TODO: import the required dependencies
import pandas as pd

# Write code here

df = pd.DataFrame({
    'country': ['Australia', 'Canada', 'Germany'],
    'gdp': [1243028240, 2533122854, 6465372439],
    'happiness_index': [5.12, 6.66, 7.16]
})

# Find the three countries with their GDP values

top_three_gdp = df[['country', 'gdp']].head(3)

# Declare result var

result = {"type": "dataframe", "value": top_three_gdp}
```

2024-05-05 12:03:40 [INFO] Executing Step 4: CachePopulation
2024-05-05 12:03:40 [INFO] Executing Step 5: CodeCleaning
2024-05-05 12:03:40 [INFO]
Code running:
```
df = dfs[0]
top_three_gdp = df[['country', 'gdp']].head(3)
result = {'type': 'dataframe', 'value': top_three_gdp}
```
2024-05-05 12:03:40 [INFO] Executing Step 6: CodeExecution
2024-05-05 12:03:40 [INFO] Executing Step 7: ResultValidation
2024-05-05 12:03:40 [INFO] Answer: {'type': 'dataframe', 'value':           country             gdp
0   United States  19294482071552
1  United Kingdom   2891615567872
2          France   2411255037952}
2024-05-05 12:03:40 [INFO] Executing Step 8: ResultParsing
2024-05-05 12:03:42 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:42 [INFO] Clarification Questions:  [
    "Est-ce que vous voulez dire les trois premiers pays avec les plus hautes valeurs de PIB?",
    "Voulez-vous les valeurs du PIB en ordre croissant ou décroissant?",
    "Les valeurs du PIB sont-elles en dollars ou une autre devise?"
]
            
['Est-ce que vous voulez dire les trois premiers pays avec les plus hautes valeurs de PIB?', 'Voulez-vous les valeurs du PIB en ordre croissant ou décroissant?', 'Les valeurs du PIB sont-elles en dollars ou une autre devise?']
2024-05-05 12:03:43 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:43 [INFO] Explanation:  I looked at the data we have and selected the first table. Then, I picked out the countries and their GDP values. Finally, I chose the top three countries with the highest GDP values to show you.
                
I looked at the data we have and selected the first table. Then, I picked out the countries and their GDP values. Finally, I chose the top three countries with the highest GDP values to show you.
          country             gdp
0   United States  19294482071552
1  United Kingdom   2891615567872
2          France   2411255037952
Tokens Used: 868
	Prompt Tokens: 628
	Completion Tokens: 240
Total Cost (USD): $ 0.000674

Process finished with exit code 0
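Note that the generated code answers the query with head(3), which returns the first three rows in their stored order; that matches "the top three by GDP" only if the data is already sorted by GDP. A minimal pandas sketch of the difference, using the three sample rows printed in the prompt in the log (illustrative values only). DataFrame.nlargest is standard pandas, not something PandasAI produced in this run:

```python
import pandas as pd

# The three sample rows shown in the prompt in the log.
df = pd.DataFrame(
    {
        "country": ["Australia", "Canada", "Germany"],
        "gdp": [1243028240, 2533122854, 6465372439],
        "happiness_index": [5.12, 6.66, 7.16],
    }
)

# head(3) keeps the first three rows in their stored order.
first_three = df[["country", "gdp"]].head(3)

# nlargest ranks rows by the "gdp" column, highest first.
top_by_gdp = df.nlargest(3, "gdp")[["country", "gdp"]]

# Same result shape the prompt asks for.
result = {"type": "dataframe", "value": top_by_gdp}
print(first_three)
print(top_by_gdp)
```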

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarification question and explain to support replying in the same language as query. #1146

Clarification question and explain to support replying in the same language as query. #1146

gDanzel commented May 5, 2024

Clarification question and explain to support replying in the same language as query. #1146

Clarification question and explain to support replying in the same language as query. #1146

Comments

gDanzel commented May 5, 2024

🚀 The feature

Motivation, pitch

Alternatives

Additional context

QUERY

QUERY

Write code here

Find the three countries with their GDP values

Declare result var