Clarification question and explain to support replying in the same language as query. #1146

Open
gDanzel opened this issue May 5, 2024 · 0 comments

🚀 The feature

Make clarification questions and explanations reply in the same language as the query.

Motivation, pitch


PandasAI already understands queries in any language (I have only tried OpenAI and Google Gemini), but clarification questions and explanations always come back in English.

It would be smarter for PandasAI to reply in the same language as the query, at least for clarification questions and explanations. This would be very handy for non-English-speaking end users.

Alternatives

I tried simply adding one sentence to the file clarification_questions_prompt.tmpl, shown below, and it seems to work well:

- Reply in the same language as the query is.

{% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}

<conversation>
{{context.memory.get_conversation()}}
</conversation>

Find the clarification questions that could be asked to a senior data scientist would ask about the query "{{query}}"?
- Only ask for questions related to the query if the query is not clear or ambiguous and that cannot be deduced from the context.
- Return up to 3 questions.
- Reply in the same language as the query is.

Example:
['Question 1', 'Question 2']

Return a JSON list of the clarification questions strings.

Json:
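A minimal sketch of the idea, using plain Python strings instead of the real Jinja template. The function name `build_clarification_prompt` and the `match_language` flag are hypothetical and not part of PandasAI; the sketch only shows where the extra rule sits among the other bullet instructions.

```python
# Hypothetical stand-in for the clarification prompt; NOT PandasAI's real code.
LANGUAGE_RULE = "- Reply in the same language as the query is."


def build_clarification_prompt(
    query: str, conversation: str, match_language: bool = True
) -> str:
    """Assemble a clarification prompt, optionally adding the language rule."""
    rules = [
        "- Only ask for questions related to the query if the query is not "
        "clear or ambiguous and that cannot be deduced from the context.",
        "- Return up to 3 questions.",
    ]
    if match_language:
        # The one-line change proposed in this issue.
        rules.append(LANGUAGE_RULE)
    lines = [
        "<conversation>",
        conversation,
        "</conversation>",
        "",
        f'Find the clarification questions about the query "{query}".',
        *rules,
        "",
        "Return a JSON list of the clarification questions strings.",
    ]
    return "\n".join(lines)


prompt = build_clarification_prompt(
    "Trouver les trois premiers pays et valeurs de la DGP", conversation=""
)
print(prompt)
```

Keeping the rule behind a flag would make it easy to compare the model's output with and without it.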

As for explain, I added the same sentence to explain.tmpl, but it does not work: the explanation still comes back in English.
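One untested idea for explain, since the cause is not established here: state the original query explicitly next to the language rule, so the model has a concrete sentence whose language it can mirror. The sketch below is an assumption only; `add_language_hint` and the placeholder prompt text are hypothetical and do not reflect the contents of explain.tmpl.

```python
# Placeholder text, NOT the real contents of explain.tmpl.
PLACEHOLDER_EXPLAIN_PROMPT = "Explain the generated code to a non-technical reader."
QUERY = "Trouver les trois premiers pays et valeurs de la DGP"


def add_language_hint(explain_prompt: str, query: str) -> str:
    """Append the original query and a language rule to an explain-style prompt."""
    hint = (
        f'The user originally asked: "{query}"\n'
        "- Reply in the same language as that query."
    )
    return explain_prompt.rstrip() + "\n\n" + hint


hinted = add_language_hint(PLACEHOLDER_EXPLAIN_PROMPT, QUERY)
print(hinted)
```

Whether this actually changes the language of the explanation would need to be checked against a real LLM call.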

Additional context

Example code and output:

import pandasai.pandas as pd
from pandasai import Agent
from pandasai.helpers import get_openai_callback
from pandasai.llm import OpenAI, GoogleGemini

from data.sample_dataframe import dataframe

llm = OpenAI()

agent = Agent([pd.DataFrame(dataframe)], config={"llm": llm, "enforce_privacy": True, "verbose": True})
query = 'Trouver les trois premiers pays et valeurs de la DGP'
with get_openai_callback() as cb:
    response = agent.chat(query)
    # print(agent.clarification_questions("Get the top 3 GDP countries."))
    print(agent.clarification_questions(query))
    print(agent.explain())
    print(response)
    print(cb)

Output:

2024-05-05 12:03:31 [INFO] Question: Trouver les trois premiers pays et valeurs de la DGP
2024-05-05 12:03:32 [INFO] Running PandasAI with openai LLM...
2024-05-05 12:03:32 [INFO] Prompt ID: 09c63727-1102-4f0b-bdd2-24f6f35fe67e
2024-05-05 12:03:32 [INFO] Executing Pipeline: GenerateChatPipeline
2024-05-05 12:03:32 [INFO] Executing Step 0: ValidatePipelineInput
2024-05-05 12:03:32 [INFO] Executing Step 1: CacheLookup
2024-05-05 12:03:32 [INFO] Executing Step 2: PromptGeneration
2024-05-05 12:03:36 [INFO] Using prompt:
dfs[0]:10x3
country,gdp,happiness_index
Australia,1243028240,5.12
Canada,2533122854,6.66
Germany,6465372439,7.16

Update this initial code:

# TODO: import the required dependencies
import pandas as pd

# Write code here

# Declare result var: 
type (possible values "string", "number", "dataframe", "plot"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" }

QUERY

Trouver les trois premiers pays et valeurs de la DGP

Variable dfs: list[pd.DataFrame] is already declared.

At the end, declare "result" variable as a dictionary of type and value.

If you are asked to plot a chart, use "matplotlib" for charts, save as png.

Generate python code and return full updated code:
2024-05-05 12:03:36 [INFO] Executing Step 3: CodeGenerator
2024-05-05 12:03:40 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:40 [INFO] Prompt used:

dfs[0]:10x3
country,gdp,happiness_index
Australia,1243028240,5.12
Canada,2533122854,6.66
Germany,6465372439,7.16

Update this initial code:

# TODO: import the required dependencies
import pandas as pd

# Write code here

# Declare result var: 
type (possible values "string", "number", "dataframe", "plot"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" }

QUERY

Trouver les trois premiers pays et valeurs de la DGP

Variable dfs: list[pd.DataFrame] is already declared.

At the end, declare "result" variable as a dictionary of type and value.

If you are asked to plot a chart, use "matplotlib" for charts, save as png.

Generate python code and return full updated code:

2024-05-05 12:03:40 [INFO] Code generated:
```
# TODO: import the required dependencies
import pandas as pd

Write code here

df = pd.DataFrame({
'country': ['Australia', 'Canada', 'Germany'],
'gdp': [1243028240, 2533122854, 6465372439],
'happiness_index': [5.12, 6.66, 7.16]
})

Find the three countries with their GDP values

top_three_gdp = df[['country', 'gdp']].head(3)

Declare result var

result = {"type": "dataframe", "value": top_three_gdp}
```

2024-05-05 12:03:40 [INFO] Executing Step 4: CachePopulation
2024-05-05 12:03:40 [INFO] Executing Step 5: CodeCleaning
2024-05-05 12:03:40 [INFO]
Code running:

df = dfs[0]
top_three_gdp = df[['country', 'gdp']].head(3)
result = {'type': 'dataframe', 'value': top_three_gdp}
        ```
2024-05-05 12:03:40 [INFO] Executing Step 6: CodeExecution
2024-05-05 12:03:40 [INFO] Executing Step 7: ResultValidation
2024-05-05 12:03:40 [INFO] Answer: {'type': 'dataframe', 'value':           country             gdp
0   United States  19294482071552
1  United Kingdom   2891615567872
2          France   2411255037952}
2024-05-05 12:03:40 [INFO] Executing Step 8: ResultParsing
2024-05-05 12:03:42 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:42 [INFO] Clarification Questions:  [
    "Est-ce que vous voulez dire les trois premiers pays avec les plus hautes valeurs de PIB?",
    "Voulez-vous les valeurs du PIB en ordre croissant ou décroissant?",
    "Les valeurs du PIB sont-elles en dollars ou une autre devise?"
]
            
['Est-ce que vous voulez dire les trois premiers pays avec les plus hautes valeurs de PIB?', 'Voulez-vous les valeurs du PIB en ordre croissant ou décroissant?', 'Les valeurs du PIB sont-elles en dollars ou une autre devise?']
2024-05-05 12:03:43 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:43 [INFO] Explanation:  I looked at the data we have and selected the first table. Then, I picked out the countries and their GDP values. Finally, I chose the top three countries with the highest GDP values to show you.
                
I looked at the data we have and selected the first table. Then, I picked out the countries and their GDP values. Finally, I chose the top three countries with the highest GDP values to show you.
          country             gdp
0   United States  19294482071552
1  United Kingdom   2891615567872
2          France   2411255037952
Tokens Used: 868
	Prompt Tokens: 628
	Completion Tokens: 240
Total Cost (USD): $ 0.000674

Process finished with exit code 0