Clarification questions and explain should reply in the same language as the query.
Motivation, pitch
Currently we can query in any language and PandasAI understands it and works (I have only tried OpenAI and Google Gemini). However, the clarification questions and the explanation are returned only in English.
It would be smarter for PandasAI to reply in the same language as the query, at least for clarification questions and explain. This would be very handy for non-English-speaking end users.
Alternatives
I tried adding a single sentence to the file clarification_questions_prompt.tmpl, shown below, and it seems to work well:
- Reply in the same language as the query is.
{% for df in context.dfs %}{% set index = loop.index %}{% include 'shared/dataframe.tmpl' with context %}{% endfor %}
<conversation>
{{context.memory.get_conversation()}}
</conversation>
Find the clarification questions that could be asked to a senior data scientist would ask about the query "{{query}}"?
- Only ask for questions related to the query if the query is not clear or ambiguous and that cannot be deduced from the context.
- Return up to 3 questions.
- Reply in the same language as the query is.
Example:
['Question 1', 'Question 2']
Return a JSON list of the clarification questions strings.
Json:
As for explain, I added the same sentence to explain.tmpl, but it does not work: the explanation is still returned in English.
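The template change described above can be sketched as a small text patch. This is only an illustration under stated assumptions: `add_language_rule` and `LANGUAGE_RULE` are hypothetical names, not part of PandasAI, and the helper works on the template text as a plain string rather than through any PandasAI API. It inserts the rule after the last instruction bullet and leaves an already patched template unchanged.

```python
# Hypothetical helper, not part of PandasAI: add the "same language" rule
# to a prompt template's text exactly once.
LANGUAGE_RULE = "- Reply in the same language as the query is."


def add_language_rule(template_text: str, rule: str = LANGUAGE_RULE) -> str:
    """Insert `rule` after the last bullet ("- ...") line of the template.

    If the template has no bullet lines, the rule is appended at the end.
    If the rule is already present, the text is returned unchanged.
    """
    lines = template_text.splitlines()
    if rule in lines:
        return template_text  # already patched, nothing to do

    # Indexes of the lines that look like instruction bullets.
    bullet_indexes = [i for i, line in enumerate(lines) if line.startswith("- ")]
    insert_at = bullet_indexes[-1] + 1 if bullet_indexes else len(lines)
    lines.insert(insert_at, rule)

    # splitlines() drops a trailing newline, so restore it if there was one.
    trailing_newline = "\n" if template_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline


# Excerpt of the clarification template quoted in this issue.
template = (
    "Find the clarification questions that could be asked to a senior data "
    'scientist would ask about the query "{{query}}"?\n'
    "- Only ask for questions related to the query if the query is not clear "
    "or ambiguous and that cannot be deduced from the context.\n"
    "- Return up to 3 questions.\n"
    "Example:\n"
    "['Question 1', 'Question 2']\n"
)

patched = add_language_rule(template)
print(patched)
```

Because the rule sits in the instruction list of the prompt, whether the model follows it still depends on the LLM and on the rest of the template, which may be why the same sentence behaves differently in explain.tmpl.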
Additional context
Here is the code and its output as an example:
import pandasai.pandas as pd
from pandasai import Agent
from pandasai.helpers import get_openai_callback
from pandasai.llm import OpenAI, GoogleGemini
from data.sample_dataframe import dataframe
llm = OpenAI()
agent = Agent([pd.DataFrame(dataframe)], config={"llm": llm, "enforce_privacy": True, "verbose": True})
query = 'Trouver les trois premiers pays et valeurs de la DGP'
with get_openai_callback() as cb:
    response = agent.chat(query)
    # print(agent.clarification_questions("Get the top 3 GDP countries."))
    print(agent.clarification_questions(query))
    print(agent.explain())
    print(response)
    print(cb)
Output:
2024-05-05 12:03:31 [INFO] Question: Trouver les trois premiers pays et valeurs de la DGP
2024-05-05 12:03:32 [INFO] Running PandasAI with openai LLM...
2024-05-05 12:03:32 [INFO] Prompt ID: 09c63727-1102-4f0b-bdd2-24f6f35fe67e
2024-05-05 12:03:32 [INFO] Executing Pipeline: GenerateChatPipeline
2024-05-05 12:03:32 [INFO] Executing Step 0: ValidatePipelineInput
2024-05-05 12:03:32 [INFO] Executing Step 1: CacheLookup
2024-05-05 12:03:32 [INFO] Executing Step 2: PromptGeneration
2024-05-05 12:03:36 [INFO] Using prompt:
dfs[0]:10x3
country,gdp,happiness_index
Australia,1243028240,5.12
Canada,2533122854,6.66
Germany,6465372439,7.16
Update this initial code:
# TODO: import the required dependencies
import pandas as pd
# Write code here
# Declare result var: type (possible values "string", "number", "dataframe", "plot"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" }
QUERY
Trouver les trois premiers pays et valeurs de la DGP
Variable dfs: list[pd.DataFrame] is already declared.
At the end, declare "result" variable as a dictionary of type and value.
If you are asked to plot a chart, use "matplotlib" for charts, save as png.
Generate python code and return full updated code:
2024-05-05 12:03:36 [INFO] Executing Step 3: CodeGenerator
2024-05-05 12:03:40 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:40 [INFO] Prompt used:
dfs[0]:10x3
country,gdp,happiness_index
Australia,1243028240,5.12
Canada,2533122854,6.66
Germany,6465372439,7.16
Update this initial code:
# TODO: import the required dependencies
import pandas as pd
# Write code here
# Declare result var: type (possible values "string", "number", "dataframe", "plot"). Examples: { "type": "string", "value": f"The highest salary is {highest_salary}." } or { "type": "number", "value": 125 } or { "type": "dataframe", "value": pd.DataFrame({...}) } or { "type": "plot", "value": "temp_chart.png" }
QUERY
Trouver les trois premiers pays et valeurs de la DGP
Variable dfs: list[pd.DataFrame] is already declared.
At the end, declare "result" variable as a dictionary of type and value.
If you are asked to plot a chart, use "matplotlib" for charts, save as png.
Generate python code and return full updated code:
2024-05-05 12:03:40 [INFO] Code generated:
```
# TODO: import the required dependencies
import pandas as pd
# Write code here
df = pd.DataFrame({
    'country': ['Australia', 'Canada', 'Germany'],
    'gdp': [1243028240, 2533122854, 6465372439],
    'happiness_index': [5.12, 6.66, 7.16]
})
# Find the three countries with their GDP values
top_three_gdp = df[['country', 'gdp']].head(3)
# Declare result var
result = {"type": "dataframe", "value": top_three_gdp}
```
2024-05-05 12:03:40 [INFO] Executing Step 4: CachePopulation
2024-05-05 12:03:40 [INFO] Executing Step 5: CodeCleaning
2024-05-05 12:03:40 [INFO]
Code running:
```
df = dfs[0]
top_three_gdp = df[['country', 'gdp']].head(3)
result = {'type': 'dataframe', 'value': top_three_gdp}
```
2024-05-05 12:03:40 [INFO] Executing Step 6: CodeExecution
2024-05-05 12:03:40 [INFO] Executing Step 7: ResultValidation
2024-05-05 12:03:40 [INFO] Answer: {'type': 'dataframe', 'value': country gdp
0 United States 19294482071552
1 United Kingdom 2891615567872
2 France 2411255037952}
2024-05-05 12:03:40 [INFO] Executing Step 8: ResultParsing
2024-05-05 12:03:42 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:42 [INFO] Clarification Questions: [
"Est-ce que vous voulez dire les trois premiers pays avec les plus hautes valeurs de PIB?",
"Voulez-vous les valeurs du PIB en ordre croissant ou décroissant?",
"Les valeurs du PIB sont-elles en dollars ou une autre devise?"
]
['Est-ce que vous voulez dire les trois premiers pays avec les plus hautes valeurs de PIB?', 'Voulez-vous les valeurs du PIB en ordre croissant ou décroissant?', 'Les valeurs du PIB sont-elles en dollars ou une autre devise?']
2024-05-05 12:03:43 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-05 12:03:43 [INFO] Explanation: I looked at the data we have and selected the first table. Then, I picked out the countries and their GDP values. Finally, I chose the top three countries with the highest GDP values to show you.
I looked at the data we have and selected the first table. Then, I picked out the countries and their GDP values. Finally, I chose the top three countries with the highest GDP values to show you.
country gdp
0 United States 19294482071552
1 United Kingdom 2891615567872
2 France 2411255037952
Tokens Used: 868
Prompt Tokens: 628
Completion Tokens: 240
Total Cost (USD): $ 0.000674
Process finished with exit code 0