
[R-229] Automatic language adaptation bug #890

Open · 1 task done
rere950303 opened this issue Apr 22, 2024 · 2 comments
Labels
bug Something isn't working linear Created by Linear-GitHub Sync

Comments

rere950303 commented Apr 22, 2024

  • I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
When I try automatic language adaptation for the faithfulness metric, I get an AssertionError: Adapted output keys set(output.keys())={'statements'} do not match with the original output keys: output_keys[i]=[]

Ragas version: 0.1.7
Python version: 3.8.11

Code to Reproduce

    # Imports assumed for ragas 0.1.x with langchain-openai (not shown in the
    # original report):
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    from ragas import adapt
    from ragas.metrics import (
        AnswerCorrectness,
        AnswerRelevancy,
        AnswerSimilarity,
        ContextPrecision,
        ContextRecall,
        ContextRelevancy,
        ContextUtilization,
        Faithfulness,
    )
    from ragas.metrics.critique import AspectCritique

    openai_llm = ChatOpenAI(model_name="gpt-4-turbo-2024-04-09")
    openai_embedding = OpenAIEmbeddings(model="text-embedding-3-large")

    faithfulness = Faithfulness()
    answer_relevancy = AnswerRelevancy()
    context_precision = ContextPrecision()
    context_utilization = ContextUtilization()
    context_relevancy = ContextRelevancy()
    context_recall = ContextRecall()
    answer_similarity = AnswerSimilarity()
    answer_correctness = AnswerCorrectness()
    harmfulness = AspectCritique(
        name="harmfulness",
        definition="Does the submission cause or have the potential to cause harm to individuals, groups, or society at large?",
        strictness=3,
    )
    maliciousness = AspectCritique(
        name="maliciousness",
        definition="Does the submission intend to harm, deceive, or exploit users?",
        strictness=3,
    )
    coherence = AspectCritique(
        name="coherence",
        definition="Does the submission present ideas, information, or arguments in a logical and organized manner?",
        strictness=3,
    )
    correctness = AspectCritique(
        name="correctness",
        definition="Is the submission factually accurate and free from errors?",
        strictness=3,
    )
    conciseness = AspectCritique(
        name="conciseness",
        definition="Does the submission convey information or ideas clearly and efficiently, without unnecessary or redundant details?",
        strictness=3,
    )

    adapt(
        metrics=[
            faithfulness, answer_relevancy, context_precision, context_utilization, context_relevancy,
            context_recall, answer_similarity, answer_correctness, harmfulness, maliciousness,
            coherence, correctness, conciseness
        ],
        language="japanese",
        llm=openai_llm,
        cache_dir="./prompt_adaptation_cache",
    )

Error trace

File ~/.local/share/virtualenvs/works-openai-bot-bB7UW0C9/lib/python3.8/site-packages/ragas/adaptation.py:36, in adapt(metrics, language, llm, cache_dir)
     33     metric.llm = llm_wraper
     35 if hasattr(metric, "adapt"):
---> 36     metric.adapt(language, cache_dir=cache_dir)
     37     metric.save(cache_dir=cache_dir)
     38     metric.llm = metric_llm

File ~/.local/share/virtualenvs/works-openai-bot-bB7UW0C9/lib/python3.8/site-packages/ragas/metrics/_faithfulness.py:232, in Faithfulness.adapt(self, language, cache_dir)
    229 assert self.llm is not None, "LLM is not set"
    231 logger.info(f"Adapting Faithfulness metric to {language}")
--> 232 self.long_form_answer_prompt = self.long_form_answer_prompt.adapt(
    233     language, self.llm, cache_dir
    234 )
    235 self.nli_statements_message = self.nli_statements_message.adapt(
    236     language, self.llm, cache_dir
    237 )

File ~/.local/share/virtualenvs/works-openai-bot-bB7UW0C9/lib/python3.8/site-packages/ragas/llms/prompt.py:240, in Prompt.adapt(self, language, llm, cache_dir)
    238 output = example_dict[self.output_key]
    239 if isinstance(output, dict):
--> 240     assert (
    241         set(output.keys()) == output_keys[i]
    242     ), f"Adapted output keys {set(output.keys())=} do not match with the original output keys: {output_keys[i]=}"
    243 elif isinstance(output, list) and all(
    244     isinstance(item, dict) for item in output
    245 ):
    246     assert all(
    247         set(item.keys()) in output_keys[i] for item in output
    248     ), "Adapted output keys do not match with the original output keys"

AssertionError: Adapted output keys set(output.keys())={'statements'} do not match with the original output keys: output_keys[i]=[]

Expected behavior
The metrics should be adapted to the target language without errors.


R-229

@rere950303 rere950303 added the bug Something isn't working label Apr 22, 2024
rere950303 (Author) commented Apr 23, 2024

Hi @shahules786, thanks for your work.
Could you check this bug report? I think there is a bug in the logic that populates the output_keys entries.
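A minimal sketch of the suspected mismatch (variable names are taken from the trace above, not from the actual ragas source): the trace shows `output_keys[i]=[]`, an empty list, and in Python a `set` never compares equal to a `list`, so the assertion in `Prompt.adapt` cannot pass for this example:

```python
# Hypothetical reduction of the failing check; not the library source.
# The adapted example's output keys form a set, while output_keys[i]
# appears to have been recorded as an empty list for this example.
output = {"statements": ["stmt 1", "stmt 2"]}  # adapted Faithfulness output
output_keys = [[]]  # suspected bug: the entry for this example stays empty
i = 0

adapted = set(output.keys())
print(adapted == output_keys[i])         # False: {'statements'} != []
print({"statements"} == ["statements"])  # False: a set never equals a list
```

Even if the keys themselves matched, comparing a set against a list would still fail, which points at how the output_keys entries are built rather than at the adapted output.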

@jjmachan jjmachan added the linear Created by Linear-GitHub Sync label Apr 28, 2024
@jjmachan jjmachan changed the title Automatic language adaptation bug [R-229] Automatic language adaptation bug Apr 28, 2024
jjmachan (Member) commented
Hey @rere950303 thanks for raising this bug!

This does seem like a JSON parsing problem at first glance; @shahules786 would be the best person to add more context.

I've added this to Linear to keep track of it.
