
Evaluation data of rule_qa for GPT4, GPT3.5, and Claude #36

Open
alirezamshi opened this issue May 15, 2024 · 5 comments

@alirezamshi

Hi,

Thanks for the cool resource. According to the publication, "rule_qa was also manually evaluated by a law-trained individual." Do you plan to release the annotations from this evaluation? Thanks!

@neelguha
Collaborator

neelguha commented May 15, 2024

The answers are available here: https://huggingface.co/datasets/nguha/legalbench/viewer/rule_qa/test.

Is this what you're looking for?

@alirezamshi
Author

Thanks for your response. I meant the evaluation of the rule-application tasks: "Rule-application tasks were evaluated manually by a law-trained individual, who analyzed LLM responses for both correctness and analysis."

@neelguha
Collaborator

Ah, sorry, I misunderstood.

  • rule_qa is a "rule-application" task. Because it is an open-generation task, rule_qa was manually evaluated by a legally trained individual. That individual examined each model generation and compared it to the answers in the corresponding column of the above-linked dataset.
  • The answers for the rule-application tasks that were used for evaluation can be found on this page: https://hazyresearch.stanford.edu/legalbench/getting-started/

@alirezamshi
Author

Thanks for the answer. Do you plan to release those human judgments from the evaluation?

@alirezamshi
Author

Following up on this...
