Add MATH evaluation #135

danieljkim0118 · 2024-03-25T20:22:45Z

Added evaluation scripts for the MATH (Hendrycks et al., 2021) dataset.

yizhongw · 2024-03-25T22:23:25Z

scripts/prepare_eval_data.sh

@@ -24,6 +24,9 @@ wget -P data/eval/tydiqa/ https://storage.googleapis.com/tydiqa/v1.1/tydiqa-gold
 # GSM dataset
 wget -P data/eval/gsm/ https://github.com/openai/grade-school-math/raw/master/grade_school_math/data/test.jsonl

+# MATH dataset
+mkdir -p data/eval/MATH


This mkdir might not be necessary.

yizhongw · 2024-03-25T22:29:40Z

eval/MATH/run_eval.py

+                if args.no_cot:
+                    prompts = [prompt_prefix + "Question: " + example["question"].strip() + "\nAnswer:" for example in test_data]
+                else:
+                    prompts = [prompt_prefix + "Question: " + "\n" + example["question"].strip() + "\nSolution: " + "\n" for example in test_data]


Interesting. So, for CoT, they use "Solution" in the prefix not "Answer"?

Oh I saw it above. Nvm.
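The two prompt formats in the `run_eval.py` diff above are pulled out into a standalone helper in the minimal sketch below, which makes the "Answer" vs. "Solution" difference easy to check. The helper name `build_prompts`, the empty default `prompt_prefix`, and the toy example are hypothetical. In the actual script, `prompt_prefix` and `test_data` come from the script itself.

```python
# Hypothetical sketch: the two prompt templates from the run_eval.py diff,
# factored into one helper. `build_prompts` and the toy data are illustrative only.
from typing import Dict, List


def build_prompts(
    test_data: List[Dict[str, str]],
    prompt_prefix: str = "",
    no_cot: bool = False,
) -> List[str]:
    """Build one prompt per example, mirroring the two branches in the diff."""
    prompts = []
    for example in test_data:
        question = example["question"].strip()
        if no_cot:
            # Direct-answer mode: question on one line, then an "Answer:" cue.
            prompts.append(prompt_prefix + "Question: " + question + "\nAnswer:")
        else:
            # Chain-of-thought mode: the question starts on its own line and the
            # model is cued with "Solution: " followed by a newline.
            prompts.append(
                prompt_prefix + "Question: " + "\n" + question + "\nSolution: " + "\n"
            )
    return prompts


examples = [{"question": "  What is 2 + 3?  "}]
print(repr(build_prompts(examples, no_cot=True)[0]))  # 'Question: What is 2 + 3?\nAnswer:'
print(repr(build_prompts(examples)[0]))  # 'Question: \nWhat is 2 + 3?\nSolution: \n'
```

Only the CoT branch puts the question on its own line and ends with a trailing newline after the cue. That leaves room for a multi-line worked solution, whereas the direct-answer prompt expects the answer on the same line as "Answer:".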

yizhongw · 2024-03-25T22:33:13Z

Looks good. Thanks @danieljkim0118! Have you tested the performance of some vanilla pretrained models and tulu models? I am planning to run some tests. It would be great if you have some numbers that I can compare to.

hamishivi · 2024-05-25T18:02:32Z

It would be good to merge this soon!

Add MATH evaluation

4c46f3e

yizhongw approved these changes Mar 25, 2024
