Question regarding the source of math_10k.json #43

Open
HuangOwen opened this issue Oct 12, 2023 · 10 comments

@HuangOwen

Hi, thanks for the good work!

I have a question regarding math_10k.json, which is used for fine-tuning. You mentioned in the paper that "To enhance the diversity of our data, we incorporate the training sets from GSM8K, MAWPS, MAWPS-single", but to the best of my knowledge there is no training set for MAWPS. When I checked the samples in math_10k.json, I found some question-answer pairs that are exactly the same as those in the test sets of AddSub/MultiArith/SingleEq. Could you please elaborate on this?

@LYH-YF
Collaborator

LYH-YF commented Oct 24, 2023

@HuangOwen We use the MAWPS dataset preprocessed by MWPToolkit (https://github.com/LYH-YF/MWPToolkit), which splits MAWPS into trainset/validset/testset. You can find the trainset at https://github.com/LYH-YF/MWPToolkit/tree/master/dataset/mawps. You can also find mawps-single here.

@HuangOwen
Author

I don't think you follow that split in your evaluation and in math_10k.json. For example, the first example in /dataset/MultiArith/test.json (which you used for testing)

{
"instruction": " At the schools book fair Sam bought 13 adventure books and 17 mystery books. If 15 of the books were used, how many new books did he buy? ",
"input": "",
"output": "\nA: Sam bought 13 adventure books and 17 mystery books. That means he bought 13 + 17 = 30 books in total. 15 of them were used, so he has 30 - 15 = 15 new books. The answer is 15.",
"answer": "15.0"
}

can also be found in math_10k.json. Could you please elaborate on this?

@HZQ950419
Collaborator

Hi, we follow the dataset split in MWPToolkit (https://github.com/LYH-YF/MWPToolkit) exactly. The example you provided can be found in the MAWPS and MAWPS-Single training sets, which were used to collect the fine-tuning dataset. As for why the same example also appears in /dataset/MultiArith/test.json, I think it is due to the way the authors created the MultiArith dataset.

Please let us know if you have further questions!

@HuangOwen
Author

HuangOwen commented Dec 4, 2023

Thanks for the reply. I have gone through all subsets of MAWPS (AddSub/MultiArith/SingleEq) and found that all the test samples in these subsets can be found in math_10k.json, which you use for instruction fine-tuning. I don't think this is reasonable. If you use the dataset split in MWPToolkit, you should not test on these specific subsets (AddSub/MultiArith/SingleEq).
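An overlap check of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not the script actually used in this thread. It assumes that both files hold a JSON list of records and that math_10k.json uses the same "instruction" field as the test example quoted above. The function names and the whitespace/case normalization are hypothetical choices made for the sketch.

```python
import json


def normalize(text):
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.split()).lower()


def find_leaked(train_records, test_records, key="instruction"):
    """Return the test records whose question also appears in the training records."""
    train_questions = {normalize(r[key]) for r in train_records}
    return [r for r in test_records if normalize(r[key]) in train_questions]


def find_leaked_in_files(train_path, test_path, key="instruction"):
    """Same check, reading both JSON files (each assumed to hold a list of records)."""
    with open(train_path, encoding="utf-8") as f:
        train_records = json.load(f)
    with open(test_path, encoding="utf-8") as f:
        test_records = json.load(f)
    return find_leaked(train_records, test_records, key)
```

For instance, calling `find_leaked_in_files("math_10k.json", "dataset/MultiArith/test.json")` (with paths adjusted to wherever the files are stored) returns the overlapping test records. If the length of that list equals the size of the test set, every test sample is also present in the fine-tuning data.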

@HuangOwen
Author

I think this data leak has nothing to do with the way the authors created the MultiArith dataset: MultiArith was proposed in 2015 and included in MAWPS in 2016, both before MWPToolkit was proposed.

@callanwu

callanwu commented Dec 4, 2023

mark

@HZQ950419
Collaborator

Hi,

Many thanks for your questions!

After careful double-checking, we confirm that there is a data leak issue in the math reasoning experiments. We have tried our best to mitigate its impact: we now use the MAWPS test set to evaluate the performance of the PEFT methods, and the result table has been updated. The findings in the paper remain consistent. We have also made a special announcement for researchers who are using our repository for their experiments, and uploaded two variations of math_10k.json in which the MAWPS samples are deleted.

We sincerely apologize for any inconvenience caused by our mistake!
If you have any questions, please let us know. Many thanks!
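For readers who want to build a comparable leak-free training file themselves, one possible approach is sketched below. This is not the maintainers' actual procedure, which is not described in this thread. It assumes that both the training and test data are lists of records with an "instruction" field, and the function names and the normalization step are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.split()).lower()


def remove_overlap(train_records, test_records, key="instruction"):
    """Return the training records whose question does not appear in the test records."""
    test_questions = {normalize(r[key]) for r in test_records}
    return [r for r in train_records if normalize(r[key]) not in test_questions]
```

Exact matching after normalization only catches verbatim duplicates. Paraphrased or renumbered variants of a test question would need a fuzzier comparison.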

@HuangOwen
Author

Hi Zhiqiang,

Thanks for your reply and your effort in fixing the problem! Glad that the dataset has been updated.

@Yuan0320

Yuan0320 commented Dec 10, 2023

Hi @HZQ950419, thanks for your announcement! Were the MAWPS test results shown in the table obtained on https://github.com/LYH-YF/MWPToolkit/blob/master/dataset/mawps/testset.json (238 samples)?

@HZQ950419
Collaborator

Hi @Yuan0320,

Correct! We will upload the test set later, or you can also get the test set from MWPToolkit.
