Hey! Post any questions or complaints about the dataset here. We'll log our internal goals and limitations here too.
Rishabh Agarwal pointed out that the PRM Math subset has two structural issues: 1) we added newlines to the human reference answers (arguably a bug), and 2) because the GPT-4 answer is always the rejected one, some models may be biased on this subset.
Idea: now that we have a bunch of RMs, we can check whether there are datapoints that every model gets wrong, and double-check our labels on those for future releases.
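A rough sketch of what that check could look like, assuming we already have a per-model list of correctness flags for each datapoint (True when the model scored the chosen answer above the rejected one). The function name, the model names, and the data layout are all hypothetical and not taken from the actual eval code:

```python
from typing import Dict, List


def unanimously_wrong(results: Dict[str, List[bool]]) -> List[int]:
    """Return indices of datapoints that every reward model got wrong.

    `results` maps a model name to its per-datapoint correctness flags.
    This layout is an assumption made for illustration only.
    """
    if not results:
        return []
    per_model = list(results.values())
    n = len(per_model[0])
    # All models must have been scored on the same set of datapoints.
    if any(len(flags) != n for flags in per_model):
        raise ValueError("all models must be scored on the same datapoints")
    # Keep an index only if no model got that datapoint right.
    return [i for i in range(n) if not any(flags[i] for flags in per_model)]


# Hypothetical flags for three made-up reward models on four datapoints.
results = {
    "rm_a": [True, False, False, True],
    "rm_b": [True, False, True, False],
    "rm_c": [False, False, True, True],
}
print(unanimously_wrong(results))  # [1]
```

The indices that come back are only candidates for manual review: a datapoint that every RM misses could be mislabeled, or it could just be hard.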