It's hard to set a perfect 'max-out-len' for a task, since different models have different preferences. Where model A prefers to give the answer directly, model B may only give its final answer after a long chain of thought.
A better way to determine 'max-out-len' is to run a few examples on the model under test and look at the average response length, since a given model tends to produce responses of similar length for the same task.
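The pilot-run idea above can be sketched as a small helper. This is a hypothetical illustration, not part of any framework: it takes the response token counts observed on a few sample prompts, adds a safety margin over the longest response, and rounds up to a convenient multiple. The function name, `margin`, and `round_to` parameters are all assumptions.

```python
def suggest_max_out_len(sample_lengths, margin=1.5, round_to=64):
    """Suggest a max-out-len from observed response token counts.

    sample_lengths: token counts of responses from a few pilot runs.
    margin: safety factor over the longest observed response.
    round_to: round the suggestion up to a multiple of this value.
    """
    longest = max(sample_lengths)
    target = int(longest * margin)
    # Ceiling-divide so the result is rounded *up* to a multiple of round_to.
    return -(-target // round_to) * round_to
```

For example, if pilot responses were 40, 55, and 80 tokens long, the suggestion would be 128; for 300, 500, and 700 tokens it would be 1088, comfortably above anything observed.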
Still, there are some practical rules of thumb:
For multiple-choice tasks like StoryCloze or MMLU, a length of 100 will cover the vast majority of cases.
For mathematical problems like MATH and GSM8K, a length of 1024 is better, since the model needs a long reasoning process to reach the final answer.
For subjective benchmarks like MTbench or Alpaca_eval, the length also needs to be set very long, as some questions ask the model to design a very detailed program or write a well-developed code script.
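The rules of thumb above could be captured as per-task defaults. This is a hypothetical sketch, not a real config schema of any framework; the task keys and the fallback value are assumptions mirroring the benchmarks mentioned in this thread.

```python
# Hypothetical per-task defaults reflecting the rules of thumb above.
DEFAULT_MAX_OUT_LEN = {
    "storycloze": 100,   # multiple-choice: answers are short
    "mmlu": 100,
    "math": 1024,        # long chain-of-thought reasoning
    "gsm8k": 1024,
    "mtbench": 2048,     # open-ended code/design answers
    "alpaca_eval": 2048,
}

def max_out_len_for(task, fallback=512):
    """Look up a default max-out-len, falling back for unknown tasks."""
    return DEFAULT_MAX_OUT_LEN.get(task.lower(), fallback)
```

A user could then override the default only when a specific model is known to produce unusually long responses for that task.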
Describe the feature
After loading a local HF model, I want to evaluate on the storycloze_gen dataset, but I don't know what max-out-len should be set to. For other datasets, such as MMLU, I'm also unsure what this parameter should be. How should it be determined? Could each task be given a reasonable default prediction length?
Will you implement it?