Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
python -c "import opencompass.utils;import pprint;pprint.pprint(dict(opencompass.utils.collect_env()))"
Reproduces the problem - command or script
CUDA_VISIBLE_DEVICES="0,1,2,3" python run.py \
  --datasets cmb_gen_dfb5c4 \
  --hf-path "/Qwen1.5-72B-Chat/" \
  --model-kwargs device_map='auto' trust_remote_code=True \
  --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
  --max-out-len 100 \
  --max-seq-len 2048 \
  --batch-size 1 \
  --num-gpus 4
Reproduces the problem - error message
CMB + qwen1.5-72b-chat: test acc = 0.26%. Almost all predictions in cmb_test_k.json are "" (no answer). Each prompt is a multiple-choice question from a Chinese physician examination that tells the model to output only the answer option, without analysis or explanation, and ends with "答案:" ("Answer:"). Examples:
"21": {
  "origin_prompt": "以下是中国医师考试中规培结业考试的一道多项选择题,不需要做任何分析和解释,直接输出答案选项。\n每一精神症状均有明确定义,并具有以下特点\nA. 症状的出现不受病人意识控制\nB. 症状出现可受病人意识控制\nC. 症状可以通过转移的方法使其消失\nD. 症状内容与周围环境不相称\nE. 症状给病人带来不同程度的功能损害 \n 答案: ",
  "prediction": "",
  "gold": "NULL"
},
"22": {
  "origin_prompt": "以下是中国医师考试中规培结业考试的一道单项选择题,不需要做任何分析和解释,直接输出答案选项。\n关于慢性粒细胞白血病,错误的是\nA. 造血干细胞恶性克隆性疾病\nB. 自然病程仅数月\nC. 分为慢性期、加速期和急变期\nD. 最显著的体征是脾大\nE. 血象白细胞持续增高 \n 答案: ",
  "prediction": "",
  "gold": "NULL"
},
"23": {
  "origin_prompt": "以下是中国医师考试中规培结业考试的一道单项选择题,不需要做任何分析和解释,直接输出答案选项。\n确定颌位关系包括\nA. 定位平面记录\nB. 下颌后退记录\nC. 面下1/3高度记录\nD. 垂直距离和下颌前伸(牙合)记录\nE. 垂直距离和正中关系记录 \n 答案: ",
  "prediction": "",
  "gold": "NULL"
},
By contrast, CMB + qwen1.5-32b-chat behaves normally, with an accuracy of around 52%.
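To put a number on "almost all predictions are empty", a small script along these lines could count the empty entries in a predictions file. This is only a sketch: it assumes the file is a single JSON object keyed by example id, as in the excerpt above, and the helper names `load_predictions` and `empty_prediction_ids` are made up for illustration, not part of OpenCompass.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Load a predictions file shaped like the excerpt: {"21": {...}, ...}."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def empty_prediction_ids(records):
    """Return the ids whose "prediction" is missing, empty, or whitespace-only."""
    return [
        key
        for key, rec in records.items()
        if not str(rec.get("prediction") or "").strip()
    ]


# Tiny in-memory sample mirroring the structure of the excerpt.
# Prompts are shortened, and entry "99" is a hypothetical non-empty prediction.
sample = {
    "21": {"origin_prompt": "...", "prediction": "", "gold": "NULL"},
    "22": {"origin_prompt": "...", "prediction": "", "gold": "NULL"},
    "99": {"origin_prompt": "...", "prediction": "B", "gold": "NULL"},
}

empty_ids = empty_prediction_ids(sample)
print(f"{len(empty_ids)}/{len(sample)} predictions are empty: {empty_ids}")
```

On a real run, `empty_prediction_ids(load_predictions("cmb_test_k.json"))` would give the ids to inspect, with the path adjusted to wherever the predictions file was written. Comparing that list between the 72B and 32B runs may show whether the failure is uniform or limited to certain prompts.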
You can try setting do_sample = True in the model's generation_kwargs and see whether that makes a difference.
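As a rough sketch of that suggestion, the snippet below builds a model entry that mirrors the flags from the command in the issue and adds `generation_kwargs=dict(do_sample=True)`. The helper `with_sampling` is hypothetical, and the `type` entry that a real OpenCompass model config carries (the model class) is left out so the sketch runs on its own; treat the key names as assumptions taken from the CLI flags rather than as a verified config.

```python
import copy


def with_sampling(model_cfg, **extra_generation_kwargs):
    """Return a copy of model_cfg whose generation_kwargs enable sampling.

    Hypothetical helper for illustration; it does not modify the input dict.
    """
    cfg = copy.deepcopy(model_cfg)
    gen = dict(cfg.get("generation_kwargs", {}))
    gen.update(extra_generation_kwargs)
    gen["do_sample"] = True  # the setting suggested in the comment above
    cfg["generation_kwargs"] = gen
    return cfg


# Assumed model entry mirroring the command-line flags from the issue.
# A real config would also set `type` to the model class.
base = dict(
    abbr="qwen1.5-72b-chat-hf",  # illustrative label
    path="/Qwen1.5-72B-Chat/",
    model_kwargs=dict(device_map="auto", trust_remote_code=True),
    tokenizer_kwargs=dict(
        padding_side="left", truncation="left", trust_remote_code=True
    ),
    max_out_len=100,
    max_seq_len=2048,
    batch_size=1,
    run_cfg=dict(num_gpus=4),
)

models = [with_sampling(base)]
print(models[0]["generation_kwargs"])
```

Keeping the change in a copy makes it easy to run the greedy and sampling variants side by side and compare how many predictions come back empty.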
tonysy