CMB + Qwen1.5-72B-Chat got empty answers #1141

qy1026 · 2024-05-11T08:11:26Z

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
The bug has not been fixed in the latest version.

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

python -c "import opencompass.utils;import pprint;pprint.pprint(dict(opencompass.utils.collect_env()))"

Reproduces the problem - code/configuration sample

CUDA_VISIBLE_DEVICES="0,1,2,3" python run.py \
--datasets cmb_gen_dfb5c4 \
--hf-path "/Qwen1.5-72B-Chat/" \
--model-kwargs device_map='auto' trust_remote_code=True \
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 1 \
--num-gpus 4

Reproduces the problem - command or script

CUDA_VISIBLE_DEVICES="0,1,2,3" python run.py \
--datasets cmb_gen_dfb5c4 \
--hf-path "/Qwen1.5-72B-Chat/" \
--model-kwargs device_map='auto' trust_remote_code=True \
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 1 \
--num-gpus 4

Reproduces the problem - error message

CMB + Qwen1.5-72B-Chat: test accuracy = 0.26%. Almost all predictions in cmb_test_k.json are "" (empty answers).
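Examples (each Chinese prompt instructs the model: "The following is a single-choice (or multiple-choice) question from the standardized residency training completion exam of the Chinese physician examinations. Give no analysis or explanation; output the answer option directly." The question, options A to E, and "答案: " ("Answer: ") follow):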
"21": {
  "origin_prompt": "以下是中国医师考试中规培结业考试的一道多项选择题，不需要做任何分析和解释，直接输出答案选项。\n每一精神症状均有明确定义，并具有以下特点\nA. 症状的出现不受病人意识控制\nB. 症状出现可受病人意识控制\nC. 症状可以通过转移的方法使其消失\nD. 症状内容与周围环境不相称\nE. 症状给病人带来不同程度的功能损害 \n 答案: ",
  "prediction": "",
  "gold": "NULL"
},
"22": {
  "origin_prompt": "以下是中国医师考试中规培结业考试的一道单项选择题，不需要做任何分析和解释，直接输出答案选项。\n关于慢性粒细胞白血病，错误的是\nA. 造血干细胞恶性克隆性疾病\nB. 自然病程仅数月\nC. 分为慢性期、加速期和急变期\nD. 最显著的体征是脾大\nE. 血象白细胞持续增高 \n 答案: ",
  "prediction": "",
  "gold": "NULL"
},
"23": {
  "origin_prompt": "以下是中国医师考试中规培结业考试的一道单项选择题，不需要做任何分析和解释，直接输出答案选项。\n确定颌位关系包括\nA. 定位平面记录\nB. 下颌后退记录\nC. 面下1/3高度记录\nD. 垂直距离和下颌前伸（牙合）记录\nE. 垂直距离和正中关系记录 \n 答案: ",
  "prediction": "",
  "gold": "NULL"
},

By contrast, CMB + Qwen1.5-32B-Chat behaves normally, with an accuracy of around 52%.
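To quantify how widespread the empty outputs are (for example, to compare the 72B and 32B runs, or runs with different generation settings), a small script like the one below can count them. It is a sketch that assumes the predictions file is a JSON object keyed by sample index, with a "prediction" field per entry, as in the excerpt above; `summarize_predictions` and `load_and_summarize` are hypothetical helper names, not part of OpenCompass.

```python
import json


def summarize_predictions(records: dict) -> dict:
    """Count empty predictions in a dict shaped like the excerpt above.

    Assumption: `records` maps string indices ("21", "22", ...) to dicts
    with a "prediction" key. A prediction that is missing, None, or only
    whitespace is counted as empty.
    """
    total = len(records)
    empty = sum(
        1
        for rec in records.values()
        if not (rec.get("prediction") or "").strip()
    )
    ratio = empty / total if total else 0.0
    return {"total": total, "empty": empty, "empty_ratio": ratio}


def load_and_summarize(path: str) -> dict:
    """Load a predictions JSON file (e.g. cmb_test_k.json) and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_predictions(json.load(f))
```

Calling `load_and_summarize()` with the path to `cmb_test_k.json` from the 72B run would report the share of empty predictions; running it on the 32B outputs gives a point of comparison.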

Other information

No response

bittersweet1999 · 2024-05-15T02:47:19Z

You can try setting do_sample=True in the model's generation_kwargs and see whether that makes a difference.
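For reference, the reporter's command plus the suggested `do_sample=True` could be written as an OpenCompass Python config, roughly as below. This is an untested sketch: it assumes the CLI flags map onto the fields of `HuggingFaceCausalLM`, that the file is placed in OpenCompass's `configs/` directory (so the relative `read_base()` import resolves), and that the `cmb_gen_dfb5c4` dataset config exposes a `cmb_datasets` list. The file name and the `abbr` value are hypothetical.

```python
# configs/eval_cmb_qwen1_5_72b_chat.py  (hypothetical file name)
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    # Assumed module path and variable name for the cmb_gen_dfb5c4 config
    from .datasets.cmb.cmb_gen_dfb5c4 import cmb_datasets

datasets = [*cmb_datasets]

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='qwen1.5-72b-chat-hf',  # hypothetical label used in output paths
        path='/Qwen1.5-72B-Chat/',
        tokenizer_path='/Qwen1.5-72B-Chat/',
        model_kwargs=dict(device_map='auto', trust_remote_code=True),
        # Hugging Face tokenizers use `truncation_side` for left truncation
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left',
                              trust_remote_code=True),
        # The suggested experiment: sample instead of greedy decoding
        generation_kwargs=dict(do_sample=True),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=1,
        run_cfg=dict(num_gpus=4, num_procs=1),  # mirrors --num-gpus 4
    )
]
```

It would be launched with `python run.py configs/eval_cmb_qwen1_5_72b_chat.py`. Keeping every other setting identical to the CLI run isolates the effect of sampling on the share of empty predictions.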

mm-assistant bot assigned tonysy May 11, 2024

qy1026 changed the title from "CMB + Qwen1.5-72B-Chat" to "CMB + Qwen1.5-72B-Chat got empty answers" on May 11, 2024
