Reproduce the commonsense results on BoolQ #64

Open
Zhenyu001225 opened this issue Apr 9, 2024 · 15 comments

Zhenyu001225 commented Apr 9, 2024

When I'm running the evaluation, should I use --load_8bit? I'm trying to reproduce the LLaMA-7B-LoRA results.

Finetune:

CUDA_VISIBLE_DEVICES=8 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path './ft-training_set/commonsense_170k.json' \
  --output_dir './trained_models/llama-7b-lora-commonsense/' \
  --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 \
  --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 \
  --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
  --lora_r 32 --lora_alpha 64

Evaluate:

CUDA_VISIBLE_DEVICES=3 python commonsense_evaluate.py \
  --model LLaMA-7B \
  --adapter LoRA \
  --dataset boolq \
  --batch_size 1 \
  --base_model 'yahma/llama-7b-hf' \
  --lora_weights './trained_models/llama-7b-lora-commonsense/'

But my result is only 57.5, compared with 68.9 in the table.
Could you give me some insight here?
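
For reference, the finetune.py flags above roughly map onto a standard peft LoraConfig as in the sketch below; this is an assumed mapping, and the script itself may set additional defaults (e.g. lora_dropout).

```python
from peft import LoraConfig

# Hypothetical mapping of the finetune.py flags onto a standard peft LoraConfig;
# finetune.py may add its own defaults on top of these.
lora_config = LoraConfig(
    r=32,                       # --lora_r 32
    lora_alpha=64,              # --lora_alpha 64
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,          # assumed default, not set on the command line
    bias="none",
    task_type="CAUSAL_LM",
)
```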

@Zhenyu001225 (Author)

And for PIQA the result is 74.6, compared with 80.7 in the table.
For SIQA the result is 60.8, compared with 77.4 in the table.
Should I fine-tune again, or adjust any of the hyperparameters?

@lucasliunju

Hi, may I ask whether you have solved this issue now?

@wutaiqiang

BTW, I find that a larger batch size leads to some bad outputs, while bsz=1 does not.

@lucasliunju

@wutaiqiang Yes, I also ran into this problem. bsz=1 solves most cases, but it can still produce bad results in some cases.
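
One common cause of degraded outputs with batch_size > 1 on decoder-only models is right padding during batched generation; the usual mitigation looks roughly like the sketch below (this assumes the standard transformers generate API and is not necessarily what commonsense_evaluate.py does).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")
# Decoder-only models should be left-padded for batched generation;
# otherwise new tokens are appended after pad tokens and quality drops.
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf", device_map="auto")

prompts = [
    "Question: Is the sky blue?\nAnswer:",
    "Question: Can a fish ride a bicycle? Answer true or false.\nAnswer:",
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```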

@wutaiqiang

In my case, the results are even better than reported. You should use a single GPU for fine-tuning.

@wutaiqiang

boolq | piqa | social_i_qa | hellaswag | winogrande | ARC-Easy | ARC-Challenge | openbookqa
69.44 | 80.79 | 79.32 | 84.2 | 81.61 | 80.34 | 64.93 | 76.8

@wutaiqiang

For LLaMA-7B + LoRA.

@lucasliunju

Hi @wutaiqiang, thanks for your data point. I tried changing the base model dtype from "float16" to "float32" or "bfloat16", and I find the output results are not very stable.
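
For anyone comparing dtypes, the load-time switch is just the torch_dtype argument; a minimal sketch, assuming the usual transformers loading path rather than the repo's exact code:

```python
import torch
from transformers import AutoModelForCausalLM

# Compare load dtypes one at a time; float16 is the usual default here,
# float32 doubles memory, bfloat16 needs recent GPU support.
dtype = torch.float16  # try torch.bfloat16 or torch.float32 to reproduce the instability
model = AutoModelForCausalLM.from_pretrained(
    "yahma/llama-7b-hf",
    torch_dtype=dtype,
    device_map="auto",
)
```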

@Zhenyu001225 (Author)

> Hi, may I ask whether you have solved this issue now?

Hi, I changed the version of transformers to 4.35.0 and used batch_size=1 when doing evaluation.

Now the results are:

Model              | GSM8K | SVAMP | AQuA  | MultiArith | SingleEq | AddSub
LLaMA-7B-LoRA-math | 37.9  | 47.0  | 19.68 | 97.5       | 85.83    | 83.54

Model                     | BoolQ | PIQA  | SIQA  | HellaSwag | WinoGrande | ARC-c | ARC-e | OpenBookQA | Average
LLaMA-7B-LoRA-commonsense | 64.01 | 80.25 | 77.28 | 76.50     | 79.79      | 62.54 | 77.31 | 77.4       | 74.39
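
Since these numbers turned out to depend on the transformers version and on batch_size=1, a small, hypothetical guard like the following can catch a mismatched environment before a long evaluation run:

```python
import transformers
from packaging import version

# The results above were obtained with transformers 4.35.0 and batch_size=1;
# warn if the installed version differs (illustrative check, not part of the repo).
EXPECTED = "4.35.0"
if version.parse(transformers.__version__) != version.parse(EXPECTED):
    print(
        f"Warning: results were reproduced with transformers {EXPECTED}, "
        f"but {transformers.__version__} is installed."
    )
```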

@Zhenyu001225 (Author)

> For LLaMA-7B + LoRA.

Hi, what is the version of transformers in your case?

@wutaiqiang

4.32.1

@Zhenyu001225 (Author)

> 4.32.1

Thank you so much~ I'll try again

@clarenceluo78

> boolq | piqa | social_i_qa | hellaswag | winogrande | ARC-Easy | ARC-Challenge | openbookqa
> 69.44 | 80.79 | 79.32 | 84.2 | 81.61 | 80.34 | 64.93 | 76.8

Hi there, I want to ask whether you used 8-bit quantization when reproducing?

@Zhenyu001225 (Author)

> boolq | piqa | social_i_qa | hellaswag | winogrande | ARC-Easy | ARC-Challenge | openbookqa
> 69.44 | 80.79 | 79.32 | 84.2 | 81.61 | 80.34 | 64.93 | 76.8
>
> Hi there, I want to ask whether you used 8-bit quantization when reproducing?

I didn't enable 8-bit quantization.
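
For context, a --load_8bit flag in scripts like this typically toggles bitsandbytes 8-bit loading at model-load time, roughly as in the sketch below (an assumption about the loading path; the repo's code may differ). The runs above were done with it off.

```python
import torch
from transformers import AutoModelForCausalLM

load_8bit = False  # the numbers above were reproduced without 8-bit quantization

# When enabled, weights are loaded through bitsandbytes in int8;
# otherwise the model is loaded in fp16.
model = AutoModelForCausalLM.from_pretrained(
    "yahma/llama-7b-hf",
    load_in_8bit=load_8bit,
    torch_dtype=torch.float16,
    device_map="auto",
)
```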

@wutaiqiang

After rerunning, the results are:

boolq | piqa | social_i_qa | hellaswag | winogrande | ARC-Easy | ARC-Challenge | openbookqa
68.13 | 80.3 | 78.45       | 83.11     | 80.66      | 77.23    | 65.78         | 79.4
