Problems I came across when I try to reproduce the results #37

Open
ChaoGaoUCR opened this issue Aug 13, 2023 · 3 comments

Comments

@ChaoGaoUCR

ChaoGaoUCR commented Aug 13, 2023

Dear Authors,

Thanks for this great project and your kind help.
I am trying to reproduce all the results in the table, but I came across several issues. Could you please help me figure out what might be going wrong?
1. When I tried to tune the model, I found that the generate_prompt function in both finetune.py and evaluate.py cannot extract data from a JSON file whose fields are not named "instruction", "input", "output", and "answer", so I renamed all the fields in my JSON file. I was wondering whether this is the right approach (a sketch of the renaming is at the end of this comment). Here are the two examples I used:
The original one in the GitHub repo:
[screenshot: JSON example from the repo]
The one I modified:
[screenshot: JSON example with renamed fields]
2. I can't get answers that are even close to the correct labels. Since I haven't worked in the ML area before, all the metrics are new to me, but the tuned model gives some results that seem ridiculous to me. I wonder if I did something wrong, or whether there is some other metric I should use to reproduce the tuned-model scores in the GitHub repo.
I attached some results I got from my LoRA-tuned model:
[screenshot: sample outputs from the LoRA-tuned model]
By the way, when I switch from the test set to the training set, the accuracy gets higher, but it is still not the same as the numbers listed in the table.
I was wondering if you could share your tuning settings if possible.
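For reference, here is a minimal sketch of the kind of renaming I did in point 1. The source field names ("question", "context", "solution", "final_answer") and file names are placeholders for whatever my original JSON used; only the target keys instruction, input, output, and answer come from generate_prompt:

```python
import json

# Map the field names in my original JSON file to the names that
# generate_prompt expects. The left-hand names are placeholders.
FIELD_MAP = {
    "question": "instruction",
    "context": "input",
    "solution": "output",
    "final_answer": "answer",
}

def convert_record(record):
    converted = {"instruction": "", "input": "", "output": "", "answer": ""}
    for src_key, dst_key in FIELD_MAP.items():
        if src_key in record:
            converted[dst_key] = record[src_key]
    # keep any fields that already use the expected names
    for key in converted:
        if key in record:
            converted[key] = record[key]
    return converted

with open("my_dataset_raw.json") as f:   # placeholder file name
    records = json.load(f)

with open("my_dataset.json", "w") as f:
    json.dump([convert_record(r) for r in records], f, indent=2)
```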

@HZQ950419
Collaborator

Hi,

If you want to reproduce all the results in the table, you can just train and evaluate with the given commands. For example, to train LLaMA-7B with LoRA, you can use:

CUDA_VISIBLE_DEVICES=0 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path 'math_10k.json' \
  --output_dir './trained_models/llama-7b-lora-math/' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 120 \
  --eval_step 80 \
  --save_step 80 \
  --adapter_name lora

For evaluation on SVAMP as an example:

CUDA_VISIBLE_DEVICES=0 python evaluate.py \
  --model LLaMA-7B \
  --adapter LoRA \
  --dataset SVAMP \
  --base_model 'yahma/llama-7b-hf' \
  --lora_weights "./trained_models/llama-7b-lora-math"

If you have any questions, please let us know!
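In case it helps with question 2: the reported number for the math datasets is just accuracy, i.e. the fraction of test examples where the answer extracted from the generation matches the label. A minimal sketch of that computation (my own simplification with an assumed regex, not the exact code in evaluate.py):

```python
import re

def extract_number(text):
    """Grab the last number that appears in the generation
    (a simplifying assumption about how the answer is parsed)."""
    matches = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def accuracy(generations, labels, tol=1e-3):
    correct = 0
    for text, label in zip(generations, labels):
        pred = extract_number(text)
        if pred is not None and abs(pred - float(label)) <= tol:
            correct += 1
    return correct / len(labels)

# Example: two generations, one of which matches its label
print(accuracy(["The answer is 42.", "So she has 7 apples."], [42, 8]))  # 0.5
```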

@lucasliunju

Hi @HZQ950419

Thanks for your great work. I also have a problem when I try to evaluate the fine-tuned model with LoRA. I find the main reason is that the response part of the output is empty, for example:

The output on BoolQ is: 

outputs:  ['Below is an instruction that describes a task. Write a response that appropriately completes the request. \n\n                ### Instruction:\n                Please answer the following question with true or false, question: do runs have to be the same suit in gin rummy?\n\nAnswer format: true/false\n\n                ### Response:\n                ']
output:  
Please answer the following question with true or false, question: do runs have to be the same suit in gin rummy?

Answer format: true/false

prediction: 
label: true
---------------
test:2637/3270 | accuracy 0  0.0
---------------
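If I read it correctly, the prediction is just whatever follows the "### Response:" marker in the decoded text, so when the model generates nothing beyond the prompt the prediction is an empty string, never matches the true/false label, and the accuracy stays at 0. A small sketch of that parsing (my own reading, not necessarily the exact code in evaluate.py):

```python
# Decoded output that ends right at the prompt, i.e. the model added nothing.
decoded = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Please answer the following question with true or false, question: "
    "do runs have to be the same suit in gin rummy?\n\n"
    "Answer format: true/false\n\n"
    "### Response:\n"
)

# Everything after the marker is treated as the model's answer.
prediction = decoded.split("### Response:")[-1].strip()
print(repr(prediction))  # '' -> cannot match the label "true", so the example counts as wrong
```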

@ZeguanXiao

ZeguanXiao commented May 6, 2024

Same problem.
