Problems I came across when I try to reproduce the results #37

Open
ChaoGaoUCR opened this issue Aug 13, 2023 · 3 comments

Comments

@ChaoGaoUCR

ChaoGaoUCR commented Aug 13, 2023

Dear Authors,

Thanks for this great project and your kind help.
I am trying to reproduce all the results in the table, but I came across several issues. Could you please help me figure out what might be going wrong?
1. When I tried to tune the model, I found that the generate_prompt function in both finetune.py and evaluate.py cannot extract data from a JSON file whose fields are not named "instruction", "input", "output", and "answer", so I renamed all the fields in my JSON file. I was wondering whether this is the right approach (a sketch of the renaming is at the end of this comment). Here are the two examples I used:
The original one in the GitHub repo:
[screenshot: JSON example from the repo]
The one I modified:
[screenshot: JSON example with renamed fields]
2. I can't get answers that are even close to the correct labels. Since I haven't worked in the ML area before, all the metrics are new to me, but the tuned model gives some results that seem ridiculous to me. I wonder if I did something wrong, or whether there is some other metric I should use to reproduce the tuned-model scores in the GitHub repo.
I attached some results I got from my LoRA-tuned model:
[screenshot: sample outputs from the LoRA-tuned model]
By the way, when I switch from the test set to the training set, the accuracy gets higher, but it is still not the same as the numbers listed in the table.
I was wondering if you could share your tuning settings if possible.
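For reference, here is a minimal sketch of the kind of renaming I did in point 1. The source field names ("question", "context", "solution", "final_answer") and file names are placeholders for whatever my original JSON used; only the target keys instruction, input, output, and answer come from generate_prompt:

```python
import json

# Map the field names in my original JSON file to the names that
# generate_prompt expects. The left-hand names are placeholders.
FIELD_MAP = {
    "question": "instruction",
    "context": "input",
    "solution": "output",
    "final_answer": "answer",
}

def convert_record(record):
    converted = {"instruction": "", "input": "", "output": "", "answer": ""}
    for src_key, dst_key in FIELD_MAP.items():
        if src_key in record:
            converted[dst_key] = record[src_key]
    # keep any fields that already use the expected names
    for key in converted:
        if key in record:
            converted[key] = record[key]
    return converted

with open("my_dataset_raw.json") as f:   # placeholder file name
    records = json.load(f)

with open("my_dataset.json", "w") as f:
    json.dump([convert_record(r) for r in records], f, indent=2)
```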

@HZQ950419
Collaborator

Hi,

If you want to reproduce all the results in the table, you can just train and evaluate with the given commands. For example, to train LLaMA-7B with LoRA, you can use:

CUDA_VISIBLE_DEVICES=0 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path 'math_10k.json' \
  --output_dir './trained_models/llama-7b-lora-math/' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 120 \
  --eval_step 80 \
  --save_step 80 \
  --adapter_name lora

For evaluation on SVAMP as an example:

CUDA_VISIBLE_DEVICES=0 python evaluate.py \
  --model LLaMA-7B \
  --adapter LoRA \
  --dataset SVAMP \
  --base_model 'yahma/llama-7b-hf' \
  --lora_weights "./trained_models/llama-7b-lora-math"

If you have any questions, please let us know!
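In case it helps with question 2: the reported number for the math datasets is just accuracy, i.e. the fraction of test examples where the answer extracted from the generation matches the label. A minimal sketch of that computation (my own simplification with an assumed regex, not the exact code in evaluate.py):

```python
import re

def extract_number(text):
    """Grab the last number that appears in the generation
    (a simplifying assumption about how the answer is parsed)."""
    matches = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def accuracy(generations, labels, tol=1e-3):
    correct = 0
    for text, label in zip(generations, labels):
        pred = extract_number(text)
        if pred is not None and abs(pred - float(label)) <= tol:
            correct += 1
    return correct / len(labels)

# Example: two generations, one of which matches its label
print(accuracy(["The answer is 42.", "So she has 7 apples."], [42, 8]))  # 0.5
```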

@lucasliunju

Hi @HZQ950419

Thanks for your great work. I also have a problem when I try to evaluate the fine-tuned model with LoRA. I find the main reason is that the response part of the output is empty, for example:

The output on BoolQ is: 

outputs:  ['Below is an instruction that describes a task. Write a response that appropriately completes the request. \n\n                ### Instruction:\n                Please answer the following question with true or false, question: do runs have to be the same suit in gin rummy?\n\nAnswer format: true/false\n\n                ### Response:\n                ']
output:  
Please answer the following question with true or false, question: do runs have to be the same suit in gin rummy?

Answer format: true/false

prediction: 
label: true
---------------
test:2637/3270 | accuracy 0  0.0
---------------
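If I read it correctly, the prediction is just whatever follows the "### Response:" marker in the decoded text, so when the model generates nothing beyond the prompt the prediction is an empty string, never matches the true/false label, and the accuracy stays at 0. A small sketch of that parsing (my own reading, not necessarily the exact code in evaluate.py):

```python
# Decoded output that ends right at the prompt, i.e. the model added nothing.
decoded = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Please answer the following question with true or false, question: "
    "do runs have to be the same suit in gin rummy?\n\n"
    "Answer format: true/false\n\n"
    "### Response:\n"
)

# Everything after the marker is treated as the model's answer.
prediction = decoded.split("### Response:")[-1].strip()
print(repr(prediction))  # '' -> cannot match the label "true", so the example counts as wrong
```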

@ZeguanXiao

ZeguanXiao commented May 6, 2024

Same problem.
