
Couldn't get the same accuracy with eight commonsense reasoning datasets. #38

Open
ello0211 opened this issue Sep 13, 2023 · 7 comments

Comments

@ello0211

Hi, thanks for your great work!
When I try to reproduce the results on the commonsense reasoning datasets, the accuracy is not as good as in the table. The settings I use are the same as for the math reasoning tasks shown in the README. Could you tell me whether these are the right settings, or show me how to reproduce the same accuracy as the table?
Thank you so much!

@HZQ950419
Collaborator

HZQ950419 commented Sep 13, 2023

Hi,

The settings are a bit different. I've listed the commands below.
For LoRA:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-lora-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64

For Series Adapter:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-bottleneck-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --target_modules '["down_proj"]'

For Parallel Adapter:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-parallel-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --use_parallel_adapter --target_modules '["up_proj", "down_proj"]'
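
For evaluation, each of the eight commonsense datasets is run separately with commonsense_evaluate.py. A minimal loop would look something like the sketch below (the dataset names are how I recall them appearing in the repo's dataset folder, so please double-check against your local copy; --lora_weights simply points at the output_dir of the LoRA command above):

for dataset in boolq piqa social_i_qa hellaswag winogrande ARC-Easy ARC-Challenge openbookqa
do
    CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
        --model LLaMA-7B \
        --adapter LoRA \
        --dataset $dataset \
        --batch_size 4 \
        --base_model 'yahma/llama-7b-hf' \
        --lora_weights './trained_models/llama-7b-lora-commonsense/'
done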

@ello0211
Author

OK! I will try these later, thanks a lot.

@nbasyl

nbasyl commented Nov 17, 2023

@ello0211 Hi, did you manage to get the same results as reported in the table? Thx!

@ello0211
Author

Sorry, I didn't run the experiment with exactly the parameters you provided. I used LoRA on q and v with r=4, and got slightly worse results. By the way, it seems that configuring LoRA as you suggested would result in a much larger number of trainable parameters, right?

@HZQ950419
Collaborator

> Sorry, I didn't run the experiment with exactly the parameters you provided. I used LoRA on q and v with r=4, and got slightly worse results. By the way, it seems that configuring LoRA as you suggested would result in a much larger number of trainable parameters, right?

Hi, with r=32, the number of LoRA parameters should be 8 times that of r=4.
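
A rough back-of-the-envelope sketch, assuming the standard LLaMA-7B shapes (32 layers, hidden size 4096, MLP intermediate size 11008): LoRA adds about r * (d_in + d_out) parameters per adapted weight matrix, so the count scales linearly with r.

layers=32; d=4096; ffn=11008; r=32
qkv=$(( 3 * r * (d + d) ))           # q_proj, k_proj, v_proj: 4096 -> 4096
updown=$(( 2 * r * (d + ffn) ))      # up_proj: 4096 -> 11008, down_proj: 11008 -> 4096
echo $(( layers * (qkv + updown) ))  # ~56M trainable parameters at r=32; the same modules at r=4 would be 1/8 of that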

@ls559

ls559 commented Dec 18, 2023

> Hi,
>
> The settings are a bit different. I've listed the commands below. For LoRA: CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-lora-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64
>
> For Series Adapter: CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-bottleneck-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --target_modules '["down_proj"]'
>
> For Parallel Adapter: CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-parallel-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --use_parallel_adapter --target_modules '["up_proj", "down_proj"]'

@HZQ950419 I fine-tuned on commonsense_170k.json with LoRA following your script, changing only eval_step and save_step:

CUDA_VISIBLE_DEVICES=0 python finetune.py \
        --base_model 'yahma/llama-7b-hf'  \
        --data_path 'commonsense_170k.json'   \
        --output_dir $output_path   \
        --batch_size 16  \
        --micro_batch_size 4   \
        --num_epochs 3   \
        --learning_rate 3e-4   \
        --cutoff_len 256   \
        --val_set_size 120 \
        --eval_step 80 \
        --save_step 80  \
        --adapter_name lora \
        --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
        --lora_r 32 \
        --lora_alpha 64 

And evaluated by this script:

CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
    --model LLaMA-7B \
    --adapter LoRA \
    --dataset $dataset \
    --batch_size 4 \
    --base_model 'yahma/llama-7b-hf' \
    --lora_weights $weights_path 

But I still couldn't reproduce the accuracy in the table: only 0.6715 on BoolQ and 0.3884 on PIQA. Can you help me check the problem?
Meanwhile, if I want to reproduce the LLaMA-13B results on commonsense_170k.json, how should I set the parameters?
Thank you so much!

@HZQ950419
Collaborator


Hi, the command is the same as the one we used. Are you using multiple GPUs for fine-tuning? If so, maybe you can try fine-tuning on a single GPU; some other researchers also couldn't reproduce the results with multi-GPU training.
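
For example, before launching finetune.py you can sanity-check that the process only sees one device:

CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.device_count())"   # should print 1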
