
Couldn't get the same accuracy with eight commonsense reasoning datasets. #38

Open
ello0211 opened this issue Sep 13, 2023 · 7 comments

Comments

@ello0211

Hi, thanks for your great work!
When I try to reproduce the results on the commonsense reasoning datasets, the accuracy is not as good as in the table. The settings I use are the same as for the math reasoning tasks shown in the README. Could you tell me whether these are the right settings, or show me how to reproduce the same accuracy as the table?
Thank you so much!

@HZQ950419
Collaborator

HZQ950419 commented Sep 13, 2023

Hi,

The settings are a bit different. I've listed the commands below.
For LoRA:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-lora-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64

For Series Adapter:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-bottleneck-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --target_modules '["down_proj"]'

For Parallel Adapter:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-parallel-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --use_parallel_adapter --target_modules '["up_proj", "down_proj"]'
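
For evaluation, each of the eight commonsense datasets is run separately with commonsense_evaluate.py. A minimal loop would look something like the sketch below (the dataset names are how I recall them appearing in the repo's dataset folder, so please double-check against your local copy; --lora_weights simply points at the output_dir of the LoRA command above):

for dataset in boolq piqa social_i_qa hellaswag winogrande ARC-Easy ARC-Challenge openbookqa
do
    CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
        --model LLaMA-7B \
        --adapter LoRA \
        --dataset $dataset \
        --batch_size 4 \
        --base_model 'yahma/llama-7b-hf' \
        --lora_weights './trained_models/llama-7b-lora-commonsense/'
done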

@ello0211
Author

OK! I will try these later, thanks a lot.

@nbasyl

nbasyl commented Nov 17, 2023

@ello0211 Hi, did you manage to get the same results as reported in the table? Thx!

@ello0211
Author

Sorry, I didn't run the experiment with exactly the parameters you provided. I used LoRA on q and v with r=4, and got slightly worse results. By the way, it seems that configuring LoRA as you suggested would result in a much larger number of trainable parameters, right?

@HZQ950419
Collaborator

> Sorry, I didn't run the experiment with exactly the parameters you provided. I used LoRA on q and v with r=4, and got slightly worse results. By the way, it seems that configuring LoRA as you suggested would result in a much larger number of trainable parameters, right?

Hi, with r=32, the number of LoRA parameters should be 8 times that of r=4.
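
A rough back-of-the-envelope sketch, assuming the standard LLaMA-7B shapes (32 layers, hidden size 4096, MLP intermediate size 11008): LoRA adds about r * (d_in + d_out) parameters per adapted weight matrix, so the count scales linearly with r.

layers=32; d=4096; ffn=11008; r=32
qkv=$(( 3 * r * (d + d) ))           # q_proj, k_proj, v_proj: 4096 -> 4096
updown=$(( 2 * r * (d + ffn) ))      # up_proj: 4096 -> 11008, down_proj: 11008 -> 4096
echo $(( layers * (qkv + updown) ))  # ~56M trainable parameters at r=32; the same modules at r=4 would be 1/8 of that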

@ls559

ls559 commented Dec 18, 2023

> Hi,
>
> The settings are a bit different. I've listed the commands below. For LoRA: CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-lora-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64
>
> For Series Adapter: CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-bottleneck-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --target_modules '["down_proj"]'
>
> For Parallel Adapter: CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-parallel-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --use_parallel_adapter --target_modules '["up_proj", "down_proj"]'

@HZQ950419 I fine-tuned on commonsense_170k.json with LoRA following your script, changing only eval_step and save_step:

CUDA_VISIBLE_DEVICES=0 python finetune.py \
        --base_model 'yahma/llama-7b-hf'  \
        --data_path 'commonsense_170k.json'   \
        --output_dir $output_path   \
        --batch_size 16  \
        --micro_batch_size 4   \
        --num_epochs 3   \
        --learning_rate 3e-4   \
        --cutoff_len 256   \
        --val_set_size 120 \
        --eval_step 80 \
        --save_step 80  \
        --adapter_name lora \
        --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
        --lora_r 32 \
        --lora_alpha 64 

And evaluated by this script:

CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
    --model LLaMA-7B \
    --adapter LoRA \
    --dataset $dataset \
    --batch_size 4 \
    --base_model 'yahma/llama-7b-hf' \
    --lora_weights $weights_path 

But I still couldn't reproduce the accuracy in the table: only 0.6715 on BoolQ and 0.3884 on PIQA. Can you help me check the problem?
Meanwhile, if I want to reproduce the LLaMA-13B results on commonsense_170k.json, how should I set the parameters?
Thank you so much!

@HZQ950419
Collaborator


Hi, the command is the same as the one we used. Are you using multiple GPUs for fine-tuning? If so, maybe you can try fine-tuning on a single GPU; some other researchers also couldn't reproduce the results with multi-GPU training.
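
For example, before launching finetune.py you can sanity-check that the process only sees one device:

CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.device_count())"   # should print 1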
