
Question about the reproduction of the results on math_10k #58

Open
zeyuliu1037 opened this issue Feb 29, 2024 · 13 comments

@zeyuliu1037

Hi, thank you for your awesome work!

I have one question about the training on the math_10k dataset.
python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'ft-training_set/math_10k.json' --output_dir './trained_models/llama-7b-lora-math/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64

However, I only get 16.14 on AQuA and 46.9 on SVAMP, whereas the table reports 18.9 on AQuA and 52.1 on SVAMP.
I'm using the peft library from the GitHub repo. Do you have any insights on this? I also noticed that even with "load_best_model_at_end=True", the best model does not seem to be loaded at the end: based on the wandb output, the final eval_loss is still that of the last checkpoint. Is this correct?

Thank you so much in advance.
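(For reference, a minimal sketch, not from this repo, of the standard Hugging Face TrainingArguments settings that load_best_model_at_end depends on, with the step values copied from the command above. Note that the best checkpoint is reloaded only after the final evaluation, so the last eval_loss logged to wandb is still that of the last checkpoint; it is not re-computed for the reloaded best model.)

from transformers import TrainingArguments

# Sketch only: the argument names are the standard Trainer API, the values
# mirror the finetune.py flags above.
training_args = TrainingArguments(
    output_dir="./trained_models/llama-7b-lora-math/",
    evaluation_strategy="steps",        # must match save_strategy
    eval_steps=80,
    save_strategy="steps",
    save_steps=80,                      # keep this a multiple of eval_steps
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",  # criterion for picking the "best" checkpoint
    greater_is_better=False,
)
# The best checkpoint is reloaded after training finishes; no extra
# evaluation is run on it, so wandb's final eval_loss stays unchanged.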

@HZQ950419
Collaborator

Hi,

Can I ask whether you used multiple GPUs for training? If so, please try again with a single GPU.

@zeyuliu1037
Author

I use a single GPU.

@Zhenyu001225


Hi, did you solve this problem? My results are close to yours.

@zeyuliu1037
Author


Unfortunately, I haven't solved it yet.

@Zhenyu001225

You can use transformers==4.35.0; the results should then be close to the authors'.

@zeyuliu1037
Author

Thank you so much!!!

@Aradhye2002

@Zhenyu001225 any idea why this happens? An extreme case is transformers 4.40.0, which gave me gibberish output, as mentioned in this issue.

Thanks

@Zhenyu001225

I think it's because of the tokenizer version.
For math, you can try:

CUDA_VISIBLE_DEVICES=1 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path './ft-training_set/math_10k.json' \
  --output_dir './trained_models/llama-7b-lora-math/' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 0 \
  --eval_step 80 \
  --save_step 80 \
  --adapter_name lora \
  --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
  --lora_r 32 \
  --lora_alpha 64

For commonsense:

CUDA_VISIBLE_DEVICES=8 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path 'ft-training_set/commonsense_170k.json' \
  --output_dir './trained_models/llama-7b-lora-commonsense/' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 120 \
  --eval_step 80 \
  --save_step 80 \
  --adapter_name lora \
  --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
  --lora_r 32 \
  --lora_alpha 64

@zeyuliu1037
Author

Hi, could you kindly share your requirements.txt with pinned versions? I think that besides the transformers version, the versions of accelerate and tokenizers also affect the results. Thank you so much!
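(A small generic sketch, not a file from the repo, for capturing the exact versions in play, assuming transformers, tokenizers, accelerate, and peft are installed:)

import accelerate
import peft
import tokenizers
import transformers

# Print pinned-style lines that can be pasted into a requirements file.
for mod in (transformers, tokenizers, accelerate, peft):
    print(f"{mod.__name__}=={mod.__version__}")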

@ZeguanXiao

ZeguanXiao commented May 7, 2024

@Zhenyu001225 When switching to transformers 4.35.0, training is very unstable: the training loss goes to 0 and the validation loss goes to NaN. Do you have the same problem?
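(A diagnostic sketch, not part of the repo: a callback built on the standard Hugging Face TrainerCallback API that stops the run as soon as a logged training or eval loss becomes NaN, so the unstable run fails fast instead of collapsing to zero loss. The class name is made up; pass it via Trainer(callbacks=[StopOnNanLoss()]).)

import math

from transformers import TrainerCallback


class StopOnNanLoss(TrainerCallback):
    # Hypothetical helper: halts training when any logged loss is NaN.
    def on_log(self, args, state, control, logs=None, **kwargs):
        for key in ("loss", "eval_loss"):
            value = (logs or {}).get(key)
            if value is not None and math.isnan(value):
                print(f"{key} is NaN at step {state.global_step}; stopping training.")
                control.should_training_stop = True
        return control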

@YYing0111

Hi, I have the same problem. Did you solve it?

@ZeguanXiao

ZeguanXiao commented May 31, 2024

@YYing0111 Try installing transformers with
git+https://github.com/yizhongw/transformers.git@left_padding

@pkhanna7

pkhanna7 commented Jun 2, 2024

Hi, I finetuned the Llama-7B model using LoRA with math_10k on a single A100 GPU with transformers==4.35.0, but still got a much lower accuracy (27.2%) on SVAMP compared to the reported numbers (52.1%).

From a manual analysis of the generated responses, it seems that the model generates a lot of irrelevant code after finishing its reasoning steps. The final answer for the math datasets is taken to be the last float number present in the response, so when random code is appended at the end, the numeric answer is fetched from the gibberish text instead of from the actual answer, which drops the accuracy.

Here's an example:

  1. Add the number of pages of math homework and reading homework: 5 + 2 = 7
  2. Subtract the number of pages of reading homework from the number of pages of math homework: 5 - 2 = 3
  3. The difference between the number of pages of math homework and reading homework is 3.

Therefore, Rachel had 3 more pages of math homework than reading homework. The answer in Arabic numerals is 3.
</s><s>Tags: javascript, jquery, html, css, twitter-bootstrap

Question: How to add a class to an element when another element's value changes?

I'm trying to add a class to an element when another element's value changes. Here's an example of what I'm trying to do:

\begin{code}
input type="text" class="form-control" id="amount" value="100"
input type="text" class="form-control" id="amount" value="200"
input type="text" class="form-control" id="amount" value="300"
prediction: 300.0
label: 3.0

Here it treats 300 as the answer since that's the last number in the generated response, even though Llama's actual reasoning in the first half of the generation is correct. Does anyone know how to fix this? Thanks!
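(For what it's worth, a minimal sketch of the failure mode, not the repo's actual evaluation code: extracting the last number from the raw generation picks up 300 from the gibberish, while truncating at the first "</s>" before extracting recovers the intended 3. The function name and regex here are my own.)

import re


def extract_last_number(text):
    # Grab every integer/float in the text and return the last one.
    nums = re.findall(r"-?\d+\.?\d*", text)
    return float(nums[-1]) if nums else None


generation = (
    'The answer in Arabic numerals is 3.\n'
    '</s><s>Tags: javascript, jquery ... id="amount" value="300"'
)

print(extract_last_number(generation))                   # 300.0 (wrong)
print(extract_last_number(generation.split("</s>")[0]))  # 3.0 (intended)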

Edit: also, here's my fine-tuning command:
CUDA_VISIBLE_DEVICES=7 python finetune.py > finetune_llama7_singlegpu_old_transformers.txt --base_model 'yahma/llama-7b-hf' --data_path 'ft-training_set/math_10k.json' --output_dir './trained_models/llama-7b-lora-math-single-gpu-old-transformers/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64
