FT with bottleneck: cannot perform fine-tuning on purely quantized models #57

Open
Lao-yy opened this issue Feb 26, 2024 · 2 comments
Lao-yy commented Feb 26, 2024

Hi! I tried to fine-tune llama-2-13b with the bottleneck adapter, but I got a ValueError saying that a model loaded with load_in_8bit cannot be fine-tuned. What is the problem, and how can I solve it?

ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details

The package versions I'm using are as follows:
accelerate 0.27.2
bitsandbytes 0.41.2.post2
black 23.11.0
transformers 4.39.0.dev0
torch 2.1.1
gradio 4.7.1

The PeftModel was constructed as follows. I think it was loaded in 8-bit correctly.

---------model structure---------
PeftModelForCausalLM(
  (base_model): BottleneckModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 5120)
        (layers): ModuleList(
          (0-39): 40 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (k_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (v_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (o_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (up_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (down_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm()
            (post_attention_layernorm): LlamaRMSNorm()
          )
        )
        (norm): LlamaRMSNorm()
      )
      (lm_head): CastOutputToFloat(
        (0): Linear(in_features=5120, out_features=32000, bias=False)
      )
    )
  )
)
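
For reference, a construction along the following lines produces a structure like this one. This is a minimal sketch, not the script from the issue; the config field names and values (bottleneck_size, non_linearity, target_modules) and the prepare_model_for_int8_training call are assumptions based on typical usage of the LLM-Adapters peft fork.

# Minimal sketch: load llama-2-13b in 8-bit and attach a bottleneck adapter.
# Assumes the LLM-Adapters fork of peft, which provides BottleneckConfig.
import torch
from transformers import LlamaForCausalLM
from peft import BottleneckConfig, get_peft_model, prepare_model_for_int8_training

model = LlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    load_in_8bit=True,           # quantize base weights with bitsandbytes
    torch_dtype=torch.float16,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)  # cast norms/lm_head, enable input grads

config = BottleneckConfig(
    bottleneck_size=256,                                   # matches adapter_down/up dims above
    non_linearity="tanh",                                  # matches (act_fn): Tanh()
    target_modules=["gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)  # wraps the base model as PeftModelForCausalLM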

HZQ950419 (Collaborator) commented
Hi,

Please refer to #55. Let us know if it helps solve your issue! Thanks!
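
For context, the ValueError is raised by the transformers Trainer, which refuses to train a bitsandbytes-quantized model unless it can tell that trainable adapters have been attached on top of it. Before starting the Trainer it is worth verifying that the adapter parameters are in fact trainable; a small diagnostic sketch (illustrative only, not necessarily the fix discussed in #55):

# Diagnostic sketch: confirm that only the bottleneck adapter parameters
# require gradients before passing the model to Trainer.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for n, p in model.named_parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {n_trainable} / {n_total}")
print(trainable[:5])  # expect adapter_down / adapter_up weights here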

Lao-yy (Author) commented Mar 1, 2024

I tried installing your package and it worked. But using the Trainer to load the best model at the end failed: I got an error that the PEFT version (0.3.0) is not compatible with transformers (4.34.1).
Then, after updating PEFT to 0.6.0, I got another error:

ImportError: cannot import name 'inject_adapter_in_model' from 'peft' (/workingdir/peft/llama2-ft/LLM-Adapters/peft/src/peft/__init__.py)

Do you know anything about this?
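
Judging from the path in the traceback, Python is resolving peft to the copy bundled with LLM-Adapters (/workingdir/peft/llama2-ft/LLM-Adapters/peft/src/peft/__init__.py). That fork is based on 0.3.0 and does not export inject_adapter_in_model, which only exists in newer upstream peft releases, so installing PEFT 0.6.0 does not help as long as the bundled fork shadows it. A quick diagnostic (a sketch, not a maintainer-suggested fix) to see which peft the interpreter actually imports:

# Diagnostic sketch: print which peft package is actually imported.
# If __file__ points into LLM-Adapters/peft/src, the bundled fork is
# shadowing the separately installed peft 0.6.0.
import peft
print(peft.__version__)
print(peft.__file__)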
