Error when trying to run with a Quantized base model #138

ihavework7 opened this issue Dec 15, 2023 · 0 comments

Hello. I have been trying to run the multi-task llama7b models with TheBloke's Llama 2 7B GPTQ (https://huggingface.co/TheBloke/Llama-2-7B-GPTQ) as the base.

from auto_gptq import AutoGPTQForCausalLM
from peft import PeftModel
from transformers import AutoTokenizer

def load_model(base_model, peft_model, from_remote=True):

    # parse_model_name is the repo's own helper for resolving the HF model id
    model_name = parse_model_name(base_model, from_remote)
    # Original full-precision loading path, replaced by the GPTQ loader below:
    # model = AutoModelForCausalLM.from_pretrained(
    #     model_name, trust_remote_code=True,
    #     device_map="auto",
    # )
    model_name_or_path = "TheBloke/Llama-2-7b-Chat-GPTQ"
    # Load the quantized base model with AutoGPTQ
    model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        model_basename="model",
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=False)  # boolean, not the string "False"
    model.model_parallel = True

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

    # Attach the LoRA adapter; this is where the error below is raised
    model = PeftModel.from_pretrained(model, peft_model)
    model = model.eval()
    return model, tokenizer

When running this in Google Colab, I get the following error while loading the PEFT adapter with from_pretrained:

ValueError: Target modules ['q_proj', 'k_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

After a bit of searching, it seems I would have to retrain the PEFT model with a different config. Is there anything else I can do, other than retraining?
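
As a quick way to double-check which modules the adapter actually expects (a sketch, not something from the original setup; it assumes PeftConfig.from_pretrained resolves to the adapter's LoraConfig):

    from peft import PeftConfig

    # Load only the adapter's config and print the module names it expects
    # to find in the base model; peft_model is the same adapter path/ID that
    # is passed to load_model above.
    adapter_cfg = PeftConfig.from_pretrained(peft_model)
    print(adapter_cfg.target_modules)  # e.g. ['q_proj', 'k_proj', 'v_proj']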

For debugging purposes, here is the value of model before PeftModel.from_pretrained is applied:

LlamaGPTQForCausalLM(
  (model): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(32000, 4096, padding_idx=0)
      (layers): ModuleList(
        (0-31): 32 x LlamaDecoderLayer(
          (self_attn): FusedLlamaAttentionForQuantizedModel(
            (qkv_proj): GeneralQuantLinear(in_features=4096, out_features=12288, bias=True)
            (o_proj): GeneralQuantLinear(in_features=4096, out_features=4096, bias=True)
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): FusedLlamaMLPForQuantizedModel(
            (gate_proj): GeneralQuantLinear(in_features=4096, out_features=11008, bias=True)
            (up_proj): GeneralQuantLinear(in_features=4096, out_features=11008, bias=True)
            (down_proj): GeneralQuantLinear(in_features=11008, out_features=4096, bias=True)
          )
          (input_layernorm): LlamaRMSNorm()
          (post_attention_layernorm): LlamaRMSNorm()
        )
      )
      (norm): LlamaRMSNorm()
    )
    (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
  )
)
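
If I read the dump correctly, AutoGPTQ's fused attention has replaced the separate q_proj/k_proj/v_proj modules with a single qkv_proj, which would explain why PEFT cannot find the adapter's target modules. One thing that might avoid retraining (a sketch, assuming auto-gptq's inject_fused_attention flag behaves as documented) is to load the quantized base without fusing attention:

    from auto_gptq import AutoGPTQForCausalLM

    # Load without fusing attention so q_proj/k_proj/v_proj keep their
    # original names and the existing LoRA adapter can find them.
    model = AutoGPTQForCausalLM.from_quantized(
        "TheBloke/Llama-2-7b-Chat-GPTQ",
        model_basename="model",
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=False,
        inject_fused_attention=False)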