Error when trying to run with a Quantized base model #138

ihavework7 opened this issue Dec 15, 2023 · 0 comments

Hello. I have been trying to run the multi-task llama7b models with TheBloke's Llama 2 7B GPTQ (https://huggingface.co/TheBloke/Llama-2-7B-GPTQ) as the base.

from auto_gptq import AutoGPTQForCausalLM
from peft import PeftModel
from transformers import AutoTokenizer

def load_model(base_model, peft_model, from_remote=True):

    # parse_model_name is the repo's own helper for resolving the HF model id
    model_name = parse_model_name(base_model, from_remote)
    # Original full-precision loading path, replaced by the GPTQ loader below:
    # model = AutoModelForCausalLM.from_pretrained(
    #     model_name, trust_remote_code=True,
    #     device_map="auto",
    # )
    model_name_or_path = "TheBloke/Llama-2-7b-Chat-GPTQ"
    # Load the quantized base model with AutoGPTQ
    model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        model_basename="model",
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=False)  # boolean, not the string "False"
    model.model_parallel = True

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

    # Attach the LoRA adapter; this is where the error below is raised
    model = PeftModel.from_pretrained(model, peft_model)
    model = model.eval()
    return model, tokenizer

When running this in Google Colab, I get the following error while loading the PEFT adapter with from_pretrained:

ValueError: Target modules ['q_proj', 'k_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

After a bit of searching, it seems I would have to retrain the PEFT model with a different config. Is there anything else I can do, other than retraining?
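
As a quick way to double-check which modules the adapter actually expects (a sketch, not something from the original setup; it assumes PeftConfig.from_pretrained resolves to the adapter's LoraConfig):

    from peft import PeftConfig

    # Load only the adapter's config and print the module names it expects
    # to find in the base model; peft_model is the same adapter path/ID that
    # is passed to load_model above.
    adapter_cfg = PeftConfig.from_pretrained(peft_model)
    print(adapter_cfg.target_modules)  # e.g. ['q_proj', 'k_proj', 'v_proj']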

For debugging purposes, here is the value of model before PeftModel.from_pretrained is applied:

LlamaGPTQForCausalLM(
  (model): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(32000, 4096, padding_idx=0)
      (layers): ModuleList(
        (0-31): 32 x LlamaDecoderLayer(
          (self_attn): FusedLlamaAttentionForQuantizedModel(
            (qkv_proj): GeneralQuantLinear(in_features=4096, out_features=12288, bias=True)
            (o_proj): GeneralQuantLinear(in_features=4096, out_features=4096, bias=True)
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): FusedLlamaMLPForQuantizedModel(
            (gate_proj): GeneralQuantLinear(in_features=4096, out_features=11008, bias=True)
            (up_proj): GeneralQuantLinear(in_features=4096, out_features=11008, bias=True)
            (down_proj): GeneralQuantLinear(in_features=11008, out_features=4096, bias=True)
          )
          (input_layernorm): LlamaRMSNorm()
          (post_attention_layernorm): LlamaRMSNorm()
        )
      )
      (norm): LlamaRMSNorm()
    )
    (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
  )
)
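
If I read the dump correctly, AutoGPTQ's fused attention has replaced the separate q_proj/k_proj/v_proj modules with a single qkv_proj, which would explain why PEFT cannot find the adapter's target modules. One thing that might avoid retraining (a sketch, assuming auto-gptq's inject_fused_attention flag behaves as documented) is to load the quantized base without fusing attention:

    from auto_gptq import AutoGPTQForCausalLM

    # Load without fusing attention so q_proj/k_proj/v_proj keep their
    # original names and the existing LoRA adapter can find them.
    model = AutoGPTQForCausalLM.from_quantized(
        "TheBloke/Llama-2-7b-Chat-GPTQ",
        model_basename="model",
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=False,
        inject_fused_attention=False)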