After standard LoRA fine-tuning of qwen1half-moe-2.7B-chat, I tried GPTQ 4-bit quantization, but re-running inference fails with:
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
Environment: Python 3.10 (Ubuntu 22.04), CUDA 12.1
Inference command:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-moe-a2_7b-chat-int4 --ckpt_dir ./checkpoint-395-merged-gptq-int4
Error message:
INFO 05-13 21:18:53 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 05-13 21:18:53 selector.py:27] Using FlashAttention-2 backend.
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/cli/infer.py", line 5, in <module>
[rank0]: infer_main()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/utils/run_utils.py", line 27, in x_main
[rank0]: result = llm_x(args, **kwargs)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/llm/infer.py", line 228, in llm_infer
[rank0]: llm_engine, template = prepare_vllm_engine_template(args)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/llm/utils/vllm_utils.py", line 375, in prepare_vllm_engine_template
[rank0]: llm_engine = get_vllm_engine(
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/llm/utils/vllm_utils.py", line 91, in get_vllm_engine
[rank0]: llm_engine = llm_engine_cls.from_engine_args(engine_args)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 292, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
[rank0]: self.model_executor = executor_class(
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
[rank0]: self._init_executor()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
[rank0]: self._init_non_spec_worker()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 69, in _init_non_spec_worker
[rank0]: self.driver_worker.load_model()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/worker/worker.py", line 118, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 164, in load_model
[rank0]: self.model = get_model(
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
[rank0]: return loader.load_model(model_config=model_config,
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 222, in load_model
[rank0]: model = _initialize_model(model_config, self.load_config,
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 88, in _initialize_model
[rank0]: return model_class(config=model_config.hf_config,
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 377, in __init__
[rank0]: self.model = Qwen2MoeModel(config, quant_config)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 341, in __init__
[rank0]: self.layers = nn.ModuleList([
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 342, in <listcomp>
[rank0]: Qwen2MoeDecoderLayer(config, layer_idx, quant_config=quant_config)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 283, in __init__
[rank0]: self.mlp = Qwen2MoeSparseMoeBlock(config=config,
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 113, in __init__
[rank0]: self.pack_params()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 137, in pack_params
[rank0]: w1.append(expert.gate_up_proj.weight)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
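The vLLM failure above can be reproduced in isolation: a GPTQ-quantized linear layer registers a packed int32 `qweight` buffer (plus scales/zeros) instead of a float `weight` parameter, so vLLM's `pack_params()`, which collects `expert.gate_up_proj.weight` to fuse the experts, finds nothing to read. A minimal sketch (the `FakeQuantLinear` class below is a hypothetical stand-in, not the real auto_gptq module):

```python
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Hypothetical stand-in for a GPTQ QuantLinear: it holds packed
    integer buffers, not a float `weight` parameter."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # 4-bit values packed 8-per-int32 along the input dimension
        self.register_buffer(
            "qweight",
            torch.zeros(in_features // 8, out_features, dtype=torch.int32),
        )

layer = FakeQuantLinear(2048, 1408)
try:
    _ = layer.weight  # what vLLM's pack_params() effectively does
except AttributeError as e:
    # nn.Module.__getattr__ raises, mirroring the traceback above
    print(e)
```

This is why the merged GPTQ checkpoint cannot be loaded by a vLLM model class that assumes unquantized expert weights when packing the MoE block.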
I also tried pt as the inference backend, but even after the model loads, inference still fails:
Inference command:
swift infer --model_type qwen1half-moe-a2_7b-chat-int4 --ckpt_dir ./checkpoint-395-merged-gptq-int4/ --infer_backend pt
Error message:
(up_proj): QuantLinear()
)
(gate): QuantLinear()
(shared_expert_gate): QuantLinear()
)
(input_layernorm): Qwen2MoeRMSNorm()
(post_attention_layernorm): Qwen2MoeRMSNorm()
)
)
(norm): Qwen2MoeRMSNorm()
)
(lm_head): Linear(in_features=2048, out_features=151936, bias=False)
)
[INFO:swift] Qwen2MoeForCausalLM: 622.4302M Params (622.4302M Trainable [100.0000%]), 2049.3039M Buffers.
[INFO:swift] system: You are a helpful assistant.
[INFO:swift] Input `exit` or `quit` to exit the conversation.
[INFO:swift] Input `multi-line` to switch to multi-line input mode.
[INFO:swift] Input `reset-system` to reset the system and clear the history.
[INFO:swift] Input `clear` to clear the history.
<<< 你好
Exception in thread Thread-1 (generate):
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/miniconda3/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/generation/utils.py", line 1622, in generate
result = self._sample(
File "/root/miniconda3/lib/python3.10/site-packages/transformers/generation/utils.py", line 2791, in _sample
outputs = self(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py", line 1350, in forward
outputs = self.model(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py", line 1219, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py", line 929, in forward
hidden_states = self.mlp(hidden_states)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py", line 821, in forward
router_logits = self.gate(hidden_states)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py", line 348, in forward
weight = scales * (weight - zeros)
RuntimeError: The size of tensor a (60) must match the size of tensor b (32) at non-singleton dimension 2
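The module dump above shows that the MoE router (`gate`) was itself quantized to `QuantLinear`. Its `out_features` equals the number of experts (60 for Qwen1.5-MoE-A2.7B), which does not line up with the 32-wide unpacking unit in the old CUDA dequantization kernel, so the `scales * (weight - zeros)` broadcast fails. The shapes below are illustrative assumptions chosen to reproduce the same error, not the kernel's actual intermediate shapes:

```python
import torch

# Assumed shapes: 60 output columns (one per expert) vs. buffers
# unpacked in 32-wide units by the old CUDA kernel.
scales = torch.ones(1, 1, 32)
weight = torch.ones(1, 1, 60)
zeros = torch.ones(1, 1, 32)

try:
    # same expression as qlinear_cuda_old.py line 348
    deq = scales * (weight - zeros)
except RuntimeError as e:
    print(e)  # same size-mismatch error as in the log above
```

If this diagnosis is right, excluding the router (and `shared_expert_gate`) from quantization should avoid the crash, since routers are tiny and usually kept in full precision.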