
Support inference with WOQ and LoRA adapter #1434

Open
Yuan0320 opened this issue Mar 28, 2024 · 3 comments
@Yuan0320

Hi itrex team, thanks for the great work!

I've been experimenting with the Weight Only Quantization (WOQ) from ITREX, following the provided examples in weightonlyquant.md#example-for-cpu-device. The results are promising.
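For reference, this is roughly the flow I'm running, following that doc (a minimal sketch; the model name is a placeholder and the exact config class/arguments may differ by ITREX version):

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model

# 4-bit weight-only quantization config (names per weightonlyquant.md; may vary by version)
woq_config = WeightOnlyQuantConfig(weight_dtype="int4", compute_dtype="fp32")

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Weights are quantized at load time; inference runs on CPU
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```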

Now I'm interested in extending this by incorporating a trained LoRA adapter for inference. I'd like to combine pretrained weights (WOQ) with LoRA adapter (FP32/16) for inference. I'm wondering if it's feasible to achieve this, or if it's on the roadmap for future updates? Any insights or assistance would be greatly appreciated. Thanks!

@XinyuYe-Intel
Collaborator

Hi @Yuan0320 , thanks for using ITREX.

Regarding combining pretrained weights (WOQ) with a LoRA adapter (FP32/16) for inference: do you mean adding the LoRA adapter (FP32/16) on top of the WOQ model, or just merging the LoRA adapter's weights into the WOQ model? Could you please clarify?

If you meant the latter case, you can load the LoRA adapter and merge it into the model before WOQ, then apply WOQ after the adapter has been merged. This way the model's structure won't change; only its weights are updated.
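Something along these lines (a sketch only; the model name and adapter path are placeholders, and the WOQ config class/arguments may differ by ITREX version):

```python
from transformers import AutoModelForCausalLM as HFAutoModelForCausalLM
from peft import PeftModel
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig

base_model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model
adapter_path = "path/to/lora_adapter"         # placeholder adapter checkpoint

# 1. Load the FP32/FP16 base model and attach the LoRA adapter with PEFT
base_model = HFAutoModelForCausalLM.from_pretrained(base_model_name)
peft_model = PeftModel.from_pretrained(base_model, adapter_path)

# 2. Fold the LoRA weights into the base weights; the model structure stays the same
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("merged_model")

# 3. Apply WOQ to the merged checkpoint and use it for inference
woq_config = WeightOnlyQuantConfig(weight_dtype="int4")
model = AutoModelForCausalLM.from_pretrained("merged_model", quantization_config=woq_config)
```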

@Yuan0320
Author

Hi @XinyuYe-Intel, thanks for the quick reply and insight; that makes sense. I actually meant the former case, since I want to keep the adapter at high precision to minimize the accuracy loss from WOQ. I suspect it may be challenging to achieve this (adding a LoRA adapter (FP32/16) on top of the WOQ model).

@XinyuYe-Intel
Collaborator

No problem at all. And yes, we don't support the former case yet.
