Use awq to quantize Deepseek-coder-33B-instruct model #157

CarolXh · 2024-03-13T08:47:57Z

When I use the official AWQ code to quantize the Deepseek-coder-33B-instruct model, my script is as follows:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '/hy-tmp/deepseek-coder-33b-instruct'
quant_path = '/hy-tmp/deepseek-coder-33b-instruct-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map="auto", low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

It reports:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Token indices sequence length is longer than the specified maximum sequence length for this model (98937 > 16384). Running this sequence through the model will result in indexing errors
Killed

The process is then killed. How can I solve this problem when running AWQ on such a large model?
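
For context, a bare `Killed` with no Python traceback is often the Linux out-of-memory killer, so a rough check of whether the weights even fit in host RAM may help narrow this down. The sketch below is only an estimate under stated assumptions and is not part of the AWQ code: the 33e9 parameter count is read off the model name, 2 bytes per parameter assumes fp16/bf16 weights, and the helper names `weights_gib` and `total_ram_gib` are hypothetical. It ignores activations and calibration data, which add to the total.

```python
import os


def weights_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights in GiB (assumes a fixed width per parameter)."""
    return num_params * bytes_per_param / 2**30


def total_ram_gib() -> float:
    """Total physical RAM in GiB, read via sysconf (Linux and most Unix systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30


if __name__ == "__main__":
    # Assumption: ~33e9 parameters, stored as 2-byte fp16/bf16 values.
    need = weights_gib(33e9)
    have = total_ram_gib()
    print(f"fp16 weights: ~{need:.1f} GiB, host RAM: {have:.1f} GiB")
    if need > have:
        print("Weights alone exceed host RAM; an OOM kill is plausible.")
```

If the estimate is close to or above the machine's RAM, the kill is likely a memory limit rather than a bug in the quantization script itself.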
