Does InfLLM support quantized versions of Qwen1.5-7B? #32
Comments
#16 I previously verified Qwen1.5-72B-Chat-GPTQ-Int4; other quantized models should work similarly.
Thanks! How are the results? How is the speed?
The model is large, so it runs rather slowly. I ran a quick test on two LongBench datasets: Evaluating on: ['narrativeqa.jsonl', 'qasper.jsonl', 'result.json']
May I ask how the calibration dataset for quantization was selected? Was it sampled from LongBench? @ChuanhongLi
@ChuanhongLi Could you explain which file to modify, and how?
For the quantized models, we used the open-source ones directly; we did not quantize the models ourselves.
Adjust n_local, topk, max_cached_block, chunk_size, etc.; see #11 for reference
config/qwen-inf-llm.yaml |
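For orientation, here is a minimal sketch of what tuning that config might look like. The parameter names (n_local, topk, max_cached_block, chunk_size, block_size, max_len) are taken from this thread; the layout and the specific values other than block_size and max_len are illustrative assumptions, not a verified working configuration.

```yaml
# Hypothetical sketch only -- values are illustrative assumptions,
# not a config verified against the repo.
model:
  block_size: 128        # size of each memory unit
  n_local: 4096          # local attention window; reduce to save GPU memory
  topk: 16               # number of memory blocks retrieved per step
  max_cached_block: 32   # cap on blocks cached on GPU; lower this if you hit OOM
  chunk_size: 512        # chunk length used when streaming the long input
max_len: 2147483647      # effectively no limit on input length
```

Lowering n_local and max_cached_block is the usual first step when running out of GPU memory, since both directly bound what is resident on the device.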
@ChuanhongLi Thank you very much!
@ChuanhongLi Hello, I ran out of GPU memory on an A100 80G with both the Qwen1.5-72B-chat-AWQ and GPTQ versions. Could you tell me which config parameters need to be changed, and to what values? The repo's original Qwen config also runs out of memory for me. Would you mind pasting the config you got working? Thanks.
block_size: 128
max_len: 2147483647
Great work! Does InfLLM support quantized versions of Qwen1.5-7B?