
Is a quantized version of Qwen1.5-7B supported? #32

Closed
huliangbing opened this issue Apr 14, 2024 · 11 comments

Comments

@huliangbing

Great work! Does InfLLM support a quantized version of Qwen1.5-7B?

@ChuanhongLi

#16 I previously verified Qwen1.5-72B-Chat-GPTQ-Int4; other quantized models should behave similarly.
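Since the open-source GPTQ checkpoint loads through the same path as the fp16 model, pointing the config at it might look like the fragment below. This is an illustrative sketch only: the `model`/`path` field names are assumptions, not confirmed against the repo; compare against the actual config/qwen-inf-llm.yaml before use.

```yaml
# Hypothetical fragment: swap the fp16 checkpoint for the quantized one.
# Field names are illustrative; check config/qwen-inf-llm.yaml in the repo.
model:
  path: Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
conv_type: qwen
```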

@huliangbing (Author) commented Apr 15, 2024

Thanks! How is the accuracy, and how is the speed?

@ChuanhongLi

> Thanks! How is the accuracy, and how is the speed?

The model is large, so it runs rather slowly. I ran a quick test on two LongBench datasets: Evaluating on: ['narrativeqa.jsonl', 'qasper.jsonl', 'result.json']
{'narrativeqa': 23.66, 'qasper': 40.86}

@ehuaa commented Apr 22, 2024

> The model is large, so it runs rather slowly. I ran a quick test on two LongBench datasets: Evaluating on: ['narrativeqa.jsonl', 'qasper.jsonl', 'result.json'] {'narrativeqa': 23.66, 'qasper': 40.86}

May I ask how the calibration dataset for quantization was chosen? Was it sampled from LongBench? @ChuanhongLi

@huliangbing (Author)

@ChuanhongLi Could you advise which file to modify, and how?

@ChuanhongLi

> Was it sampled from LongBench? @ChuanhongLi

We used an openly released quantized model directly; we did not quantize the model ourselves.

@ChuanhongLi

> Which file to modify, and how?

Adjust n_local, topk, max_cached_block, chunk_size, etc.; see #11 for reference.

@ChuanhongLi

> Which file to modify, and how?

The file is config/qwen-inf-llm.yaml.
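The knobs named above are the ones that dominate GPU memory. As a minimal sketch of the tuning direction (not the repo's own tooling), a helper could scale them down when hitting OOM; the key names match this thread, but the halving factors and the example starting values are arbitrary illustrations.

```python
# Illustrative sketch: reduce the memory-heavy InfLLM knobs from
# config/qwen-inf-llm.yaml when running out of GPU memory.
# Key names match the thread; scaling factors are arbitrary examples.

def shrink_for_oom(cfg):
    """Return a copy of the config with the memory-heavy knobs reduced."""
    out = dict(cfg)
    out["n_local"] = cfg["n_local"] // 2                            # smaller local attention window
    out["max_cached_block"] = max(1, cfg["max_cached_block"] // 2)  # fewer GPU-resident memory blocks
    out["chunk_size"] = cfg["chunk_size"] // 2                      # smaller prefill chunks
    return out

cfg = {"n_local": 4096, "topk": 16, "max_cached_block": 32, "chunk_size": 2048}
print(shrink_for_oom(cfg))
```

topk is left untouched here because it mainly affects retrieval quality; shrinking the resident windows and block cache is the more direct memory lever.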

@huliangbing (Author)

@ChuanhongLi Thanks a lot!

@ehuaa commented Apr 30, 2024

> Adjust n_local, topk, max_cached_block, chunk_size, etc.; see #11 for reference.
>
> config/qwen-inf-llm.yaml

@ChuanhongLi Hi, I ran out of GPU memory on an A100 80G with both the Qwen1.5-72B-Chat-AWQ and GPTQ versions, using the repo's stock qwen config. Which parameters need to change, and to what values? Could you paste the config you got working? Thanks.

@ChuanhongLi

> Could you paste the config you got working? Thanks.

```yaml
block_size: 128
n_init: 128
n_local: 2048
topk: 4
repr_topk: 4
max_cached_block: 4
exc_block_size: 512
score_decay: 0.1
fattn: true
base: 1000000
distance_scale: 1.0

max_len: 2147483647
chunk_size: 1024
conv_type: qwen
```
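A back-of-the-envelope check shows why these small values fit where the defaults OOM: InfLLM keeps the initial tokens, the local window, and up to max_cached_block evicted blocks resident on GPU. The sketch below estimates that resident KV-cache footprint; the Qwen1.5-72B shape values (80 layers, 64 KV heads of dim 128, fp16) are assumptions on my part and should be checked against the checkpoint's config.json, and this ignores weights, activations, and framework overhead.

```python
# Rough GPU-resident KV-cache estimate for the InfLLM settings above.
# Model shape values are assumed for Qwen1.5-72B (verify via config.json).

def kv_cache_bytes(n_init, n_local, max_cached_block, block_size,
                   n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Tokens InfLLM keeps on GPU: initial + local window + cached blocks.
    resident_tokens = n_init + n_local + max_cached_block * block_size
    # K and V tensors: 2 * heads * head_dim elements per token per layer.
    per_token = 2 * n_kv_heads * head_dim * bytes_per_elem * n_layers
    return resident_tokens, resident_tokens * per_token

tokens, nbytes = kv_cache_bytes(
    n_init=128, n_local=2048, max_cached_block=4, block_size=128,
    n_layers=80, n_kv_heads=64, head_dim=128,  # assumed Qwen1.5-72B shape
)
print(tokens, round(nbytes / 2**30, 2))  # resident tokens, size in GiB
```

Under these assumptions the cache stays at a few GiB regardless of input length, which is the point of the small n_local/max_cached_block settings above.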
