LLM: chatGLM2 inference acceleration #156

Open
Lxhnnn opened this issue Oct 26, 2023 · 1 comment

Comments

Lxhnnn commented Oct 26, 2023

How can I improve the inference speed of the GLM2 model?

@Tongjilibo (Owner) commented

glm2 already uses flash_attention and multihead_attention; for further acceleration you could consider a dedicated inference acceleration framework.
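For reference, here is a minimal, generic PyTorch sketch of the fused (flash-style) attention path mentioned above. This is not bert4torch's own API; the function name, tensor shapes, and the standalone usage are illustrative assumptions. On PyTorch >= 2.0, `F.scaled_dot_product_attention` dispatches to a FlashAttention-style fused kernel when the hardware and dtype allow it.

```python
# Generic PyTorch sketch (assumption: not the bert4torch API).
# F.scaled_dot_product_attention uses a fused FlashAttention-style kernel
# on supported CUDA GPUs with fp16/bf16 inputs, otherwise falls back to math.
import torch
import torch.nn.functional as F


def fused_causal_attention(q, k, v):
    """q, k, v: (batch, num_heads, seq_len, head_dim)."""
    # is_causal=True lets the fused kernel skip the masked upper triangle
    # instead of materializing the full attention matrix.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    q = torch.randn(1, 32, 512, 128, device=device, dtype=dtype)
    k = torch.randn(1, 32, 512, 128, device=device, dtype=dtype)
    v = torch.randn(1, 32, 512, 128, device=device, dtype=dtype)
    out = fused_causal_attention(q, k, v)
    print(out.shape)  # torch.Size([1, 32, 512, 128])
```

Beyond fused attention, the "acceleration frameworks" mentioned above would typically be serving/inference engines such as vLLM, TensorRT-LLM, or LMDeploy, which add optimizations like paged KV caches, continuous batching, and quantized kernels.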
