Could inference acceleration for LLaMA and BLOOM be supported? #502

Open
huanghuidmml opened this issue Apr 18, 2023 · 4 comments

Comments

@huanghuidmml

No description provided.

@Taka152
Contributor

Taka152 commented Apr 20, 2023

This is not currently supported.

@hexisyztem
Collaborator

Support is planned for May, and we expect deployment on a V100-32G to be possible.

@frankxyy

@hexisyztem Hi, can flash attention be used on V100?

@hexisyztem
Collaborator

hexisyztem commented Jun 12, 2023 via email
