Deepspeed Ulysses #5492

Open
conceptofmind opened this issue May 2, 2024 · 2 comments

Comments

@conceptofmind

Ring Attention should work with DeepSpeed Ulysses, correct? Are there any notable issues combining DeepSpeed's efficient sequence parallelism with such an attention mechanism? I understand that FlashAttention already works.

https://github.com/zhuzilin/ring-flash-attention

@samadejacobs
Contributor

Ulysses is, in principle, attention-type agnostic. Although we haven’t specifically tested Ulysses with Ring Attention, as long as the qkv can be split or sharded along sequence and head dimensions, it should work. Contributions are welcome!
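
As a rough illustration of what "split or sharded along sequence and head dimensions" means in practice, here is a single-process sketch (not DeepSpeed's actual implementation; the all-to-all is emulated with plain tensor ops, and the group size and tensor shapes are made up):

```python
# Minimal sketch of the Ulysses sharding idea: q/k/v start sequence-sharded
# across a group of "ranks"; an all-to-all style exchange leaves each rank
# with the full sequence but only a subset of heads, so any attention kernel
# (FlashAttention, Ring Attention, a naive one) can run unchanged on its
# local heads. Shapes and group size below are illustrative only.
import torch

world_size = 4                       # size of the sequence-parallel group (assumption)
batch, seq, heads, dim = 2, 16, 8, 64
assert seq % world_size == 0 and heads % world_size == 0

# Each "rank" holds a contiguous slice of the sequence with all heads.
full_q = torch.randn(batch, seq, heads, dim)
seq_shards = list(full_q.chunk(world_size, dim=1))      # each: [b, seq/ws, h, d]

# Emulate the all-to-all: every rank sends its sequence slice of head group j
# to rank j and receives the other ranks' slices of its own head group.
head_shards = []
for rank in range(world_size):
    pieces = [shard.chunk(world_size, dim=2)[rank] for shard in seq_shards]
    head_shards.append(torch.cat(pieces, dim=1))         # [b, seq, h/ws, d]

# Each rank now sees the whole sequence for the heads it owns, so the choice
# of attention algorithm is purely local to the rank.
assert head_shards[0].shape == (batch, seq, heads // world_size, dim)
print(head_shards[0].shape)
```

Because every rank ends up with full-sequence Q, K, and V for its head subset, an attention implementation only has to accept that local tensor layout; whether it computes attention naively, with FlashAttention, or with a ring schedule is orthogonal to the Ulysses sharding step.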

@conceptofmind
Author

conceptofmind commented May 10, 2024

Hi @samadejacobs,

I appreciate the insight.

I will test the two in conjunction and let you know.

Thank you,

Enrico
