[Feature]: BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences #354

sorasoras · 2024-03-23T07:06:43Z

🚀 The feature, motivation and pitch

I think this would be useful for serving a larger number of users, so it should be useful for this project.
It should be quite useful for inference when scaling up.

Alternatives

https://arxiv.org/abs/2403.09347

Additional context

No response

AlpinDale · 2024-03-24T07:24:06Z

I doubt this applies to inference as much as it does for training. Admittedly, I haven't given the paper a thorough read yet.
