Are fuseQKV masked attention and Flash Attention the same? #786

Open
likejazz opened this issue Feb 4, 2024 · 0 comments
likejazz commented Feb 4, 2024

In the GPT guide (https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md#workflow), Fig. 2 shows "fuseQKV masked attention," which looks very similar to Flash Attention. However, the text no longer mentions fuseQKV masked attention or Flash Attention anywhere, so I'm wondering whether it is the same technology as Flash Attention.
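
For reference, here is my understanding of the fuseQKV part as a minimal PyTorch sketch: the Q, K, and V projections are computed with a single fused GEMM and then split, instead of three separate GEMMs. (The identifiers below are my own illustration, not FasterTransformer's actual API.)

```python
import torch
import torch.nn as nn

hidden = 512
x = torch.randn(2, 16, hidden)  # (batch, seq_len, hidden)

# Unfused: three separate GEMMs for the Q, K, V projections.
q_proj = nn.Linear(hidden, hidden, bias=False)
k_proj = nn.Linear(hidden, hidden, bias=False)
v_proj = nn.Linear(hidden, hidden, bias=False)

# Fused: one GEMM whose output is split into Q, K, V ("fuseQKV").
# As I understand it, this only fuses the input projections; it says
# nothing about how the subsequent softmax(QK^T)V is computed, which
# is the part Flash Attention optimizes with tiling and online softmax.
fused_qkv = nn.Linear(hidden, 3 * hidden, bias=False)
with torch.no_grad():
    fused_qkv.weight.copy_(
        torch.cat([q_proj.weight, k_proj.weight, v_proj.weight], dim=0)
    )

q, k, v = fused_qkv(x).chunk(3, dim=-1)
assert torch.allclose(q, q_proj(x), atol=1e-6)
assert torch.allclose(k, k_proj(x), atol=1e-6)
assert torch.allclose(v, v_proj(x), atol=1e-6)
```

If that reading is right, the projection fusion and Flash Attention's tiled attention kernel would be separate optimizations, which is what I'd like to confirm.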

Am I understanding it correctly?

@likejazz likejazz changed the title fuseQKV masked attention same with Flash Attention? Are fuseQKV masked attention and Flash Attention the same? Feb 4, 2024