Skip to content

Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry"

License

Notifications You must be signed in to change notification settings

kyegomez/Hedgehog

Repository files navigation

Multi-Modality

HedgeHog

Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry". This paper implements MLPs to mimic the softmax of a transformer. Suppodesly hits SOTA on wikitext for sub quadratic models. I've too been thinking about replacing softmax with MLPs. This past month we saw doezens of papers on mamba and convolutions but MLPs might have undiscovered powers.

License

MIT

Releases

No releases published

Sponsor this project

 

Packages

No packages published