Hi!
After replacing an eight-layer Transformer encoder with Mamba, the training loss fails to decrease. Could it be that Mamba doesn't perform as effectively as the Transformer in the diffusion model? Looking forward to your response.
Here is my code.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Mamba is a new architecture proposed as a linear-complexity alternative to the Transformer.
When I use Mamba instead of the Transformer encoder, the other losses behave normally; only loss_q3 fails to decrease. Do you know what might be causing this? Looking forward to your early reply!
mamba.txt
mdm.txt
minimamba.txt
loss_log.txt
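For context, below is a minimal sketch of the kind of swap described above: replacing an nn.TransformerEncoder with a stack of Mamba blocks. This is not the attached code; it assumes the mamba_ssm package is installed, and MambaEncoder, d_model, and n_layers are illustrative placeholders.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba_ssm package (requires CUDA)

class MambaEncoder(nn.Module):
    """Hypothetical drop-in stand-in for an eight-layer nn.TransformerEncoder.

    Expects input of shape (batch, seq_len, d_model), i.e. batch_first=True.
    """
    def __init__(self, d_model: int = 512, n_layers: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm residual wiring, as in typical Mamba stacks.
        for mamba, norm in zip(self.layers, self.norms):
            x = x + mamba(norm(x))
        return x

# Example usage with dummy data:
# enc = MambaEncoder(d_model=512, n_layers=8).cuda()
# out = enc(torch.randn(4, 196, 512, device="cuda"))  # -> (4, 196, 512)
```

One design difference worth noting in such a swap: a Transformer encoder attends bidirectionally, while a plain Mamba block scans the sequence causally, so a direct replacement changes what each position can see; whether that matters here depends on how the encoder is used in the diffusion model.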