replace transformerencoder with mamba #199

Open
sunxin010205 opened this issue Apr 29, 2024 · 2 comments

@sunxin010205

Hi!
After replacing an eight-layer Transformer encoder with Mamba, the training loss fails to decrease. Could it be that Mamba doesn't perform as effectively as the Transformer in the diffusion model? Looking forward to your response.
Here are my code files:

mamba.txt
mdm.txt
minimamba.txt
loss_log.txt
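(The attached files are not inlined here; for context, this is a minimal sketch of what such a replacement might look like, assuming the `mamba_ssm` package's `Mamba` block and the sequence-first `(seq_len, batch, dim)` layout that `nn.TransformerEncoder` uses by default. Names are illustrative, not taken from the attachments.)

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm (requires CUDA)


class MambaEncoder(nn.Module):
    """Illustrative stand-in for an 8-layer nn.TransformerEncoder.

    Each layer is a pre-norm residual Mamba block. Mamba expects
    (batch, seq_len, dim), so the sequence-first input is transposed
    on the way in and back on the way out.
    """

    def __init__(self, d_model=512, num_layers=8, d_state=16, d_conv=4, expand=2):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.ModuleDict({
                "norm": nn.LayerNorm(d_model),
                "mamba": Mamba(d_model=d_model, d_state=d_state,
                               d_conv=d_conv, expand=expand),
            })
            for _ in range(num_layers)
        ])

    def forward(self, x):            # x: (seq_len, batch, d_model)
        x = x.transpose(0, 1)        # -> (batch, seq_len, d_model) for Mamba
        for layer in self.layers:
            x = x + layer["mamba"](layer["norm"](x))  # pre-norm residual block
        return x.transpose(0, 1)     # back to (seq_len, batch, d_model)


# e.g. where the model previously built
#   nn.TransformerEncoder(encoder_layer, num_layers=8)
# one would instead instantiate
#   MambaEncoder(d_model=latent_dim, num_layers=8)
```

One difference worth keeping in mind with this kind of swap: the published Mamba block is causal (left-to-right), whereas an unmasked `nn.TransformerEncoder` attends bidirectionally, so the two are not strictly equivalent even at matched depth and width.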

@GuyTevet
Owner

GuyTevet commented May 7, 2024

Hi @sunxin010205, what is Mamba?

@sunxin010205
Author

> Hi, what is Mamba?

Mamba ("Mamba: Linear-Time Sequence Modeling with Selective State Spaces") is a new architecture built on selective state spaces, proposed as a linear-complexity alternative to the Transformer.
When I use Mamba in place of the Transformer encoder, the other losses behave normally; only loss_q3 fails to decrease. Do you know what might cause this? Looking forward to your reply!
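
For reference, the block's interface in the reference implementation is shape-preserving over the sequence (a minimal usage sketch, assuming the `mamba_ssm` package; parameter values are illustrative):

```python
import torch
from mamba_ssm import Mamba  # reference implementation of the selective-SSM block

batch, seq_len, dim = 2, 196, 512
x = torch.randn(batch, seq_len, dim, device="cuda")  # Mamba expects (batch, seq_len, dim)

block = Mamba(
    d_model=dim,  # model width
    d_state=16,   # SSM state size
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = block(x)  # (batch, seq_len, dim) -> (batch, seq_len, dim), cost linear in seq_len
```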
