Hi!
After replacing an eight-layer Transformer encoder with Mamba, the training loss fails to decrease. Could it be that Mamba doesn't perform as effectively as the Transformer in the diffusion model? Looking forward to your response.
Here is my code.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Mamba is a new architecture proposed as a linear-complexity alternative to the Transformer.
When I use Mamba instead of the Transformer encoder, the other losses behave normally; only loss_q3 fails to decrease. Do you know what might be causing this? Looking forward to your early reply!
mamba.txt
mdm.txt
minimamba.txt
loss_log.txt
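For context, below is a minimal sketch of the kind of swap described above: replacing an nn.TransformerEncoder with a stack of Mamba blocks. This is not the attached code; it assumes the mamba_ssm package is installed, and MambaEncoder, d_model, and n_layers are illustrative placeholders.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba_ssm package (requires CUDA)

class MambaEncoder(nn.Module):
    """Hypothetical drop-in stand-in for an eight-layer nn.TransformerEncoder.

    Expects input of shape (batch, seq_len, d_model), i.e. batch_first=True.
    """
    def __init__(self, d_model: int = 512, n_layers: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm residual wiring, as in typical Mamba stacks.
        for mamba, norm in zip(self.layers, self.norms):
            x = x + mamba(norm(x))
        return x

# Example usage with dummy data:
# enc = MambaEncoder(d_model=512, n_layers=8).cuda()
# out = enc(torch.randn(4, 196, 512, device="cuda"))  # -> (4, 196, 512)
```

One design difference worth noting in such a swap: a Transformer encoder attends bidirectionally, while a plain Mamba block scans the sequence causally, so a direct replacement changes what each position can see; whether that matters here depends on how the encoder is used in the diffusion model.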