Seeking Help on Loss Behavior #6

Open
guanidine opened this issue Feb 1, 2024 · 0 comments

First of all, thank you for your project. It looks great! I have been trying to apply it to ViT, as V-MoE does. During training, I observed the loss curves shown in the graph below. I have a few questions and would appreciate your guidance on whether this behavior is normal:

  1. The balance_loss briefly increases and then stabilizes around 5.0 without decreasing. How can I verify whether the experts are actually balanced in this case?

  2. The aux_loss, which is the sum of weighted_balance_loss and weighted_router_z_loss, contributes relatively little to the overall loss. It is decreasing, but should I increase the two loss coefficients in your code?

  3. Is there a recommended batch_size for training MoE? I have noticed that different batch_size values yield different results. The batch size used in the ST-MoE paper is far too large for an individual user like me to reproduce.
    [image: graph of the training losses]
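
To make questions 1 and 2 concrete, here is a minimal sketch of the kind of check I have in mind. It is plain Python and does not use this repository's API. It assumes top-1 routing, a Switch-Transformer-style balance term (N * sum_i f_i * P_i, which equals 1.0 under perfectly uniform routing), and the router z-loss as defined in the ST-MoE paper (mean squared log-sum-exp of the router logits). All function names, the toy routing data, and the coefficient values are hypothetical. Whether balance_loss in this project is scaled the same way is an assumption I have not verified, so a plateau near 5.0 cannot be judged from the number alone.

```python
import math
from collections import Counter


def expert_load_fractions(expert_ids, num_experts):
    """Fraction of tokens routed to each expert (top-1 assignments)."""
    counts = Counter(expert_ids)
    total = len(expert_ids)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def normalized_entropy(fractions):
    """Entropy of the load distribution divided by log(num_experts).

    1.0 means perfectly uniform load; values near 0 mean most tokens
    collapse onto a few experts. Requires at least two experts.
    """
    h = -sum(f * math.log(f) for f in fractions if f > 0)
    return h / math.log(len(fractions))


def switch_balance_loss(fractions, mean_router_probs):
    """Switch-style balance term: N * sum_i f_i * P_i (unweighted)."""
    n = len(fractions)
    return n * sum(f * p for f, p in zip(fractions, mean_router_probs))


def router_z_loss(logits_per_token):
    """Mean over tokens of (logsumexp of that token's router logits)^2."""
    total = 0.0
    for row in logits_per_token:
        m = max(row)  # subtract the max for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse ** 2
    return total / len(logits_per_token)


def aux_loss(balance, z, balance_coef, z_coef):
    """Weighted sum of the two auxiliary terms (coefficients are assumptions)."""
    return balance_coef * balance + z_coef * z


# Hypothetical routing decisions for 8 tokens over 4 experts.
balanced = [0, 1, 2, 3, 0, 1, 2, 3]
collapsed = [0, 0, 0, 0, 0, 0, 0, 1]

for name, ids in [("balanced", balanced), ("collapsed", collapsed)]:
    fractions = expert_load_fractions(ids, num_experts=4)
    print(name, fractions, normalized_entropy(fractions))
```

Logging the per-expert fractions (or the normalized entropy) per MoE layer during training would show directly whether tokens are spread across experts, independently of how the balance loss happens to be scaled. Comparing aux_loss against the task loss in the same way shows how much the two coefficients actually contribute.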
