
Question on the experts' input #12

Open

mrqorib opened this issue Apr 25, 2024 · 0 comments

mrqorib commented Apr 25, 2024

Hi, thanks a lot for your great work!

I tried using this code in my project and found that the input to the MoE module (x in the forward function of the MoE class) and the input to the first expert (expert_input in the Expert class) are not the same. Since the first expert is always applied to all tokens, I expected the two inputs to be identical. Is my assumption wrong?
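To make the comparison concrete, here is a minimal shape trace I put together (the toy routing and all dimension sizes below are my own, not taken from the repo):

import torch
from torch import einsum

b, n, d = 2, 8, 16   # batch, sequence length, model dim
e, c = 4, 3          # number of experts, expert capacity (c < n here)

x = torch.randn(b, n, d)

# toy dispatch tensor: route token i (for i < c) to slot i of expert 0
# and drop everything else, just to make the shape flow visible
dispatch_tensor = torch.zeros(b, n, e, c)
for i in range(c):
    dispatch_tensor[:, i, 0, i] = 1.

expert_inputs = einsum('b n d, b n e c -> b e c d', x, dispatch_tensor)

print(x.shape)                    # torch.Size([2, 8, 16])
print(expert_inputs[:, 0].shape)  # torch.Size([2, 3, 16])

So even the first expert only receives a gathered (and, in general, gate-weighted) copy of at most c tokens rather than x itself, which is what surprised me.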

Second, I noticed that the expected input shapes differ between the two modules. In the forward function of MoE below, the input is transformed to shape (b, e, c, d):

expert_inputs = einsum('b n d, b n e c -> b e c d', x, dispatch_tensor)
# feed the expert inputs through the experts.
expert_outputs = self.experts(expert_inputs)

but in the Experts class, the expected shape appears to be (b, e, n, d). If I understand correctly, c is the expert capacity and n is the sequence length, and these are generally not equal. Could you please also enlighten me on this?
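For reference, my understanding (following the usual Switch Transformer convention, which may or may not be what this repo uses) is that the capacity is derived from the sequence length, so the two only coincide for particular settings:

import math

n, e = 8, 4
capacity_factor = 1.25   # my assumption; the actual default may differ
c = math.ceil(n * capacity_factor / e)
print(c)   # 3, which does not equal n = 8

If that is right, perhaps the n in the Experts class just denotes "tokens per expert" rather than the original sequence length, but I wanted to confirm.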

Thank you very much for your help!
