
Question on the experts' input #12

Open

mrqorib opened this issue Apr 25, 2024 · 0 comments

mrqorib commented Apr 25, 2024

Hi, thanks a lot for your great work!

I tried using this code in my project and found that the input to the MoE module (x in the forward function of the MoE class) and the input to the first expert (expert_input in the Expert class) are not the same. Since the first expert is always applied to all tokens, I expected the two inputs to be identical. Is my assumption wrong?
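To make the comparison concrete, here is a minimal shape trace I put together (the toy routing and all dimension sizes below are my own, not taken from the repo):

import torch
from torch import einsum

b, n, d = 2, 8, 16   # batch, sequence length, model dim
e, c = 4, 3          # number of experts, expert capacity (c < n here)

x = torch.randn(b, n, d)

# toy dispatch tensor: route token i (for i < c) to slot i of expert 0
# and drop everything else, just to make the shape flow visible
dispatch_tensor = torch.zeros(b, n, e, c)
for i in range(c):
    dispatch_tensor[:, i, 0, i] = 1.

expert_inputs = einsum('b n d, b n e c -> b e c d', x, dispatch_tensor)

print(x.shape)                    # torch.Size([2, 8, 16])
print(expert_inputs[:, 0].shape)  # torch.Size([2, 3, 16])

So even the first expert only receives a gathered (and, in general, gate-weighted) copy of at most c tokens rather than x itself, which is what surprised me.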

Second, I noticed that the expected input shapes differ between the two modules. In the forward function of MoE below, the input is transformed to shape (b, e, c, d):

expert_inputs = einsum('b n d, b n e c -> b e c d', x, dispatch_tensor)
# feed the expert inputs through the experts.
expert_outputs = self.experts(expert_inputs)

but in the Experts class, the expected shape appears to be (b, e, n, d). If I understand correctly, c is the expert capacity and n is the sequence length, and these are generally not equal. Could you please also enlighten me on this?
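For reference, my understanding (following the usual Switch Transformer convention, which may or may not be what this repo uses) is that the capacity is derived from the sequence length, so the two only coincide for particular settings:

import math

n, e = 8, 4
capacity_factor = 1.25   # my assumption; the actual default may differ
c = math.ceil(n * capacity_factor / e)
print(c)   # 3, which does not equal n = 8

If that is right, perhaps the n in the Experts class just denotes "tokens per expert" rather than the original sequence length, but I wanted to confirm.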

Thank you very much for your help!
