About gating_top_n #3

Open
Heihaierr opened this issue Dec 1, 2023 · 5 comments

Comments

@Heihaierr

Hi, I noticed that the ST-MoE paper includes an experiment with top_n=1. But in st_moe_pytorch.py there is:
assert top_n >= 2, 'must be 2 or more experts'
Can top_n=1 work in this implementation?

@lucidrains
Owner

lucidrains commented Dec 1, 2023

@Heihaierr Not here; that was just to stay faithful to the paper, which explored top-2 and then (iirc) a generalization to top-n, up to 3 and 4.

I thought top-1 didn't work that well?
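For reference, here is a minimal, library-independent sketch of top-n gating for a single token. It assumes softmax router probabilities that are then renormalized over the selected experts (a GShard-style choice; st-moe-pytorch's exact gating may differ). `top_n_gating` is a hypothetical helper name, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # subtract the max logit for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_n_gating(router_logits: List[float], top_n: int) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) pairs for the top_n experts of one token.

    Hypothetical helper, not part of st-moe-pytorch. Gate weights are
    renormalized over the chosen experts so they sum to 1 (an assumption).
    """
    if not 1 <= top_n <= len(router_logits):
        raise ValueError("top_n must be between 1 and the number of experts")
    probs = softmax(router_logits)
    # indices of the top_n experts, highest router probability first
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_n]
    denom = sum(probs[i] for i in chosen)
    return [(i, probs[i] / denom) for i in chosen]
```

For example, `top_n_gating([2.0, 1.0, 0.5, -1.0], 2)` selects experts 0 and 1 with weights of about 0.73 and 0.27.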

@Heihaierr
Author

Yes, but the paper also explored top-1 routing and showed improvements.

@lucidrains
Owner

Ah, I see. Yes, they did, but top-2 is still recommended.

[Screenshot attached]

@Heihaierr
Author

Got it, thanks for the quick reply.

@moon4869

If we want top_n=1, how should we implement it?
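One possible approach is to write your own top-1 gate in the spirit of Switch Transformer rather than patching st-moe-pytorch. The sketch below assumes that route; `top1_gating` and `moe_forward_top1` are hypothetical names, and the scalar "token" and callable experts are toy stand-ins for tensors and FFN modules. The key design choice is that the gate weight is the raw softmax probability: renormalizing over a single chosen expert would always yield 1.0, so the router would get no gradient through the gate value.

```python
import math
from typing import Callable, List, Tuple


def top1_gating(router_logits: List[float]) -> Tuple[int, float]:
    """Switch-style top-1 routing for one token (hypothetical helper).

    Returns the chosen expert index and its gate weight. The weight is the
    raw softmax probability, deliberately NOT renormalized to 1.0.
    """
    m = max(router_logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    expert = max(range(len(probs)), key=lambda i: probs[i])
    return expert, probs[expert]


def moe_forward_top1(
    x: float,
    router_logits: List[float],
    experts: List[Callable[[float], float]],
) -> float:
    # send the token to a single expert and scale its output by the gate weight
    expert, gate = top1_gating(router_logits)
    return gate * experts[expert](x)
```

A practical version would typically also add per-expert capacity limits and an auxiliary load-balancing loss, since a single expert per token makes routing collapse more likely.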

3 participants