[Question] Model and Dataset Size #54

Open
adrielkuek opened this issue Mar 4, 2024 · 0 comments

Question

Hi, I have two questions for the authors:

  1. I noticed that the models are limited to a smaller scale (1-3B parameters). Is there a specific reason for choosing these sizes when training the experts? What challenges would you foresee in scaling up to the mid range (13B-30B)?
  2. The ablation studies indicate that limited instruction-tuned multimodal data hurts model sparsification during MoE training. Could you elaborate on why this is so, and perhaps share some insight into how much data would be reasonable to achieve such sparsification?

Thank you very much for the great work.
