
Generate settings and MoE Loss #609

Closed · wants to merge 15 commits

Conversation

psinger (Collaborator) commented Feb 7, 2024

This PR addresses the following:

New max_time setting for generation, allowing users to specify a maximum number of seconds per generation. Closes #568
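
For reference, a minimal sketch of how this maps onto the underlying transformers API (the model name is only a placeholder); max_time is a standard generate() argument that stops generation once the given number of seconds has elapsed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# Generate up to 256 new tokens, but stop early once ~5 seconds have passed.
# max_time is a soft limit: it is checked after each generated token.
outputs = model.generate(**inputs, max_new_tokens=256, max_time=5.0)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```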

New prompt_lookup_num_tokens setting, as discussed in https://twitter.com/joao_gante/status/1747322413006643259
This will likely only help for summarization and QA tasks; default chat inference even got slower when using it.
But let's keep it as a setting one can try.
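
For reference, a sketch of the same mechanism via transformers (>= 4.37), again with a placeholder model; passing prompt_lookup_num_tokens to generate() enables prompt lookup decoding, where candidate continuations are copied from n-gram matches in the prompt and then verified by the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

document = "..."  # a long input, e.g. a document to summarize
inputs = tokenizer(f"Summarize: {document}", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    # Number of candidate tokens proposed per step from prompt matches;
    # this mainly pays off when the output copies spans of the input.
    prompt_lookup_num_tokens=10,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```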

Adds a new loss function MoECrossEntropy that can be used for MoE models such as Mixtral. It follows the auxiliary load-balancing loss of https://arxiv.org/pdf/2101.03961.pdf (Switch Transformers) as implemented in https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/mixtral/modeling_mixtral.py#L77
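
For context, a sketch of what such a loss can look like; the class name matches this PR, but the exact signature here is an assumption, and aux_loss_coef=0.01 is the alpha recommended in the Switch Transformers paper:

```python
import torch
import torch.nn.functional as F


class MoECrossEntropy(torch.nn.Module):
    """Cross entropy plus the Switch Transformers load-balancing
    auxiliary loss, following the linked Mixtral implementation.
    The signature is a sketch, not the PR's exact interface."""

    def __init__(self, num_experts: int, top_k: int = 2, aux_loss_coef: float = 0.01):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.aux_loss_coef = aux_loss_coef

    def forward(self, logits, labels, router_logits):
        # Standard next-token cross entropy over the vocabulary.
        ce_loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))

        # router_logits: tuple with one [num_tokens, num_experts] tensor per
        # MoE layer, as returned by Mixtral with output_router_logits=True.
        gate_logits = torch.cat(router_logits, dim=0)
        routing_probs = torch.softmax(gate_logits, dim=-1)
        _, selected_experts = torch.topk(routing_probs, self.top_k, dim=-1)
        expert_mask = F.one_hot(selected_experts, self.num_experts).float()

        # f_i: fraction of tokens dispatched to each expert (per top-k slot).
        tokens_per_expert = expert_mask.mean(dim=0)
        # P_i: mean router probability assigned to each expert.
        router_prob_per_expert = routing_probs.mean(dim=0)

        # Switch Transformers aux loss: num_experts * sum_i f_i * P_i,
        # minimized when tokens are routed uniformly across experts.
        aux_loss = self.num_experts * torch.sum(
            tokens_per_expert * router_prob_per_expert.unsqueeze(0)
        )
        return ce_loss + self.aux_loss_coef * aux_loss
```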

First experiments with Mixtral and LoRA did not show a big impact. The scale of the auxiliary loss is in general quite similar to the regular cross entropy, so the default additive coefficient might be too low, but we will keep the settings recommended by the paper and HF as defaults for now.

Needs more experimentation to better understand the impact.
Closes #607

psinger (Collaborator, Author) commented Feb 7, 2024

Maybe hold off on the review a bit; I am exploring the loss a bit more right now. With LoRA, it probably will not even properly train the gate (which can be a good thing).

psinger (Collaborator, Author) commented May 13, 2024

Closing this for now.

psinger closed this May 13, 2024
Merging this pull request may close these issues:

[FEATURE] MoE Aux Loss (#607)
[FEATURE] Add max_time generate setting (#568)