Generate settings and MoE Loss #609

psinger · 2024-02-07T08:39:31Z

This PR addresses the following:

New max_time setting for generation that caps the time, in seconds, allowed per generation. Closes #568
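As a rough illustration of what a per-generation time cap means, the sketch below stops a token loop once a wall-clock budget is used up. The function name generate_with_time_budget, the next_token callable and the injectable clock are hypothetical and are not taken from this PR; the real setting is presumably forwarded to the generation call of the underlying library rather than implemented like this.

```python
import time
from typing import Callable, List, Optional


def generate_with_time_budget(
    next_token: Callable[[List[int]], int],  # hypothetical stand-in for one decoding step
    prompt_ids: List[int],
    max_new_tokens: int,
    max_time: Optional[float] = None,  # budget in seconds; None disables the cap
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Append tokens until max_new_tokens is reached or max_time seconds elapse."""
    start = clock()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # The budget is checked before each step; a step that has started is not interrupted.
        if max_time is not None and clock() - start >= max_time:
            break
        ids.append(next_token(ids))
    return ids
```

Because the check happens between steps, a single slow step can still overshoot the budget, so the cap is best read as approximate.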

New prompt_lookup_num_tokens setting, as discussed in https://twitter.com/joao_gante/status/1747322413006643259
It will likely only help for summarization and QA tasks; default chat inference even got slower when using it.
But let's keep it as a setting one can try.
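To make the idea behind prompt lookup more concrete, here is a minimal sketch of the candidate search, assuming the usual description of the technique: the trailing n-gram of the sequence is matched against earlier tokens, and the tokens that followed the match are proposed as a draft for the model to verify. The function name find_prompt_lookup_candidates and the max_ngram_size parameter are hypothetical; only the number of proposed tokens corresponds to the setting above.

```python
from typing import List


def find_prompt_lookup_candidates(
    token_ids: List[int],
    max_ngram_size: int = 3,          # hypothetical parameter, not part of the PR
    num_candidate_tokens: int = 10,   # plays the role of prompt_lookup_num_tokens
) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram earlier in the sequence."""
    n_tokens = len(token_ids)
    # Prefer longer n-grams: a longer match is a stronger hint that the text repeats.
    for ngram_size in range(min(max_ngram_size, n_tokens - 1), 0, -1):
        tail = token_ids[n_tokens - ngram_size:]
        # Scan earlier positions only, so the tail never matches itself.
        for start in range(n_tokens - ngram_size):
            if token_ids[start:start + ngram_size] == tail:
                follow = start + ngram_size
                return token_ids[follow:follow + num_candidate_tokens]
    # No match: the caller falls back to ordinary one-token-at-a-time decoding.
    return []
```

A draft is only useful when the output copies spans from the input, which fits the observation that summarization and QA are the likely beneficiaries, while open-ended chat pays for the lookup without many accepted tokens.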

Adds a new loss function MoECrossEntropy that can be used for MoE models like Mixtral. It follows https://arxiv.org/pdf/2101.03961.pdf, as implemented in https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/mixtral/modeling_mixtral.py#L77

First experiments with Mixtral and LoRA did not show a big impact. The scale of the loss is generally similar to that of the regular cross entropy, so the default additive term might be too low, but I will keep the recommended settings from the paper and HF as defaults for now.
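For readers unfamiliar with this kind of loss, the following is a simplified pure-Python sketch of a load-balancing auxiliary term added to cross entropy, assuming the form num_experts * sum_i f_i * P_i, where f_i is the fraction of tokens routed to expert i and P_i is the mean router probability for expert i. It handles a single router layer on plain lists, does not normalize f_i by top_k, and is not the code from this PR; the names load_balancing_loss, moe_cross_entropy and aux_coef are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits: List[List[float]], top_k: int) -> float:
    """Auxiliary term num_experts * sum_i f_i * P_i for one router layer."""
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # f_i: fraction of tokens that picked expert i among their top_k experts.
    counts = [0] * num_experts
    for row in probs:
        chosen = sorted(range(num_experts), key=lambda i: row[i], reverse=True)[:top_k]
        for i in chosen:
            counts[i] += 1
    frac_tokens = [c / num_tokens for c in counts]

    # P_i: mean router probability assigned to expert i over all tokens.
    mean_prob = [sum(row[i] for row in probs) / num_tokens for i in range(num_experts)]

    return num_experts * sum(f * p for f, p in zip(frac_tokens, mean_prob))


def moe_cross_entropy(
    ce_loss: float, router_logits: List[List[float]], top_k: int, aux_coef: float
) -> float:
    """Cross entropy plus the weighted auxiliary term (aux_coef is an assumed knob)."""
    return ce_loss + aux_coef * load_balancing_loss(router_logits, top_k)
```

In this sketch the auxiliary term is smallest when tokens are spread evenly and grows as routing collapses onto a few experts. With a small aux_coef it shifts the total only slightly, which may be one reason the additive term has little visible effect.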

Needs more experimentation to better understand the impact.
Closes #607

psinger · 2024-02-07T16:39:22Z

Maybe hold off on the review for a bit; I am exploring the loss a bit more right now. With LoRA it will probably not even properly train the gate (which can be good).

psinger · 2024-05-13T10:38:45Z

closing this for now

psinger added 5 commits February 6, 2024 13:21

changes

4e7e952

Merge branch 'main' into psi/generate_v1

294f108

c

f22ea46

changes

f086a5c

doc

4174fb1

psinger requested review from pascal-pfeiffer and maxjeblick February 7, 2024 08:39

psinger and others added 10 commits February 8, 2024 07:52

c

c489d3e

implementation

062cf15

fix

ded6a53

noqa

96586ad

f

d6a5ac2

c

9f13787

Merge remote-tracking branch 'origin/psi/ddppad' into psi/generate_v1

1aff023

c

208c5ba

lots of stuff

511958d

Merge branch 'main' into psi/generate_v1

764b24e

psinger closed this May 13, 2024
