
Enhancing Length Consistency in LLM Outputs with Token Length Penalty Loss Functions #556

Closed

Conversation

Nischaydnk

This adds support for custom loss functions aimed at improving the length consistency of responses generated by fine-tuned LLMs. The idea is to make the output lengths of LLMs more reflective of the token lengths observed in the training data. I ran several experiments with these loss functions and noticed very little deviation in model performance.

The loss functions implemented are:

  • LengthBasedTACE (Token Averaged Cross Entropy)
  • LengthBasedSACE (Sample Averaged Cross Entropy)

Sharing some of the experiments I ran with these losses to compare against the original Cross Entropy loss:

Evaluation Results:

There may be some randomness in the eval metric, but I found a consistent decrease in LLM inference time, especially for models that score poorly and are prone to generating bad responses.

| Model         | Loss Function                | Time Taken (min) | Eval Metric |
|---------------|------------------------------|------------------|-------------|
| llama13B-Chat | Token Avg CE Loss            | 40.45            | 0.810       |
| llama13B-Chat | TokenLengthPenalty Token Avg | 38.62            | 0.802       |
| llama7B-Chat  | Token Avg CE Loss            | 12.50            | 0.7684      |
| llama7B-Chat  | TokenLengthPenalty Token Avg | 12.12            | 0.7484      |
| Yi-6B-Chat    | Token Avg CE Loss            | 18.50            | 0.792       |
| Yi-6B-Chat    | TokenLengthPenalty Token Avg | 15.44            | 0.785       |
| llama13B-Chat | Token Avg CE Loss            | 78.20            | 0.728       |
| llama13B-Chat | TokenLengthPenalty Token Avg | 76.60            | 0.744       |
| Yi-6B-Chat    | Token Avg CE Loss            | 24.44            | 0.712       |
| Yi-6B-Chat    | TokenLengthPenalty Token Avg | 24.20            | 0.704       |

These functions use a length penalty coefficient; in my experiments I found a coefficient of 0.1 to be the most stable, so I kept it as the default. This should help close #537
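For reference, here is a minimal PyTorch sketch of how such length-penalized cross-entropy losses could be structured. This is not the code from this PR: the class names match those listed above, but the penalty mechanism (re-weighting each sample's loss by how much its token length deviates from the batch mean) and argument names such as `length_penalty_coefficient` are assumptions made for illustration only.

```python
# Minimal sketch of length-penalized cross-entropy losses. Not the exact
# implementation from this PR: the re-weighting scheme and argument names
# are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LengthBasedTACE(nn.Module):
    """Token-averaged cross entropy with a length-penalty re-weighting."""

    def __init__(self, length_penalty_coefficient: float = 0.1, ignore_index: int = -100):
        super().__init__()
        self.coef = length_penalty_coefficient
        self.ignore_index = ignore_index

    def forward(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # logits: (batch, seq_len, vocab), labels: (batch, seq_len)
        per_token = F.cross_entropy(
            logits.transpose(1, 2), labels,
            ignore_index=self.ignore_index, reduction="none",
        )  # (batch, seq_len), zero at ignored positions
        mask = (labels != self.ignore_index).float()
        token_counts = mask.sum(dim=1).clamp(min=1)       # tokens per sample
        rel_length = token_counts / token_counts.mean()   # length vs. batch mean
        # Samples whose length deviates from the batch mean get a slightly
        # larger weight, nudging the model toward training-data lengths.
        weights = 1.0 + self.coef * (rel_length - 1.0).abs()
        per_sample = (per_token * mask).sum(dim=1)        # summed token loss per sample
        # Weighted token average over the whole batch.
        return (per_sample * weights).sum() / (token_counts * weights).sum()


class LengthBasedSACE(LengthBasedTACE):
    """Sample-averaged variant: average within each sample, then over the batch."""

    def forward(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        per_token = F.cross_entropy(
            logits.transpose(1, 2), labels,
            ignore_index=self.ignore_index, reduction="none",
        )
        mask = (labels != self.ignore_index).float()
        token_counts = mask.sum(dim=1).clamp(min=1)
        rel_length = token_counts / token_counts.mean()
        weights = 1.0 + self.coef * (rel_length - 1.0).abs()
        per_sample = (per_token * mask).sum(dim=1) / token_counts
        # Scale each sample's average loss by its length weight, then average.
        return (per_sample * weights).mean()
```

Under these assumptions, swapping the loss into a training loop would look like `loss_fn = LengthBasedTACE(length_penalty_coefficient=0.1)` followed by `loss = loss_fn(logits, labels)`.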

@psinger (Collaborator) commented Dec 29, 2023:

Thanks @Nischaydnk -

Could you please move the PR to a separate branch in your fork? The common workflow is:

  • Fork LLM Studio (do not rename it)
  • Create a branch in your fork
  • Make a PR for that branch here

Currently I cannot properly check it out, as a main branch already exists here.

@Nischaydnk (Author)

Thanks @psinger

I think I will need to create a new PR using a different branch of my fork. I will close this one.
