Enhancing Length Consistency in LLM Outputs with Token Length Penalty Loss Functions #556

Nischaydnk · 2023-12-28T11:10:57Z

Adding support for custom loss functions aimed at improving the length consistency in responses generated by fientuned LLMs. Idea is to make the output lengths of LLMs more reflective of the token lengths observed in the training data. I did several experiments using the loss functions, and noticed very low deviation in performance of models.

The loss functions implemented are:

LengthBasedTACE (Token Averaged Cross Entropy)
LengthBasedSACE (Sample Averaged Cross Entropy)

Sharing some of the experiments I did using these losses to make a comparison with original Cross Entropy Loss:

Evaluation Results:

There could be some randomness involved in eval metric, but I found consistent decrease in LLMs inference time,specially the ones which scores bad & prone to generate bad responses.

Model	Loss Function	Time Taken (min)	Eval Metric
llama13B-Chat	Token Avg CE Loss	40.45	0.810
llama13B-Chat	TokenLengthPenalty Token Avg	38.62	0.802
llama7B-Chat	Token Avg CE Loss	12.50	0.7684
llama7B-Chat	TokenLengthPenalty Token Avg	12.12	0.7484
Yi-6B-Chat	Token Avg CE Loss	18.50	0.792
Yi-6B-Chat	TokenLengthPenalty Token Avg	15.44	0.785
llama13B-Chat	Token Avg CE Loss	78.20	0.728
llama13B-Chat	TokenLengthPenalty Token Avg	76.60	0.744
Yi-6B-Chat	Token Avg CE Loss	24.44	0.712
Yi-6B-Chat	TokenLengthPenalty Token Avg	24.20	0.704

These functions uses a length penalty coefficient, in my experiments I found 0.1 coefficient to be most stable one, therefore I kept it as default. This should help close #537

psinger · 2023-12-29T18:29:55Z

Thanks @Nischaydnk -

could you please move the pr to a separate branch in your fork? The common workflow is:

Fork LLM Studio (do not rename it)
Create a branch in your fork
Make a PR for that branch here

Currently I cannot properly check it out as main branch already exists here.

Nischaydnk · 2024-01-02T11:03:35Z

Thanks @psinger

I think I will need to create a new PR using a different branch of my fork. I will close this one.

add token length based loss fnc

23ff46e

Nischaydnk closed this Jan 2, 2024

Nischaydnk mentioned this pull request Jan 2, 2024

Length Consistency in LLM Outputs with Token Length based Penalty Loss Functions #559

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhancing Length Consistency in LLM Outputs with Token Length Penalty Loss Functions #556

Enhancing Length Consistency in LLM Outputs with Token Length Penalty Loss Functions #556

Nischaydnk commented Dec 28, 2023

psinger commented Dec 29, 2023

Nischaydnk commented Jan 2, 2024

Enhancing Length Consistency in LLM Outputs with Token Length Penalty Loss Functions #556

Enhancing Length Consistency in LLM Outputs with Token Length Penalty Loss Functions #556

Conversation

Nischaydnk commented Dec 28, 2023

Evaluation Results:

psinger commented Dec 29, 2023

Nischaydnk commented Jan 2, 2024