
[Question]: Very short responses when getting completions from llama-cpp-python #873

Answered by danny-avila
origintopleft asked this question in Q&A

Interesting, maybe max_tokens needs to be sent with the request? It looks like the default for that project is 16, which is incredibly low:

abetlen/llama-cpp-python#542

Add the following line after line 64 of api\app\clients\OpenAIClient.js:

this.modelOptions.max_tokens = 2000;

For reference, the code around that spot looks like this:

    if (!this.modelOptions) {
      this.modelOptions = {
        ...modelOptions,
        model: modelOptions.model || 'gpt-3.5-turbo',
        temperature:
          typeof modelOptions.temperature === 'undefined' ? 0.8 : modelOptions.temperature,
        top_p: typeof modelOptions.top_p === 'undefined' ? 1 : modelOptions.top_p,
        presence_penalty:
          typeof modelOptions.presence_penalty === 'undefined' ? 1 : modelOptions.presence_penalty,
      };
    }
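
If you want to confirm that max_tokens is what is cutting your responses short, here is a minimal sketch of calling a locally running llama-cpp-python server directly with max_tokens set explicitly. This is not LibreChat code; the URL, port, and model name are placeholders for whatever your local setup uses.

    // Minimal sketch (placeholders, not LibreChat code): query a local
    // llama-cpp-python OpenAI-compatible server with max_tokens set explicitly.
    async function testCompletion() {
      const response = await fetch('http://localhost:8000/v1/chat/completions', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          model: 'local-model', // placeholder model name
          messages: [{ role: 'user', content: 'Write three sentences about llamas.' }],
          // Without this, the server falls back to its own low default and cuts replies short.
          max_tokens: 2000,
          temperature: 0.8,
        }),
      });
      const data = await response.json();
      console.log(data.choices[0].message.content);
    }

    testCompletion().catch(console.error);

If the response is no longer truncated with max_tokens set, the hardcoded assignment above (or passing max_tokens through with each request) should fix the behavior in LibreChat as well.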

Category: Q&A
Labels: question (further information is requested)
This discussion was converted from issue #870 on September 02, 2023 16:37.