
Non-streaming completion API #75

Open
louisgv opened this issue Jul 6, 2023 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@louisgv
Owner

louisgv commented Jul 6, 2023

Support a non-stream version of the API. Without streaming, it's trickier, since we need to load the model and do all kinds of work before we can send back a response.

Lack of streaming would likely require:

  • Timeout configuration on the client's side
  • Some way to keep the connection alive and then send the completion body

Would need to experiment and see... but this is low priority because I personally don't use the non-stream API :p... (any takers?)
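A minimal sketch of the keep-alive idea, assuming the completion is produced on a worker thread and delivered over a channel (the function name and signature here are hypothetical, not from the project):

```rust
use std::sync::mpsc;
use std::time::Duration;

// Hypothetical sketch: block until the full completion body arrives,
// writing a keep-alive ping whenever `interval` elapses without data,
// so the client's timeout doesn't fire while the model is still loading.
fn wait_with_keepalive(
    rx: mpsc::Receiver<String>,
    interval: Duration,
    mut write: impl FnMut(&str),
) -> String {
    loop {
        match rx.recv_timeout(interval) {
            // Completion is ready; return it as the response body.
            Ok(body) => return body,
            // Nothing yet: emit a ping to keep the connection open.
            Err(mpsc::RecvTimeoutError::Timeout) => write("\n"),
            // Producer dropped; give back an empty body.
            Err(mpsc::RecvTimeoutError::Disconnected) => return String::new(),
        }
    }
}
```

A real server would pair this with a client-side timeout long enough to cover model load time.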

@louisgv louisgv added enhancement New feature or request help wanted Extra attention is needed good first issue Good for newcomers labels Jul 6, 2023
@JNeuvonen
Collaborator

JNeuvonen commented Jul 23, 2023

This is a very cool project @louisgv. I did a server implementation of the non-streaming API, which didn't break the streaming version of the API and seems to work as expected for non-streaming as well. Disclaimer: this is my first time writing anything beyond Hello World in Rust.

The implementation approach:

  • Use the stream flag sent in the request body to decide whether server events should be sent on every token
  • If the stream flag is false, collect tokens into a string buffer and send the buffer once the completion is built
  • If the stream flag is true, proceed as usual and send server events to the request sender.
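The branching described above can be sketched like this (function name and signature are hypothetical, not taken from the actual server code):

```rust
// Hypothetical sketch of the stream-flag branch: either forward each
// token as a server event, or buffer everything and return one body.
fn handle_completion(
    stream: bool,
    tokens: impl Iterator<Item = String>,
    send_event: &mut dyn FnMut(&str),
) -> Option<String> {
    if stream {
        // Streaming path: emit a server event per token, no final body.
        for t in tokens {
            send_event(&t);
        }
        None
    } else {
        // Non-streaming path: accumulate tokens, return the full text.
        let mut buf = String::new();
        for t in tokens {
            buf.push_str(&t);
        }
        Some(buf)
    }
}
```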

(screenshot: image_2023_07_23T12_20_50_781Z)

If you want to quickly test the implementation, here's a request body for the non-streaming API:

{
  "sampler": "top-p-top-k",
  "prompt": "AI: Greeting! I am a friendly AI assistant. Feel free to ask me anything.\nHuman: Hello world\nAI: ",
  "max_tokens": 200,
  "temperature": 1,
  "seed": 147,
  "frequency_penalty": 0.6,
  "presence_penalty": 0,
  "top_k": 42,
  "top_p": 1,
  "stop": ["AI: ", "Human: "],
  "stream": false
}
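For reference, the request body above maps onto a struct roughly like the following; the field names come from the JSON, but the types are my assumptions, and a real implementation would likely derive serde's Deserialize rather than Default:

```rust
/// Hypothetical mirror of the completion request body shown above.
/// Field names follow the JSON keys; types are assumptions.
#[derive(Debug, Default)]
struct CompletionRequest {
    sampler: String,
    prompt: String,
    max_tokens: u32,
    temperature: f32,
    seed: u64,
    frequency_penalty: f32,
    presence_penalty: f32,
    top_k: u32,
    top_p: f32,
    stop: Vec<String>,
    stream: bool, // false selects the non-streaming path
}
```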

If the implementation seems good enough on the server side, I could proceed to add support for it on the client side as well and open a PR.

@louisgv
Owner Author

louisgv commented Jul 24, 2023

@JNeuvonen awesome :D - feel free to open a PR! (it makes reviewing it a bit easier for me :P) I will take a deeper look in a bit
