
Streaming support #4

Closed · wants to merge 1 commit

Conversation

@edsu commented Mar 7, 2024

This commit modifies the /api/completion endpoint so that it streams results back from the underlying LLM as JSON-Lines (line-delimited JSON objects).

The first object describes the LLM request, so that the page can be updated with that information. Subsequent objects each carry a partial response from the LLM.

The client then reads the streamed results and displays them as they become available.
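The protocol described above can be sketched as a generator (a minimal sketch: the `stream_completion` helper and the field names are illustrative, not the PR's actual code):

```python
import json

def stream_completion(request_info, chunks):
    """Yield a JSON-Lines stream: one request-metadata object, then partial responses."""
    # First line: information about the LLM request, so the page can update early.
    yield json.dumps(request_info) + "\n"
    # Subsequent lines: partial responses as they arrive from the LLM.
    for chunk in chunks:
        yield json.dumps({"text": chunk}) + "\n"

# In a Flask app, this generator would be wrapped in a streaming response, e.g.:
#   return Response(stream_completion(info, llm_chunks), mimetype="application/jsonl")
```

Because each line is a complete JSON object, the client can parse and render every partial response the moment its newline arrives, without waiting for the full completion.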

warc-gpt.mov

@@ -39,7 +39,7 @@ const askButton = document.querySelector("#ask");
  * @returns {string}
  */
 const sanitizeString = (string, convertLineBreaks = true) => {
-  string = string.trim()
+  string = string
@edsu (Author), on the diff:

The trim() call needed to be removed here to preserve the spaces between words as they stream in.
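A quick illustration of the problem (plain Python standing in for the JavaScript behavior; the example strings are made up): when output arrives in chunks, the space separating two words often sits at the start of a chunk, and trimming each chunk destroys it.

```python
chunks = ["The quick", " brown", " fox"]

# Trimming each incoming chunk drops the leading spaces between words:
trimmed = "".join(c.strip() for c in chunks)   # "The quickbrownfox"

# Keeping chunks as-is preserves the spacing:
intact = "".join(chunks)                       # "The quick brown fox"
```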

@matteocargnelutti (Collaborator):

@edsu Thank you so much for initiating this PR, really appreciate it. Streaming is definitely a great "quality of life" feature to have.

I have made a few comments but here is an overview of where I think we are:

  • There are a few tweaks needed to make this feature interoperable.
  • The history feature broke in the process and needs fixing. You can test this by asking a follow-up question and checking the contents of the history object sent to the API.
  • (TBD) At a higher level, I wonder if it could be worth decoupling text completion from context / metadata retrieval.
    • At the moment, everything goes through [POST] /api/completion; maybe we could have two different routes:
      • One to handle vector search and history. It would be called first.
      • One to handle text completion, and only that.
    • That could also be an opportunity to decouple that logic at the front-end level. We might need to handle history a little differently as a result.
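The two-route split floated above could look roughly like this (purely illustrative: the routes are modeled as plain functions, and all names and payload fields are hypothetical, not WARC-GPT's actual API):

```python
import json

def search_route(question, history):
    """Hypothetical route #1: vector search and history handling only; called first."""
    # A real implementation would query the vector store here.
    context = [{"source": "example.warc.gz", "excerpt": "stub excerpt"}]
    new_history = history + [{"role": "user", "content": question}]
    return {"context": context, "history": new_history}

def completion_route(question, context):
    """Hypothetical route #2: text completion only, streamed as JSON-Lines."""
    # First line: request metadata; following lines: partial LLM output (stubbed).
    yield json.dumps({"question": question, "context_size": len(context)}) + "\n"
    for chunk in ["partial ", "answer"]:
        yield json.dumps({"text": chunk}) + "\n"
```

The front end would call the search route first, then hand its context to the completion route, keeping vector search and history handling out of the streaming path.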

Cheers,

@edsu (Author) commented Mar 7, 2024

Thanks, yes. I thought about a cleaner API approach that decouples the endpoints. I think that would involve some sort of persistence layer separate from chromadb? That seemed like a big change, so I was reluctant to go there...

I'll take a look and see if I can figure out what happened with the history. It is fun to see the response come back in dribs and drabs, especially when running with a local model, which can take some time.

@matteocargnelutti (Collaborator) commented Mar 20, 2024

@edsu I added an implementation of streaming in #6, based on our conversation here. Thanks for kicking this off!

@edsu (Author) commented Mar 20, 2024

Awesome, I'm glad my noodling around was helpful in some way.

@edsu closed this on Mar 20, 2024.