
Streaming support #4

Closed · wants to merge 1 commit

Conversation

@edsu commented Mar 7, 2024

This commit modifies the /api/completion endpoint so that it streams results back from the underlying LLM as JSON-Lines (line-delimited JSON objects).

The first object describes the LLM request, so that the page can be updated with that information. Subsequent objects each carry a partial response from the LLM.

The client then reads the streamed results and displays them as they become available.
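The protocol described above can be sketched as a generator (a minimal sketch: the `stream_completion` helper and the field names are illustrative, not the PR's actual code):

```python
import json

def stream_completion(request_info, chunks):
    """Yield a JSON-Lines stream: one request-metadata object, then partial responses."""
    # First line: information about the LLM request, so the page can update early.
    yield json.dumps(request_info) + "\n"
    # Subsequent lines: partial responses as they arrive from the LLM.
    for chunk in chunks:
        yield json.dumps({"text": chunk}) + "\n"

# In a Flask app, this generator would be wrapped in a streaming response, e.g.:
#   return Response(stream_completion(info, llm_chunks), mimetype="application/jsonl")
```

Because each line is a complete JSON object, the client can parse and render every partial response the moment its newline arrives, without waiting for the full completion.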

warc-gpt.mov

@@ -39,7 +39,7 @@ const askButton = document.querySelector("#ask");
  * @returns {string}
  */
 const sanitizeString = (string, convertLineBreaks = true) => {
-  string = string.trim()
+  string = string
@edsu (Author), on the diff:

The trim() call needed to be removed here to preserve the spaces between words as they stream in.
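A quick illustration of the problem (plain Python standing in for the JavaScript behavior; the example strings are made up): when output arrives in chunks, the space separating two words often sits at the start of a chunk, and trimming each chunk destroys it.

```python
chunks = ["The quick", " brown", " fox"]

# Trimming each incoming chunk drops the leading spaces between words:
trimmed = "".join(c.strip() for c in chunks)   # "The quickbrownfox"

# Keeping chunks as-is preserves the spacing:
intact = "".join(chunks)                       # "The quick brown fox"
```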

@matteocargnelutti (Collaborator):

@edsu Thank you so much for initiating this PR, really appreciate it. Streaming is definitely a great "quality of life" feature to have.

I have made a few comments but here is an overview of where I think we are:

  • There are a few tweaks needed to make this feature interoperable.
  • The history feature broke in the process and needs fixing. You can test this by asking a follow-up question and checking the contents of the history object sent to the API.
  • (TBD) At a higher level, I wonder if it could be worth decoupling text completion from context / metadata retrieval.
    • At the moment, everything goes through [POST] /api/completion; maybe we could have two different routes:
      • One to handle vector search and history. It would be called first.
      • One to handle text completion, and only that.
    • That could also be an opportunity to decouple that logic at the front-end level. We might need to handle history a little differently as a result.
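The two-route split floated above could look roughly like this (purely illustrative: the routes are modeled as plain functions, and all names and payload fields are hypothetical, not WARC-GPT's actual API):

```python
import json

def search_route(question, history):
    """Hypothetical route #1: vector search and history handling only; called first."""
    # A real implementation would query the vector store here.
    context = [{"source": "example.warc.gz", "excerpt": "stub excerpt"}]
    new_history = history + [{"role": "user", "content": question}]
    return {"context": context, "history": new_history}

def completion_route(question, context):
    """Hypothetical route #2: text completion only, streamed as JSON-Lines."""
    # First line: request metadata; following lines: partial LLM output (stubbed).
    yield json.dumps({"question": question, "context_size": len(context)}) + "\n"
    for chunk in ["partial ", "answer"]:
        yield json.dumps({"text": chunk}) + "\n"
```

The front end would call the search route first, then hand its context to the completion route, keeping vector search and history handling out of the streaming path.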

Cheers,

@edsu (Author) commented Mar 7, 2024

Thanks, yes. I thought about a cleaner API approach that decouples the endpoints. I think that would involve some sort of persistence layer separate from chromadb? That seemed like a big change, so I was reluctant to go there...

I'll take a look and see if I can figure out what happened with the history. It is fun to see the response come back in dribs and drabs, especially when running with a local model, which can take some time.

@matteocargnelutti (Collaborator) commented Mar 20, 2024

@edsu I added an implementation of streaming in #6, based on our conversation here. Thanks for kicking this off!

@edsu (Author) commented Mar 20, 2024

Awesome, I'm glad my noodling around was helpful in some way.

@edsu closed this on Mar 20, 2024.