Streaming support #4
Conversation
This commit modifies the /api/completion endpoint so that it streams results back from the underlying LLM as JSON-Lines (line-delimited JSON objects). The first object contains information about the LLM request so that the page can be updated with this information. Subsequent lines contain objects holding partial responses from the LLM. The client reads the stream and displays the results as they become available.
@@ -39,7 +39,7 @@ const askButton = document.querySelector("#ask");
   * @returns {string}
   */
  const sanitizeString = (string, convertLineBreaks = true) => {
-   string = string.trim()
+   string = string
trim() needed to be removed here to preserve the space between words as they come in.
@edsu Thank you so much for initiating this PR, really appreciate it. Streaming is definitely a great "quality of life" feature to have. I have made a few comments but here is an overview of where I think we are:
Cheers,
Thanks, yes. I thought about a cleaner API approach by decoupling endpoints. I think that would involve some sort of persistence layer separate from chromadb? That seemed like a big change, so I was reluctant to go there... I'll take a look and see if I can figure out what happened with the history. It is fun to see the response come back in dribs and drabs, especially when running with a local model, which can take some time.
Awesome, I'm glad my noodling around was helpful in some way.
warc-gpt.mov