Feature/streaming #314
Conversation
Thanks for your work on this @gecBurton! I'm not able to test locally (general issue running core-api, not to do with this work), but it looks promising. Thinking in terms of how this looks to the user client-side, I think it'd make sense for the text to be streamed, and the documents to be sent all together as one chunk. I think Django should handle this though, so core-api just streams away. Not sure if using binary for the documents affects this. @brunns do you have any thoughts on this?
@KevinEtchells It appears that the documents are the first object to be sent back, and all in one go (which I guess makes sense). The chunks look like:

{"resource_type": "documents", "data": [...]}
{"resource_type": "text", "data": "As a large language model I cannot..."}

This way the client can deal with the data as it chooses?
Great that the documents all come in one go, and it makes sense in terms of what you have available first. We'll likely be displaying the resources last and therefore want to prioritise getting the text in as soon as possible, but whether sending documents first actually has any noticeable effect on performance would need to be tested. So probably best to stick to this default behaviour for now.
Just checking - we are going to be using a JSON structure here, even though it'll be coming over in chunks?
@brunns everything is open to change. Currently the text is streamed as text and the documents are sent in one binary chunk. We are considering whether the documents could/should also be sent as text, but then the question is: "as a client, how do I know if the chunk I have just received is a document or a piece of text?"
I see the issue. Trouble is, partial JSON is a real pain to deal with too. How about: first, documents are sent chunk by chunk; then a known delimiter; then AI messages.
That's a good point. We want to avoid JSON for the main text. Documents are less critical (especially if coming in one chunk), but I think @brunns' suggestion is a good one.
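A minimal sketch of that delimiter framing, assuming a FastAPI-style WebSocket; the names (stream_response, DELIMITER) and the delimiter byte are illustrative, not from this PR:

# Illustrative only, assuming a FastAPI-style WebSocket; not code from this PR.
import json

DELIMITER = b"\x1e"  # ASCII record separator; any agreed byte sequence would do

async def stream_response(websocket, documents, text_chunks):
    # 1. Documents go first, chunk by chunk.
    for doc in documents:
        await websocket.send_bytes(json.dumps(doc).encode())
    # 2. A known delimiter tells the client the documents are done.
    await websocket.send_bytes(DELIMITER)
    # 3. AI messages follow, streamed as the model produces them.
    async for chunk in text_chunks:
        await websocket.send_text(chunk)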
I was thinking that the chunks themselves could be JSON encoded, i.e.:

chunk 1: {"resource_type": "text", "data": "As a large language model"}
chunk 2: {"resource_type": "text", "data": " I cannot comment"}
chunk 3: {"resource_type": "documents", "data": [{"url": "s3://amazon.sts/redboxdata/myDocyment.pdf"}]}
chunk 4: {"resource_type": "text", "data": " on whatever it is you just asked me."}

this way the client can treat every chunk the same way.
Or we could stick to the binary=documents, text=chat distinction?
I'd be very happy with that. The text might come in multiple chunks, right?
Yep, almost certainly.
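For what it's worth, a client consuming the JSON-encoded chunks above only needs to branch on resource_type; a rough sketch, with all names illustrative:

# Rough sketch of a client routing JSON-encoded chunks; not code from this PR.
import json

def handle_chunk(raw: str) -> None:
    chunk = json.loads(raw)
    if chunk["resource_type"] == "text":
        # Text arrives in many small pieces; append to the running answer.
        print(chunk["data"], end="", flush=True)
    elif chunk["resource_type"] == "documents":
        # Documents arrive once, as a list of source references.
        for doc in chunk["data"]:
            print("\nsource:", doc["url"])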
MessagesPlaceholder(variable_name="chat_history"),
("user", "{input}"),
(
    "user",
Is this a system prompt?
(this question might be moot when integrating with existing code from redbox/llm/prompts/chat.py)
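For context: in LangChain a ("user", ...) tuple is a human message, not a system one. A hedged reconstruction of how the surrounding template might look — both prompt strings below are invented, the real ones live in redbox/llm/prompts/chat.py:

# Hedged reconstruction for context only; prompt strings are invented.
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),  # hypothetical system prompt
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
        (
            "user",
            "Given the conversation above, generate a standalone question.",  # hypothetical
        ),
    ]
)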
core_api/src/routes/chat.py (outdated)
while True:
    chat_request = ChatRequest.parse_raw(await websocket.receive_text())
    chat_history = [
        HumanMessage(content=x.text) if x.role == "user" else AIMessage(content=x.text)
Can there be SystemMessages or will they have been contracted by the chains?
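If SystemMessages did need to survive the round trip, the binary role check above could become an explicit mapping; a sketch, where the message_history field name and the "ai"/"system" role values are assumptions, not from the PR:

# Hypothetical extension of the role mapping above.
from langchain.schema import AIMessage, HumanMessage, SystemMessage

ROLE_TO_MESSAGE = {
    "user": HumanMessage,
    "ai": AIMessage,
    "system": SystemMessage,
}

chat_history = [
    ROLE_TO_MESSAGE[x.role](content=x.text)
    for x in chat_request.message_history  # field name assumed
]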
An interesting UX question - do we want users to be able to see the generated question with their responses? (e.g. 'Does Rishi Sunak have a cat?' in George's example) Perhaps it could be revealed behind a click? Or would it be better to show just the questions they wrote, the answers and the sources?
core_api/src/routes/chat.py (outdated)
// Listen for messages
ws.addEventListener("message", (event) => {
    if (event.data instanceof ArrayBuffer) {
        document.getElementById('documents').innerHTML = JSON.stringify(JSON.parse(decoder.decode(event.data)), null, 2);
Do we want to be keeping a record of the sources throughout the chat rather than just the most recent? If so, would this be embedded with the relevant chat answer or all in one block at the end?
communicator.scope["user"] = carol | ||
connected, subprotocol = await communicator.connect() | ||
assert connected | ||
with patch("redbox_app.redbox_core.consumers.connect", new=mocked_connect): |
👍
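For anyone reading along, one plausible shape for that mocked_connect is an async generator standing in for the real core-api call; the signature and chunk shape below are guesses, not from the PR:

# Purely illustrative stand-in for redbox_app.redbox_core.consumers.connect;
# the real signature and payload shape may differ.
async def mocked_connect(*args, **kwargs):
    yield {"resource_type": "documents", "data": []}
    yield {"resource_type": "text", "data": "As a large language model I cannot..."}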
Context
This is a POC to investigate:
I have not:
Changes proposed in this pull request
Guidance to review
I have included a small HTML page to demonstrate this working. Note that the chat is sent as text and the documents as binary, although this is just the first thing I thought of and not necessarily the best way to do this.
docker compose up core-api elasticsearch
http://0.0.0.0:5002/chat/chit-chat
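If it's easier to poke at from a terminal than the demo page, something like the following should work; the websocket path and the ChatRequest field names (message_history, role, text) are assumptions based on the snippets above:

# pip install websockets — rough client sketch; endpoint path and request
# fields are assumptions, not from the PR.
import asyncio
import json

import websockets

async def main():
    async with websockets.connect("ws://0.0.0.0:5002/chat/chit-chat") as ws:
        await ws.send(json.dumps({"message_history": [{"role": "user", "text": "hello"}]}))
        async for frame in ws:
            if isinstance(frame, bytes):
                # Documents arrive as one binary chunk of JSON.
                print("documents:", json.loads(frame.decode()))
            else:
                # Chat text streams piece by piece.
                print(frame, end="", flush=True)

asyncio.run(main())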
Relevant links
Things to check