
Feature/streaming #314

Merged
merged 33 commits into main from feature/streaming on May 22, 2024

Conversation

gecBurton (Collaborator) commented May 7, 2024

Context

This is a POC to investigate:

  • how streaming works in langchain
  • if/how websockets integrate with fastapi
  • how chat-requests and responses might work with websockets

I have not:

  • tested this code
  • given any thought to security
  • considered how to limit the document retrieval to just those owned by the requestor
  • integrated this with the existing rag prompts/code

Changes proposed in this pull request

Guidance to review

I have included a small HTML page to demonstrate this working. Note that the chat is sent as text and the documents as binary, although this is just the first thing I thought of and not necessarily the best way to do it.

docker compose up core-api elasticsearch

http://0.0.0.0:5002/chat/chit-chat

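For anyone who can't run the stack locally, here is a rough, untested sketch of the shape of the endpoint (the `/chat/chit-chat` route matches the demo URL above, but the `message_history` payload, the OpenAI chat model, and the placeholder documents are illustrative assumptions, not the actual code in this PR):

import json

from fastapi import FastAPI, WebSocket
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

app = FastAPI()

# Stand-in chain: just enough structure to make the sketch runnable; the PR
# wires up its own prompt and model.
chat_chain = ChatPromptTemplate.from_messages(
    [MessagesPlaceholder(variable_name="chat_history"), ("user", "{input}")]
) | ChatOpenAI()


@app.websocket("/chat/chit-chat")
async def chit_chat(websocket: WebSocket) -> None:
    await websocket.accept()
    request = json.loads(await websocket.receive_text())
    *history, latest = request["message_history"]
    chat_history = [
        HumanMessage(content=m["text"]) if m["role"] == "user" else AIMessage(content=m["text"])
        for m in history
    ]

    # Retrieval runs first, so the sources can go back up front in a single
    # binary frame; a placeholder list stands in for the Elasticsearch lookup.
    documents = [{"url": "s3://example-bucket/some-document.pdf"}]
    await websocket.send_bytes(json.dumps(documents).encode("utf-8"))

    # The answer is then streamed token by token as text frames.
    async for chunk in chat_chain.astream({"chat_history": chat_history, "input": latest["text"]}):
        await websocket.send_text(chunk.content)

    await websocket.close()

The demo page then just opens a WebSocket to this route and appends each text frame to the page as it arrives.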

Relevant links

Things to check

  • I have added any new ENV vars in all deployed environments
  • I have tested any code added or changed
  • I have run integration tests

KevinEtchells (Contributor)

Thanks for your work on this @gecBurton! I'm not able to test locally (general issue running core-api, not to do with this work), but it looks promising.

Thinking in terms of how this looks to the user client-side, I think it'd make sense for the text to be streamed, and the documents to be sent all together as one chunk. I think Django should handle this though, so core-api just streams away. Not sure if using binary for the documents affects this. @brunns do you have any thoughts on this?

gecBurton (Collaborator, Author) commented May 7, 2024

> Thanks for your work on this @gecBurton! I'm not able to test locally (general issue running core-api, not to do with this work), but it looks promising.
>
> Thinking in terms of how this looks to the user client-side, I think it'd make sense for the text to be streamed, and the documents to be sent all together as one chunk. I think Django should handle this though, so core-api just streams away. Not sure if using binary for the documents affects this. @brunns do you have any thoughts on this?

@KevinEtchells It appears that the documents are the first object to be sent back, and all in one go (which I guess makes sense, as the R needs to be done before the AG?). Binary streams in the same way as text. But it might make more sense to wrap them in an identifier and stream both as text, i.e.:

{"resource_type": "documents", "data": [...]}
{"resource_type": "text", "data": "As a large language model I cannot..."}

this way the client can deal with the data as it chooses?
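For example, building on the endpoint sketch in the PR description, the send calls might become something like this (purely illustrative, reusing the hypothetical names from that sketch):

# Both the documents and each answer token go out as text frames, wrapped in
# a resource_type envelope (field names as in the example above).
await websocket.send_text(json.dumps({"resource_type": "documents", "data": documents}))

async for chunk in chat_chain.astream({"chat_history": chat_history, "input": latest["text"]}):
    await websocket.send_text(json.dumps({"resource_type": "text", "data": chunk.content}))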

KevinEtchells (Contributor) commented May 7, 2024

> > Thanks for your work on this @gecBurton! I'm not able to test locally (general issue running core-api, not to do with this work), but it looks promising.
> > Thinking in terms of how this looks to the user client-side, I think it'd make sense for the text to be streamed, and the documents to be sent all together as one chunk. I think Django should handle this though, so core-api just streams away. Not sure if using binary for the documents affects this. @brunns do you have any thoughts on this?
>
> @KevinEtchells It appears that the documents are the first object to be sent back, and all in one go (which I guess makes sense, as the R needs to be done before the AG?). Binary streams in the same way as text. But it might make more sense to wrap them in an identifier and stream both as text, i.e.:
>
> {"resource_type": "documents", "data": [...]}
> {"resource_type": "text", "data": "As a large language model I cannot..."}
>
> this way the client can deal with the data as it chooses?

Great that the documents all come in one go, and it makes sense in terms of what you have available first. We'll likely be displaying the resources last and therefore want to prioritise getting the text in as soon as possible, but whether sending documents first actually has any noticeable effect on performance would need to be tested. So probably best to stick to this default behaviour for now.
The wrappers sound like a good idea!

brunns (Collaborator) commented May 7, 2024

Just checking - we are going to be using a json structure here, even though it'll be coming over in chunks?

gecBurton (Collaborator, Author) commented May 7, 2024

> Just checking - we are going to be using a json structure here, even though it'll be coming over in chunks?

@brunns everything is open to change.

Currently the text is streamed as text and the documents are sent in one binary chunk.

We are considering whether the documents could/should also be sent as text, but then the question is "as a client, how do I know if the chunk I have just received is a document or a piece of text?"

brunns (Collaborator) commented May 7, 2024

I see the issue. Trouble is, partial JSON is a real pain to deal with too. How about: first, documents are sent chunk by chunk, followed by a known delimiter, followed by AI messages.
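For illustration, client-side handling of that delimiter scheme might look roughly like this (the sentinel value and the helper are assumptions for the sketch, not anything in this PR):

from typing import AsyncIterable

END_OF_DOCUMENTS = "<<END_OF_DOCUMENTS>>"  # illustrative sentinel


async def read_stream(frames: AsyncIterable[str]) -> tuple[list[str], str]:
    # Collect document frames until the sentinel arrives, then treat every
    # later frame as a piece of the streamed answer.
    document_frames: list[str] = []
    answer_parts: list[str] = []
    reading_documents = True
    async for frame in frames:
        if reading_documents and frame == END_OF_DOCUMENTS:
            reading_documents = False
        elif reading_documents:
            document_frames.append(frame)
        else:
            answer_parts.append(frame)
    return document_frames, "".join(answer_parts)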

KevinEtchells (Contributor)

> I see the issue. Trouble is, partial JSON is a real pain to deal with too. How about: first, documents are sent chunk by chunk, followed by a known delimiter, followed by AI messages.

That's a good point. We want to avoid JSON for the main text. Documents are less critical (especially if coming in one chunk), but I think @brunns' suggestion is a good one.

gecBurton (Collaborator, Author) commented May 7, 2024

> > I see the issue. Trouble is, partial JSON is a real pain to deal with too. How about: first, documents are sent chunk by chunk, followed by a known delimiter, followed by AI messages.
>
> That's a good point. We want to avoid JSON for the main text. Documents are less critical (especially if coming in one chunk), but I think @brunns' suggestion is a good one.

I was thinking that the chunks themselves could be JSON-encoded, i.e.:

chunk 1

{"resource_type": "text", "data": "As a large language model"}

chunk 2

{"resource_type": "text", "data": " I cannot comment"}

chunk 3

{"resource_type": "documents", "data": [{"url": "s3://amazon.sts/redboxdata/myDocyment.pdf"}]}

chunk 4

{"resource_type": "text", "data": " on whatever it is you just asked me."}

this way:

  • nothing is partial
  • it is always clear whether the chunk is:
    • The Documents
    • A piece of text

Or we could stick to the binary=documents, text=chat distinction?
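For illustration, a client consuming that per-chunk JSON format could dispatch on the wrapper like this (a rough sketch only; the request payload shape, the question text from the example above, and the websockets dependency are assumptions):

import asyncio
import json

import websockets


async def consume(url: str) -> None:
    async with websockets.connect(url) as ws:
        # Hypothetical request payload; the real ChatRequest shape may differ.
        await ws.send(json.dumps({"message_history": [{"role": "user", "text": "Does Rishi Sunak have a cat?"}]}))
        async for frame in ws:
            message = json.loads(frame)
            if message["resource_type"] == "documents":
                print("\nsources:", message["data"])
            elif message["resource_type"] == "text":
                print(message["data"], end="", flush=True)


asyncio.run(consume("ws://0.0.0.0:5002/chat/chit-chat"))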

brunns (Collaborator) commented May 7, 2024

I'd be very happy with that. The text might come in multiple chunks, right?

gecBurton (Collaborator, Author)

> I'd be very happy with that. The text might come in multiple chunks, right?

yep, almost certainly

MessagesPlaceholder(variable_name="chat_history"),
("user", "{input}"),
(
    "user",
Contributor:

Is this a system prompt?

Contributor:

(This question might be moot when integrating with existing code from redbox/llm/prompts/chat.py.)

while True:
    chat_request = ChatRequest.parse_raw(await websocket.receive_text())
    chat_history = [
        HumanMessage(content=x.text) if x.role == "user" else AIMessage(content=x.text)
Contributor:

Can there be SystemMessages or will they have been contracted by the chains?

rachaelcodes (Contributor)

An interesting UX question - do we want users to be able to see the generated question with their responses? (e.g. 'Does Rishi Sunak have a cat?' in George's example) Perhaps it could be something that could be revealed behind a click?

Or would it be better to show just the questions they wrote, the answers and the sources?

// Listen for messages
ws.addEventListener("message", (event) => {
  if (event.data instanceof ArrayBuffer) {
    document.getElementById('documents').innerHTML = JSON.stringify(JSON.parse(decoder.decode(event.data)), null, 2);
Contributor:

Do we want to be keeping a record of the sources throughout the chat rather than just the most recent? If so, would this be embedded with the relevant chat answer or all in one block at the end?

core_api/src/routes/chat.py (outdated review thread, resolved)
@brunns brunns marked this pull request as ready for review May 22, 2024 11:47
@brunns brunns requested a review from rachaelcodes May 22, 2024 11:47
communicator.scope["user"] = carol
connected, subprotocol = await communicator.connect()
assert connected
with patch("redbox_app.redbox_core.consumers.connect", new=mocked_connect):
Collaborator (Author):

👍

@brunns brunns merged commit ce662c7 into main May 22, 2024
9 checks passed
@brunns brunns deleted the feature/streaming branch May 22, 2024 13:32