
feat: Nonstreaming API #85

Open · wants to merge 10 commits into main

Conversation

@JNeuvonen (Collaborator) commented Jul 24, 2023

The implementation reuses the same start function inside process.rs for multithreading, but instead of sending a server event back to the request sender on every new token, it collects the tokens into a string buffer.

Currently there is no client-side implementation, so merging should not affect the client side at all. Next, we could open an issue for the client-side implementation as well.
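For illustration, the per-token buffering described above might look roughly like this (a sketch based on the field and flag names visible in the diff snippets below, not the exact PR code):

```rust
// Sketch: in streaming mode each token goes out as a server event; in
// non-streaming mode it is appended to the shared string buffer instead.
// `nonstream_completion_tokens` is the Arc<Mutex<String>> field from the diff.
if req.stream {
    req.send_event(&token);
} else {
    let mut buf = req.nonstream_completion_tokens.lock().unwrap();
    buf.push_str(&token);
}
```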

Here is a request body for quickly testing the API (the stream flag is set to false):

{"sampler":"top-p-top-k","prompt":"AI: Greeting! I am a friendly AI assistant. Feel free to ask me anything.\nHuman: Hello world\nAI: ","max_tokens":200,"temperature":1,"seed":147,"frequency_penalty":0.6,"presence_penalty":0,"top_k":42,"top_p":1,"stop":["AI: ","Human: "],"stream":false}



@louisgv (Owner) commented Jul 31, 2023

@JNeuvonen hey, sorry about the slow review on my end, I've been pretty busy with summer chores/errands and also other work xD... I was also investigating #62 and why the upstream llama metal doesn't seem to work on Mac anymore :d..... Will get to this by Wednesday.

Is it OK for me to cook it up a bit if I find something wrong/missing, or would you prefer I just comment and you take care of it? LMK what type of feedback works for you :)

@JNeuvonen (Collaborator, Author) commented Aug 1, 2023 via email

louisgv changed the title from "Nonstreaming api" to "feat: Nonstreaming api" on Aug 2, 2023
louisgv changed the title from "feat: Nonstreaming api" to "feat: Nonstreaming API" on Aug 2, 2023
```rust
});

HttpResponse::Ok()
    .append_header(("Content-Type", "text/plain"))
```
@louisgv (Owner) commented Aug 2, 2023:

We should return the application/json content type here instead, I think; it also tells the client to do JSON chunk parsing as needed based on that header.
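A minimal sketch of that change in actix-web; `completion_json` is an illustrative placeholder for the serialized response body:

```rust
// Hedged sketch: set the JSON content type so the client knows to parse
// the body (or its chunks) as JSON.
HttpResponse::Ok()
    .content_type("application/json")
    .body(completion_json)
```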

@JNeuvonen (Collaborator, Author) replied:

I think that makes sense, yes. I will fix those. Thanks for looking at my code.

```rust
    tx: Some(tx),
});

rx.recv().unwrap();
```
@louisgv (Owner) commented Aug 2, 2023:

We should match for the error and return an HTTP error here, IMO; otherwise it would be hard to triage :d
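A hedged sketch of that error handling, assuming an actix-web handler and the flume receiver `rx` from the snippet above (`completion_json` is again a placeholder):

```rust
// Match on the channel result instead of unwrapping, so a dropped sender
// surfaces as an HTTP 500 rather than a panic in the handler.
match rx.recv() {
    Ok(()) => HttpResponse::Ok()
        .content_type("application/json")
        .body(completion_json),
    Err(e) => HttpResponse::InternalServerError()
        .body(format!("Inference thread exited early: {e}")),
}
```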

```rust
} else {
    if let Some(tx) = req.tx {
        // Tell the server thread that inference completed, and let it respond
        let _ = tx.send(());
```
@louisgv (Owner) commented:

Do we need that _ or can we just call send here?

```diff
@@ -105,7 +109,10 @@ pub fn start(req: InferenceThreadRequest) -> JoinHandle<()> {
   let start_at = std::time::SystemTime::now();

   println!("Feeding prompt ...");
   req.send_event("FEEDING_PROMPT");

   if stream_enabled {
```
@louisgv (Owner) commented:

Can we do this check at the trait level instead? That way we can unify the interface call (in this file) and handle the stream/non-stream logic at the trait implementation level, which would make it much nicer and more cohesive :)
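One hypothetical shape for that trait-level split (names here are illustrative, not from the PR): a single event-sink trait with streaming and buffering implementations, so `start` can call `send_event` unconditionally:

```rust
// Illustrative sketch only; the PR may choose different names and types.
trait EventSink: Send {
    fn send_event(&self, event: &str);
}

// Streaming mode: forward every event to the HTTP response channel.
struct StreamingSink {
    token_sender: flume::Sender<String>,
}

impl EventSink for StreamingSink {
    fn send_event(&self, event: &str) {
        let _ = self.token_sender.send(event.to_string());
    }
}

// Non-streaming mode: progress events are dropped; tokens are collected
// elsewhere into the string buffer.
struct BufferingSink;

impl EventSink for BufferingSink {
    fn send_event(&self, _event: &str) {}
}
```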

```rust
    pub model_guard: ModelGuard,
    pub completion_request: CompletionRequest,
    pub nonstream_completion_tokens: Arc<Mutex<String>>,
```
@louisgv (Owner) commented:

I think we can make this private if we use it as trait state for the non-stream feature. Making it pub would allow others to inspect it while it's being written/locked, which could potentially deadlock the Mutex writer if we're not careful... :d
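A sketch of that privacy suggestion, assuming std::sync primitives: keep the buffer private and expose only a snapshot, so callers cannot hold the lock while the writer is appending. The `completion_snapshot` helper is hypothetical:

```rust
use std::sync::{Arc, Mutex};

pub struct InferenceThreadRequest {
    // Now private: only the inference thread writes to it.
    nonstream_completion_tokens: Arc<Mutex<String>>,
    // ...other fields as in the diff above...
}

impl InferenceThreadRequest {
    /// Clone the buffered tokens; the lock is released as soon as the
    /// clone completes, so callers never hold it across other work.
    pub fn completion_snapshot(&self) -> String {
        self.nonstream_completion_tokens.lock().unwrap().clone()
    }
}
```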

```rust
} else {
    let abort_flag = Arc::new(RwLock::new(false));
    let completion_tokens = Arc::new(Mutex::new(String::new()));
    let (tx, rx) = flume::unbounded::<()>();
```
@louisgv (Owner) commented:

I wonder if we can make the token sender generic so that we can reuse that argument. The token_sender and the tx serve very similar functions here; we just need to reconcile the Bytes/String type. That would make for a nicer interface, I think.
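One hedged way to reconcile the two channels would be a single message enum, so token chunks and the completion signal travel over one sender (purely illustrative; the PR may land on a different shape):

```rust
// Illustrative: unify `token_sender` (token chunks) and `tx` (done signal)
// behind one message type on a single flume channel.
enum InferenceMsg {
    Token(String), // a decoded token chunk (could also be raw bytes)
    Done,          // inference finished; the handler can now respond
}

fn main() {
    let (sender, receiver) = flume::unbounded::<InferenceMsg>();
    sender.send(InferenceMsg::Token("Hello".into())).unwrap();
    sender.send(InferenceMsg::Done).unwrap();
    while let Ok(msg) = receiver.try_recv() {
        match msg {
            InferenceMsg::Token(t) => print!("{t}"),
            InferenceMsg::Done => println!("\n[done]"),
        }
    }
}
```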

```rust
    }),
)
})
if let Some(true) = payload.stream {
```
@louisgv (Owner) commented Aug 2, 2023:

This should be payload.0.stream, I think, since it's JSON.

If we can reconcile our trait above, we can infer the stream boolean via the completion_request as well, skipping a couple of lookup hoops!

@louisgv (Owner) left a review:

The overall idea is great thus far; added some comments and ideas on improvements 👍

Comment on lines +73 to +81:

```rust
start(InferenceThreadRequest {
    model_guard: model_guard.clone(),
    abort_flag: abort_flag.clone(),
    token_sender,
    completion_request: payload.0,
    nonstream_completion_tokens: str_buffer.clone(),
    stream: true,
    tx: None,
}),
```
@louisgv (Owner) commented:

I have an idea which I think would make this nicer: we can create the InferenceThreadRequest before the stream check, since it's non-blocking state. We can then do

```rust
let request = InferenceThreadRequest {
    model_guard: model_guard.clone(),
    abort_flag: abort_flag.clone(),
    token_sender,
    completion_request: payload.0,
    nonstream_completion_tokens: str_buffer.clone(),
};

if request.is_stream() { /* ... */ } else { /* ... */ }
```

and is_stream is a public trait method we expose via InferenceThreadRequest, which basically returns completion_request.stream.
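A minimal sketch of that accessor, assuming stream is an Option&lt;bool&gt; on CompletionRequest (as the "stream": false field in the test request body suggests):

```rust
impl InferenceThreadRequest {
    /// True when the client asked for a streamed response.
    /// Defaults to non-streaming when the flag is omitted.
    pub fn is_stream(&self) -> bool {
        self.completion_request.stream.unwrap_or(false)
    }
}
```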

@JNeuvonen (Collaborator, Author) replied:

I really like your attention to detail and design thinking! I will try to implement this one; I agree, it is indeed cleaner.

@louisgv (Owner) commented Aug 2, 2023:

@JNeuvonen invited you as a repo collaborator.

@louisgv (Owner) commented Sep 16, 2023:

@JNeuvonen LMK if you're still able to update the PR - otherwise I can get on it sometime next week!

@JNeuvonen (Collaborator, Author) commented:

Hey, I apologize for not coming back earlier. When I was working on this I was on summer vacation; now I am back on my work schedule and have less time and focus. Please feel free to finish the feature.
