
[Bug] Undeterministic Batch Formation #208

Open
sunggg opened this issue Feb 14, 2024 · 6 comments
Labels
bug Something isn't working

Comments


sunggg commented Feb 14, 2024

The default run of serve/tests/test_engine.py first adds 4 requests and then starts the engine.
I expected this to form a single prefill batch of 4 requests.
However, the behavior is nondeterministic: sometimes it forms two prefill batches of 2 requests each, and sometimes two prefill batches of 1 and 3 requests.
The debug log does not provide any useful information.

@sunggg sunggg added the bug Something isn't working label Feb 14, 2024

sunggg commented Feb 14, 2024

@elvin-n will take a look.


masahi commented Feb 14, 2024

I think this is inherent to async request add / batch creation in staging engine. Not sure if we can make it deterministic @yelite


yelite commented Feb 14, 2024

> I think this is inherent to async request add / batch creation in staging engine. Not sure if we can make it deterministic @yelite

But that test adds requests one by one in a for loop, and the engine pushes requests to queues before they reach the worker batch. I don't see anything obvious in that code path that would create nondeterministic behavior.


masahi commented Feb 14, 2024

By "async" I meant that the request arrives at the worker via AddRequestsCommand, and I think that's done asynchronously with worker.step()?


yelite commented Feb 14, 2024

> By "async" I meant the request arrives to the worker via AddRequestsCommand and I think that's done asynchronously with worker.step()?

Yes, that's async and very likely the cause of the nondeterministic batches here. I don't have a good way to fix this without impacting performance.

In #193 I plan to remove the sync engine, but add some flags to the staging engine to run things in a more synchronous way, so we can have more deterministic behavior in unit tests if it's necessary.
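To make the race concrete, here is a minimal toy sketch (not the mlc-llm code; all names are invented for illustration) of how a producer submitting requests asynchronously while a worker loop drains whatever has arrived can split one logical batch of 4 into several smaller ones, depending purely on timing:

```python
# Toy model: a producer thread enqueues 4 requests (analogous to
# AddRequestsCommand) while the main thread drains the queue in steps
# (analogous to worker.step()). The batch boundaries depend on how the
# two threads interleave, so the split varies from run to run.
import queue
import threading
import time


def run_once(add_delay: float) -> list:
    q = queue.Queue()

    def producer():
        for i in range(4):
            q.put(i)               # submit one request
            time.sleep(add_delay)  # scheduling jitter between adds

    t = threading.Thread(target=producer)
    t.start()

    batches = []
    done = 0
    while done < 4:                # one iteration ~ one worker step
        batch = []
        while True:                # drain everything that has arrived
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if batch:
            batches.append(len(batch))
            done += len(batch)
    t.join()
    return batches


# All 4 requests are always processed, but the per-step batch sizes
# (e.g. [4], [2, 2], or [1, 3]) depend on thread scheduling.
print(run_once(0.0))
print(run_once(0.005))
```

Running it a few times shows different splits, which matches the behavior reported in the test.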


elvin-n commented Feb 15, 2024

There are three use cases:

  1. Explicit creation of the SynchronousInferenceEngine and direct use of its add/step methods. This has completely deterministic behaviour, and I have not seen serve/tests/test_engine.py process a number of requests other than 4 this way.
  2. Use of AsyncEngineConnector. This introduces an async point, so even with the SynchronousInferenceEngine the behaviour cannot be deterministic.
  3. Use of the StagingInferenceEngine in any mode adds one more async point for submitting requests to the second process.

The async points in 2 and 3 were added by design; they help in real-world situations and cannot be removed. The only predictable flow is the sync engine with explicit calls to add/step.
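For contrast, the deterministic flow in case 1 can be sketched as follows (hypothetical class and method names, not the actual mlc-llm API): when every add completes before step() runs on the same thread, the first step always sees all 4 requests as one prefill batch.

```python
# Minimal sketch of the synchronous flow: add() and step() are called
# explicitly on one thread, so there is no async point and the batch
# formation is fully deterministic.
class SyncEngine:
    def __init__(self):
        self.pending = []   # requests added but not yet batched
        self.batches = []   # recorded prefill batch sizes

    def add(self, request_id):
        self.pending.append(request_id)

    def step(self):
        # Drain everything added so far into a single batch.
        if self.pending:
            self.batches.append(len(self.pending))
            self.pending.clear()


engine = SyncEngine()
for i in range(4):
    engine.add(i)           # all adds complete before the first step
engine.step()
print(engine.batches)       # always [4]: a single prefill batch
```

This is exactly why the test only behaves predictably with the sync engine and explicit add/step calls.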

Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this issue Feb 27, 2024
4 participants