
[Bug] Undeterministic Batch Formation #208

Open
sunggg opened this issue Feb 14, 2024 · 6 comments
Labels
bug Something isn't working

Comments


sunggg commented Feb 14, 2024

The default run of serve/tests/test_engine.py first adds 4 requests and then starts the engine.
I expected this to form a single prefill batch of 4 requests.
However, the behavior is nondeterministic: sometimes it forms two prefill batches of 2 requests each, and sometimes two prefill batches of 1 and 3 requests.
The debug log does not provide any useful information.

@sunggg sunggg added the bug Something isn't working label Feb 14, 2024

sunggg commented Feb 14, 2024

@elvin-n will take a look.


masahi commented Feb 14, 2024

I think this is inherent to async request add / batch creation in staging engine. Not sure if we can make it deterministic @yelite


yelite commented Feb 14, 2024

> I think this is inherent to async request add / batch creation in staging engine. Not sure if we can make it deterministic @yelite

But that test adds requests one by one in a for loop, and the engine pushes requests to queues before they reach the worker batch. I don't see anything obvious in that code path that would create nondeterministic behavior.


masahi commented Feb 14, 2024

By "async" I meant that the request arrives at the worker via AddRequestsCommand, and I think that's done asynchronously with worker.step()?


yelite commented Feb 14, 2024

> By "async" I meant the request arrives to the worker via AddRequestsCommand and I think that's done asynchronously with worker.step()?

Yes, that's async and very likely the cause of the nondeterministic batches here. I don't have a good way to fix this without impacting performance.

In #193 I plan to remove the sync engine, but add some flags to the staging engine to run things in a more synchronous way, so we can have more deterministic behavior in unit tests if it's necessary.
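To make the race concrete, here is a minimal toy sketch (not the mlc-llm code; all names are invented for illustration) of how a producer submitting requests asynchronously while a worker loop drains whatever has arrived can split one logical batch of 4 into several smaller ones, depending purely on timing:

```python
# Toy model: a producer thread enqueues 4 requests (analogous to
# AddRequestsCommand) while the main thread drains the queue in steps
# (analogous to worker.step()). The batch boundaries depend on how the
# two threads interleave, so the split varies from run to run.
import queue
import threading
import time


def run_once(add_delay: float) -> list:
    q = queue.Queue()

    def producer():
        for i in range(4):
            q.put(i)               # submit one request
            time.sleep(add_delay)  # scheduling jitter between adds

    t = threading.Thread(target=producer)
    t.start()

    batches = []
    done = 0
    while done < 4:                # one iteration ~ one worker step
        batch = []
        while True:                # drain everything that has arrived
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if batch:
            batches.append(len(batch))
            done += len(batch)
    t.join()
    return batches


# All 4 requests are always processed, but the per-step batch sizes
# (e.g. [4], [2, 2], or [1, 3]) depend on thread scheduling.
print(run_once(0.0))
print(run_once(0.005))
```

Running it a few times shows different splits, which matches the behavior reported in the test.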


elvin-n commented Feb 15, 2024

There are three use cases:

  1. Explicit creation of the SynchronousInferenceEngine and direct use of its add/step methods. This has completely deterministic behaviour, and I have not seen serve/tests/test_engine.py process a number of requests other than 4 this way.
  2. Use of AsyncEngineConnector. This introduces an async point, so even with the SynchronousInferenceEngine the behaviour cannot be deterministic.
  3. Use of the StagingInferenceEngine in any mode adds one more async point for submitting requests to the second process.

The async points in 2 and 3 were added by design; they help in real-world situations and cannot be removed. The only predictable flow is the sync engine with explicit calls to add/step.
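For contrast, the deterministic flow in case 1 can be sketched as follows (hypothetical class and method names, not the actual mlc-llm API): when every add completes before step() runs on the same thread, the first step always sees all 4 requests as one prefill batch.

```python
# Minimal sketch of the synchronous flow: add() and step() are called
# explicitly on one thread, so there is no async point and the batch
# formation is fully deterministic.
class SyncEngine:
    def __init__(self):
        self.pending = []   # requests added but not yet batched
        self.batches = []   # recorded prefill batch sizes

    def add(self, request_id):
        self.pending.append(request_id)

    def step(self):
        # Drain everything added so far into a single batch.
        if self.pending:
            self.batches.append(len(self.pending))
            self.pending.clear()


engine = SyncEngine()
for i in range(4):
    engine.add(i)           # all adds complete before the first step
engine.step()
print(engine.batches)       # always [4]: a single prefill batch
```

This is exactly why the test only behaves predictably with the sync engine and explicit add/step calls.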

Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this issue Feb 27, 2024
4 participants