
feat: implement batching strategies #3630

Open
wants to merge 22 commits into base: main

Conversation

sauyon
Contributor

@sauyon sauyon commented Mar 2, 2023

This adds a new configuration value, runner.batching.target_latency_ms, which controls how long the dispatcher will wait before beginning to execute requests.

Could probably do with a little bit of testing to see how setting it to 0 performs vs leaving it as ~, but for now, adding more knobs users can tweak is probably a good thing; I suspect there will be at least a few people who want the behavior of infinite max latency but not long wait times for requests after a burst.

EDIT: This PR has now been updated to provide a strategy option in the configuration, which allows a user to define which strategy they would like to use.
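For illustration, a sketch of how these options might appear in a BentoML configuration file, using the strategy / strategy_options keys discussed later in this thread; the exact nesting and surrounding keys here are assumptions, not the final schema:

runners:
  batching:
    enabled: true
    max_batch_size: 100
    max_latency_ms: 10000
    strategy: adaptive        # or e.g. target_latency
    strategy_options:
      decay: 0.95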

/cc @timliubentoml

@sauyon sauyon requested a review from a team as a code owner March 2, 2023 02:53
@sauyon sauyon requested review from ssheng, bojiang, a team and larme and removed request for a team March 2, 2023 02:53
@codecov

codecov bot commented Mar 2, 2023

Codecov Report

Merging #3630 (9db629e) into main (33c8440) will increase coverage by 31.85%.
Report is 112 commits behind head on main.
The diff coverage is 9.09%.

❗ Current head 9db629e differs from pull request most recent head 56088fe. Consider uploading reports for the commit 56088fe to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##            main    #3630       +/-   ##
==========================================
+ Coverage   0.00%   31.85%   +31.85%     
==========================================
  Files        166      146       -20     
  Lines      15286    12038     -3248     
  Branches       0     1989     +1989     
==========================================
+ Hits           0     3835     +3835     
+ Misses     15286     7928     -7358     
- Partials       0      275      +275     
Files Changed Coverage Δ
src/bentoml/_internal/configuration/v1/__init__.py 48.83% <ø> (+48.83%) ⬆️
src/bentoml/_internal/marshal/dispatcher.py 0.00% <0.00%> (ø)
src/bentoml/_internal/models/model.py 77.59% <ø> (+77.59%) ⬆️
src/bentoml/_internal/server/runner_app.py 0.00% <ø> (ø)
src/bentoml/triton.py 0.00% <ø> (ø)
src/bentoml/_internal/runner/runner.py 56.61% <66.66%> (+56.61%) ⬆️

... and 119 files with indirect coverage changes

@sauyon
Contributor Author

sauyon commented Mar 2, 2023

Oh, I'd forgotten about ruff. Man, it checks fast 😅

larme previously approved these changes Mar 10, 2023

@larme (Member) left a comment


LGTM in general, just added a note of documentation improvement.

docs/source/guides/batching.rst (outdated)
aarnphm previously approved these changes Mar 10, 2023

@aarnphm (Member) left a comment


👍

@sauyon
Contributor Author

sauyon commented Mar 10, 2023

This one is waiting on me to change some naming around; I need to get to that.

@sauyon sauyon marked this pull request as draft March 14, 2023 02:32
@sauyon sauyon dismissed stale reviews from aarnphm and larme via 533abdb March 16, 2023 01:15
@sauyon sauyon changed the title feat: implement target latency for batching feat: implement batching strategies Mar 16, 2023
now = time.time()
w0 = now - queue[0].enqueue_time

if w0 < self.wait:
Contributor Author

Add a loop checking for max_batch_size here.

# we are now free to dispatch whenever we like
await self.strategy.wait(self._queue, optimizer, self.max_latency, self.max_batch_size, self.tick_interval)

n = len(self._queue)
n_call_out = min(self.max_batch_size, n)
Contributor Author

Move this (and the above) logic into the strategy.
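To make the intended shape of that refactor concrete, here is a minimal sketch of a strategy whose wait() matches the call above; the enqueue_time attribute follows the earlier snippet in this review, while the option names and exact checks are assumptions rather than the final implementation:

import asyncio
import time
import typing as t


class TargetLatencyStrategy:
    # Sketch only: wait until the oldest request has been queued for roughly the
    # target latency, or the batch is already full, then return so the dispatcher
    # can cut a batch of at most max_batch_size.
    def __init__(self, options: dict[str, t.Any] | None = None):
        self.latency = (options or {}).get("latency", 1.0)  # seconds

    async def wait(self, queue, optimizer, max_latency, max_batch_size, tick_interval):
        while queue:
            if len(queue) >= max_batch_size:
                return  # batch is full; dispatch now
            w0 = time.time() - queue[0].enqueue_time
            if w0 >= self.latency:
                return  # the first request has waited long enough
            await asyncio.sleep(tick_interval)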

@bojiang
Member

bojiang commented Mar 23, 2023

Had some discussion about this PR with Sauyon. These are the decisions:

  1. add back-pressure handling logic to the new strategy
  2. adjust the refactoring: move the statistical regression into the Intelligent Wait strategy
  3. move max_batch_size and max_latency into strategy_options

@sauyon sauyon self-assigned this Apr 1, 2023
# we are not about to cancel the first request,
and latency_0 + dt <= self.max_latency * 0.95
# and waiting will cause average latency to decrease
and n * (wn + dt + optimizer.o_a) <= optimizer.wait * decay
Contributor Author

The condition compares:

n (the number of requests in the queue)
  × (wn: predictor of next request time
     + dt: tick time
     + o_a: optimizer slope)

which is a measure of how much latency will be added to every request if we wait for a new request and add it to the batch,

against

optimizer.wait (the average amount of time a request sits in the queue)
  × decay (an arbitrary decay value, so that the average wait should hopefully decay over time).
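Restated as a hedged sketch in code (variable names follow the snippet and explanation above; this is only a paraphrase of the condition, not the final implementation):

# extra latency added to every queued request if we wait one more tick
# in the hope of batching another request
added_latency_if_we_wait = n * (wn + dt + optimizer.o_a)

# average time a request currently sits in the queue, relaxed by the decay factor
acceptable_added_latency = optimizer.wait * decay

keep_waiting = (
    # the first request is not about to exceed the latency budget
    latency_0 + dt <= self.max_latency * 0.95
    # and waiting is expected to lower average latency
    and added_latency_if_we_wait <= acceptable_added_latency
)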

@sauyon
Contributor Author

sauyon commented Apr 18, 2023

@bojiang this should be ok to look at for now, broad strokes.

class TargetLatencyStrategy(BatchingStrategy, strategy_id="target_latency"):
    latency: float = 1.

    def __init__(self, options: dict[t.Any, t.Any]):
Contributor Author

TODO: typed dict for init.
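One possible way to address this TODO, sketched with typing.TypedDict; the option names (latency here, decay for the adaptive strategy) come from snippets elsewhere in this PR, but these classes are illustrative only:

import typing as t


class TargetLatencyOptions(t.TypedDict, total=False):
    latency: float  # target latency in seconds (defaults to 1.)


class AdaptiveOptions(t.TypedDict, total=False):
    decay: float    # decay applied to the average wait, e.g. 0.95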

strategy_id: str

@abc.abstractmethod
def controller(queue: t.Sequence[Job], predict_execution_time: t.Callable[t.Sequence[Job]], dispatch: t.Callable[]):
Member

Suggested change
def controller(queue: t.Sequence[Job], predict_execution_time: t.Callable[t.Sequence[Job]], dispatch: t.Callable[]):
def controller(queue: t.Sequence[Job], predict_execution_time: t.Callable[[t.Sequence[Job]], t.Any], dispatch: t.Callable[..., t.Any]):

the way that the scheduler chooses a batching window, i.e. the time it waits for requests to combine
them into a batch before dispatching it to begin execution. There are three options:

- target_latency: this strategy waits until it expects the first request received will take around

Comment on lines 94 to 96
strategy: adaptive
strategy_options:
  decay: 0.95
Collaborator

Maybe the same pattern as the optimizer?

strategy:
  name: adaptive
  options:
    decay: 0.95

Comment on lines +75 to +76
Batching Strategy
^^^^^^^^^^^^^^^^^
Collaborator

Docs changes are outdated, correct?

fallback=functools.partial(
-    ServiceUnavailable, message="process is overloaded"
+    ServiceUnavailable, message="runner process is overloaded"
Collaborator

Let's give more info about which runner and index.
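A possible shape for that, assuming the runner name and worker index are available in this scope (the exact variable names are illustrative, not taken from the PR):

fallback=functools.partial(
    ServiceUnavailable,
    message=f"runner process is overloaded (runner={runner.name}, worker index={worker_index})",
)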

@sauyon sauyon marked this pull request as ready for review April 25, 2023 02:55
@sauyon
Contributor Author

sauyon commented Apr 25, 2023

I think this should be ready for review now if anybody wants to take a look (@bojiang I implemented wait time).

Once I add some tests I'll probably factor this into separate commits.

@aarnphm (Member) left a comment


One more thing is to update the documentation. I have read through the marshal refactor and it LGTM.

@aarnphm aarnphm added the pr/merge-hold Requires further discussions before a pull request can be merged label May 9, 2023
@aarnphm
Member

aarnphm commented May 9, 2023

status: We probably want a load test before merging this one in.

@judahrand
Contributor

Is this likely to be reviewed and merged?

@pep8speaks

pep8speaks commented Sep 20, 2023

Hello @sauyon! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 72:80: E501 line too long (83 > 79 characters)
Line 103:80: E501 line too long (101 > 79 characters)
Line 131:80: E501 line too long (85 > 79 characters)
Line 162:80: E501 line too long (84 > 79 characters)
Line 213:80: E501 line too long (92 > 79 characters)
Line 319:80: E501 line too long (88 > 79 characters)
Line 476:80: E501 line too long (87 > 79 characters)
Line 541:80: E501 line too long (82 > 79 characters)
Line 558:80: E501 line too long (81 > 79 characters)

Line 202:80: E501 line too long (107 > 79 characters)
Line 203:80: E501 line too long (111 > 79 characters)
Line 205:80: E501 line too long (110 > 79 characters)
Line 267:80: E501 line too long (125 > 79 characters)
Line 271:80: E501 line too long (86 > 79 characters)
Line 274:80: E501 line too long (122 > 79 characters)
Line 284:80: E501 line too long (80 > 79 characters)
Line 286:80: E501 line too long (158 > 79 characters)
Line 300:80: E501 line too long (83 > 79 characters)

Comment last updated at 2023-09-20 02:31:21 UTC
