
feat: implement batching strategies #3630

Open
wants to merge 22 commits into base: main

Conversation

sauyon
Contributor

@sauyon sauyon commented Mar 2, 2023

This adds a new configuration value, runner.batching.target_latency_ms, which controls how long the dispatcher will wait before beginning to execute requests.

Could probably do with a little bit of testing to see how setting it to 0 performs vs leaving it as ~, but for now, adding more knobs users can tweak is probably a good thing; I suspect there will be at least a few people who want the behavior of infinite max latency but not long wait times for requests after a burst.

EDIT: This PR has now been updated to provide a strategy option in the configuration, which allows a user to define which strategy they would like to use.
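For illustration, a sketch of how these options might appear in a BentoML configuration file, using the strategy / strategy_options keys discussed later in this thread; the exact nesting and surrounding keys here are assumptions, not the final schema:

runners:
  batching:
    enabled: true
    max_batch_size: 100
    max_latency_ms: 10000
    strategy: adaptive        # or e.g. target_latency
    strategy_options:
      decay: 0.95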

/cc @timliubentoml

@sauyon sauyon requested a review from a team as a code owner March 2, 2023 02:53
@sauyon sauyon requested review from ssheng, bojiang, a team and larme and removed request for a team March 2, 2023 02:53
@codecov

codecov bot commented Mar 2, 2023

Codecov Report

Merging #3630 (9db629e) into main (33c8440) will increase coverage by 31.85%.
Report is 112 commits behind head on main.
The diff coverage is 9.09%.

❗ Current head 9db629e differs from pull request most recent head 56088fe. Consider uploading reports for the commit 56088fe to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##            main    #3630       +/-   ##
==========================================
+ Coverage   0.00%   31.85%   +31.85%     
==========================================
  Files        166      146       -20     
  Lines      15286    12038     -3248     
  Branches       0     1989     +1989     
==========================================
+ Hits           0     3835     +3835     
+ Misses     15286     7928     -7358     
- Partials       0      275      +275     
Files Changed Coverage Δ
src/bentoml/_internal/configuration/v1/__init__.py 48.83% <ø> (+48.83%) ⬆️
src/bentoml/_internal/marshal/dispatcher.py 0.00% <0.00%> (ø)
src/bentoml/_internal/models/model.py 77.59% <ø> (+77.59%) ⬆️
src/bentoml/_internal/server/runner_app.py 0.00% <ø> (ø)
src/bentoml/triton.py 0.00% <ø> (ø)
src/bentoml/_internal/runner/runner.py 56.61% <66.66%> (+56.61%) ⬆️

... and 119 files with indirect coverage changes

@sauyon
Contributor Author

sauyon commented Mar 2, 2023

Oh, I'd forgotten about ruff. Man, it checks fast 😅

larme previously approved these changes Mar 10, 2023

@larme (Member) left a comment


LGTM in general, just added a note of documentation improvement.

docs/source/guides/batching.rst (outdated)
aarnphm previously approved these changes Mar 10, 2023

@aarnphm (Member) left a comment


👍

@sauyon
Contributor Author

sauyon commented Mar 10, 2023

This one is waiting on me to change some naming around; I need to get to that.

@sauyon sauyon marked this pull request as draft March 14, 2023 02:32
@sauyon sauyon dismissed stale reviews from aarnphm and larme via 533abdb March 16, 2023 01:15
@sauyon sauyon changed the title feat: implement target latency for batching feat: implement batching strategies Mar 16, 2023
now = time.time()
w0 = now - queue[0].enqueue_time

if w0 < self.wait:
Contributor Author

Add a loop checking for max_batch_size here.

# we are now free to dispatch whenever we like
await self.strategy.wait(self._queue, optimizer, self.max_latency, self.max_batch_size, self.tick_interval)

n = len(self._queue)
n_call_out = min(self.max_batch_size, n)
Contributor Author

Move this (and the above) logic into the strategy.
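To make the intended shape of that refactor concrete, here is a minimal sketch of a strategy whose wait() matches the call above; the enqueue_time attribute follows the earlier snippet in this review, while the option names and exact checks are assumptions rather than the final implementation:

import asyncio
import time
import typing as t


class TargetLatencyStrategy:
    # Sketch only: wait until the oldest request has been queued for roughly the
    # target latency, or the batch is already full, then return so the dispatcher
    # can cut a batch of at most max_batch_size.
    def __init__(self, options: dict[str, t.Any] | None = None):
        self.latency = (options or {}).get("latency", 1.0)  # seconds

    async def wait(self, queue, optimizer, max_latency, max_batch_size, tick_interval):
        while queue:
            if len(queue) >= max_batch_size:
                return  # batch is full; dispatch now
            w0 = time.time() - queue[0].enqueue_time
            if w0 >= self.latency:
                return  # the first request has waited long enough
            await asyncio.sleep(tick_interval)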

@bojiang
Member

bojiang commented Mar 23, 2023

Had some discussion about this PR with Sauyon. These are the decisions:

  1. add back-pressure handling logic to the new strategy
  2. adjust the refactoring: move the statistical regression into the Intelligent Wait strategy
  3. move max_batch_size and max_latency into strategy_options

@sauyon sauyon self-assigned this Apr 1, 2023
# we are not about to cancel the first request,
and latency_0 + dt <= self.max_latency * 0.95
# and waiting will cause average latency to decrease
and n * (wn + dt + optimizer.o_a) <= optimizer.wait * decay
Contributor Author

The condition compares:

n (the number of requests in the queue)
  × (wn: predictor of next request time
     + dt: tick time
     + o_a: optimizer slope)

which is a measure of how much latency will be added to every request if we wait for a new request and add it to the batch,

against

optimizer.wait (the average amount of time a request sits in the queue)
  × decay (an arbitrary decay value, so that the average wait should hopefully decay over time).
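Restated as a hedged sketch in code (variable names follow the snippet and explanation above; this is only a paraphrase of the condition, not the final implementation):

# extra latency added to every queued request if we wait one more tick
# in the hope of batching another request
added_latency_if_we_wait = n * (wn + dt + optimizer.o_a)

# average time a request currently sits in the queue, relaxed by the decay factor
acceptable_added_latency = optimizer.wait * decay

keep_waiting = (
    # the first request is not about to exceed the latency budget
    latency_0 + dt <= self.max_latency * 0.95
    # and waiting is expected to lower average latency
    and added_latency_if_we_wait <= acceptable_added_latency
)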

@sauyon
Contributor Author

sauyon commented Apr 18, 2023

@bojiang this should be ok to look at for now, broad strokes.

class TargetLatencyStrategy(BatchingStrategy, strategy_id="target_latency"):
    latency: float = 1.

    def __init__(self, options: dict[t.Any, t.Any]):
Contributor Author

TODO: typed dict for init.
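One possible way to address this TODO, sketched with typing.TypedDict; the option names (latency here, decay for the adaptive strategy) come from snippets elsewhere in this PR, but these classes are illustrative only:

import typing as t


class TargetLatencyOptions(t.TypedDict, total=False):
    latency: float  # target latency in seconds (defaults to 1.)


class AdaptiveOptions(t.TypedDict, total=False):
    decay: float    # decay applied to the average wait, e.g. 0.95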

strategy_id: str

@abc.abstractmethod
def controller(queue: t.Sequence[Job], predict_execution_time: t.Callable[t.Sequence[Job]], dispatch: t.Callable[]):
Member

Suggested change
def controller(queue: t.Sequence[Job], predict_execution_time: t.Callable[t.Sequence[Job]], dispatch: t.Callable[]):
def controller(queue: t.Sequence[Job], predict_execution_time: t.Callable[[t.Sequence[Job]], t.Any], dispatch: t.Callable[..., t.Any]):

the way that the scheduler chooses a batching window, i.e. the time it waits for requests to combine
them into a batch before dispatching it to begin execution. There are three options:

- target_latency: this strategy waits until it expects the first request received will take around

Comment on lines 94 to 96
strategy: adaptive
strategy_options:
  decay: 0.95
Collaborator

Maybe the same pattern as the optimizer?

strategy:
  name: adaptive
  options:
    decay: 0.95

Comment on lines +75 to +76
Batching Strategy
^^^^^^^^^^^^^^^^^
Collaborator

Docs changes are outdated, correct?

fallback=functools.partial(
-    ServiceUnavailable, message="process is overloaded"
+    ServiceUnavailable, message="runner process is overloaded"
Collaborator

Let's give more info about which runner and index.
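A possible shape for that, assuming the runner name and worker index are available in this scope (the exact variable names are illustrative, not taken from the PR):

fallback=functools.partial(
    ServiceUnavailable,
    message=f"runner process is overloaded (runner={runner.name}, worker index={worker_index})",
)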

@sauyon sauyon marked this pull request as ready for review April 25, 2023 02:55
@sauyon
Contributor Author

sauyon commented Apr 25, 2023

I think this should be ready for review now if anybody wants to take a look (@bojiang I implemented wait time).

Once I add some tests I'll probably factor this into separate commits.

@aarnphm (Member) left a comment


One more thing is to update the documentation. I have read through the marshal refactor and it LGTM.

@aarnphm aarnphm added the pr/merge-hold Requires further discussions before a pull request can be merged label May 9, 2023
@aarnphm
Member

aarnphm commented May 9, 2023

status: We probably want a load test before merging this one in.

@judahrand
Contributor

Is this likely to be reviewed and merged?

@pep8speaks

pep8speaks commented Sep 20, 2023

Hello @sauyon! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 72:80: E501 line too long (83 > 79 characters)
Line 103:80: E501 line too long (101 > 79 characters)
Line 131:80: E501 line too long (85 > 79 characters)
Line 162:80: E501 line too long (84 > 79 characters)
Line 213:80: E501 line too long (92 > 79 characters)
Line 319:80: E501 line too long (88 > 79 characters)
Line 476:80: E501 line too long (87 > 79 characters)
Line 541:80: E501 line too long (82 > 79 characters)
Line 558:80: E501 line too long (81 > 79 characters)

Line 202:80: E501 line too long (107 > 79 characters)
Line 203:80: E501 line too long (111 > 79 characters)
Line 205:80: E501 line too long (110 > 79 characters)
Line 267:80: E501 line too long (125 > 79 characters)
Line 271:80: E501 line too long (86 > 79 characters)
Line 274:80: E501 line too long (122 > 79 characters)
Line 284:80: E501 line too long (80 > 79 characters)
Line 286:80: E501 line too long (158 > 79 characters)
Line 300:80: E501 line too long (83 > 79 characters)

Comment last updated at 2023-09-20 02:31:21 UTC
