[Serving][Benchmark] Add benchmark code for serving #69

xwu99 · 2024-01-19T03:41:41Z

closes #68

xwu99 · 2024-01-22T09:40:37Z

Currently, prompts are sampled from ShareGPT. I am thinking of adding more prompt workloads.
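For context, a minimal sketch of sampling prompts from a ShareGPT-style dump. It assumes the common layout where each record holds a "conversations" list of {"from": ..., "value": ...} turns and the first turn is the user prompt. load_sharegpt and sample_sharegpt_prompts are hypothetical helpers, not the code in this PR.

```python
import json
import random
from typing import Any, Dict, List


def load_sharegpt(path: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: read a ShareGPT-style JSON file (a list of records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def sample_sharegpt_prompts(
    records: List[Dict[str, Any]], num_prompts: int, seed: int = 0
) -> List[str]:
    """Hypothetical helper: sample `num_prompts` first-turn prompts.

    Assumes each record has a "conversations" list whose first entry is the prompt.
    """
    # Keep only conversations that have at least a prompt and a reply.
    prompts = [
        rec["conversations"][0]["value"]
        for rec in records
        if len(rec.get("conversations", [])) >= 2
    ]
    if num_prompts > len(prompts):
        raise ValueError(f"requested {num_prompts} prompts, only {len(prompts)} available")
    rng = random.Random(seed)  # seeded so repeated benchmark runs see the same prompts
    return rng.sample(prompts, num_prompts)
```

Seeding the sampler keeps the workload identical across runs, which matters when comparing serving configurations against each other.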

* support more models in finetune
* modify dockerfile
* fix bug caused by accelerate upgrade
* add llama2
* fix error
* fix error
* test
* fix error
* support_lora_finetune
* fix error
* remove bin before lora test
* fix
* update

benchmarks/benchmark_serving.py

xwu99 · 2024-02-22T06:15:06Z

Added support for the IPEX prompt.json dataset. Will continue to add:

* first/next token latencies measured on the client side
* a synthetic data format, where input/output lengths are sampled from a Gaussian distribution rather than fixed
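Both planned items can be sketched in a few lines. The sketch assumes first-token latency means the time from sending a request to receiving its first streamed token, and next-token latency means the average gap between consecutive tokens; lengths are drawn from a normal distribution, rounded, and clipped so no request gets fewer than one token. sample_gaussian_lengths and token_latencies are hypothetical names, not the PR's functions.

```python
import random
from typing import List, Tuple


def sample_gaussian_lengths(
    num_requests: int, mean: float, stddev: float, min_len: int = 1, seed: int = 0
) -> List[int]:
    """Hypothetical helper: draw token lengths from N(mean, stddev), rounded and clipped at min_len."""
    rng = random.Random(seed)
    return [max(min_len, round(rng.gauss(mean, stddev))) for _ in range(num_requests)]


def token_latencies(request_start: float, token_times: List[float]) -> Tuple[float, float]:
    """Hypothetical helper: client-side (first-token latency, mean next-token latency).

    `token_times` holds the wall-clock arrival time (seconds) of each streamed token.
    """
    if not token_times:
        raise ValueError("no tokens received")
    first_token = token_times[0] - request_start
    # Gaps between consecutive token arrivals; a single-token reply has none.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    next_token = sum(gaps) / len(gaps) if gaps else 0.0
    return first_token, next_token
```

For example, a request sent at t=10.0 s whose tokens arrive at 10.5, 10.6 and 10.8 s has a first-token latency of 0.5 s and a mean next-token latency of about 0.15 s. Clipping at one token avoids zero or negative lengths when the standard deviation is large relative to the mean.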

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

KepingYan · 2024-03-04T03:13:06Z

benchmarks/benchmark_serving.py

+    """
+    tasks: List[asyncio.Task] = []
+    progress_bar = tqdm(total=len(input_requests)) if progress else None
+    async for request in get_request(input_requests, request_rate):


Maybe we need to make a deep copy of the parameter config in this loop.
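The loop under review pulls requests from get_request at a target request_rate and spawns one task per request. A minimal sketch of that pattern with the per-iteration deep copy suggested here, assuming Poisson arrivals (exponential gaps between requests, a common choice in serving benchmarks). The get_request body, send_request (a stand-in for the HTTP call that mutates its config like the reviewed snippet) and benchmark are illustrative, not the PR's code.

```python
import asyncio
import copy
import random
from typing import Any, AsyncIterator, Dict, List, Tuple


async def get_request(
    input_requests: List[Tuple[str, int]], request_rate: float, seed: int = 0
) -> AsyncIterator[Tuple[str, int]]:
    """Yield (prompt, output_len) pairs with exponential inter-arrival gaps.

    request_rate is in requests per second; float("inf") sends everything at once.
    """
    rng = random.Random(seed)
    for request in input_requests:
        yield request
        if request_rate == float("inf"):
            continue
        await asyncio.sleep(rng.expovariate(request_rate))


async def send_request(prompt: str, output_len: int, config: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the HTTP call; mutates `config` the same way the reviewed code does."""
    if "max_new_tokens" in config:
        output_len = config["max_new_tokens"]
    else:
        config["max_new_tokens"] = output_len
    return {"prompt": prompt, "max_new_tokens": output_len}


async def benchmark(
    input_requests: List[Tuple[str, int]], request_rate: float, config: Dict[str, Any]
) -> List[Dict[str, Any]]:
    tasks: List[asyncio.Task] = []
    async for prompt, output_len in get_request(input_requests, request_rate):
        # Each task gets its own copy, so one request cannot leak settings into the next.
        task_config = copy.deepcopy(config)
        tasks.append(asyncio.create_task(send_request(prompt, output_len, task_config)))
    return await asyncio.gather(*tasks)
```

Running asyncio.run(benchmark([("a", 5), ("b", 7)], float("inf"), {})) keeps the per-request lengths 5 and 7 and leaves the caller's config empty. Copying in the loop protects every callee, at the cost of one copy per request.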

KepingYan · 2024-03-04T03:17:49Z

benchmarks/benchmark_serving.py

+    # Use sample output_len if max_new_tokens not specified
+    if "max_new_tokens" in config:
+        output_len = config["max_new_tokens"]
+    else:
+        config["max_new_tokens"] = output_len


Alternatively, make a deep copy of the config parameters here. Otherwise, the first time execution reaches line 231, config becomes {'max_new_tokens': 37}, so output_len will be 37 for every subsequent request. This is why total_time drops so much.

xwu99 · 2024-03-04

Good catch, please feel free to fix this.
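The bug described above can be reproduced in isolation. The sketch copies the reviewed if/else into two hypothetical helpers, resolve_output_len_buggy and resolve_output_len_fixed; the second applies the deep copy inside the function, the alternative suggested in that review comment. The lengths 37, 120 and 64 are made-up inputs.

```python
import copy
from typing import Any, Dict, Tuple


def resolve_output_len_buggy(config: Dict[str, Any], output_len: int) -> int:
    # Mirrors the reviewed snippet: the first call writes into the shared dict.
    if "max_new_tokens" in config:
        output_len = config["max_new_tokens"]
    else:
        config["max_new_tokens"] = output_len
    return output_len


def resolve_output_len_fixed(config: Dict[str, Any], output_len: int) -> Tuple[Dict[str, Any], int]:
    # Work on a private copy so the caller's config is never modified.
    config = copy.deepcopy(config)
    if "max_new_tokens" in config:
        output_len = config["max_new_tokens"]
    else:
        config["max_new_tokens"] = output_len
    return config, output_len


shared: Dict[str, Any] = {}
print([resolve_output_len_buggy(shared, n) for n in (37, 120, 64)])  # [37, 37, 37]
print(shared)  # {'max_new_tokens': 37}

shared = {}
print([resolve_output_len_fixed(shared, n)[1] for n in (37, 120, 64)])  # [37, 120, 64]
print(shared)  # {}
```

With the buggy version every request after the first inherits the first sampled length, which shortens the generated outputs and explains the drop in total_time.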

xwu99 · 2024-04-23T03:24:25Z

Replaced by #163, which has already been merged.

xwu99 added 3 commits January 19, 2024 11:51

add benchmarks

2f4dc20

update

d1a048d

update

da03432

xwu99 marked this pull request as ready for review January 22, 2024 09:36

xwu99 requested review from carsonwang and KepingYan January 22, 2024 09:36

xwu99 and others added 6 commits January 22, 2024 11:49

update

5a4a28c

update

06b5ea0

update

765da83

update

d534f76

move README to doc/benchmark.md

eab0425

fix error & add progress bar

cf4f071

carsonwang reviewed Feb 5, 2024

benchmarks/benchmark_serving.py (outdated)

benchmarks/benchmark_serving.py (outdated)

benchmarks/benchmark_serving.py

benchmarks/benchmark_serving.py (outdated)

xwu99 added 3 commits February 18, 2024 15:37

update

6285ae9

update doc

ff6c4cb

update

12c3bd3

xwu99 added 5 commits February 22, 2024 11:54

add support for ipex dataset

64e16cd

update

348394a

update

c73af8f

update

f3a64a9

add doc

06e4142

xwu99 marked this pull request as draft February 26, 2024 09:19

xwu99 added 2 commits February 26, 2024 17:20

TODO: fix stats

d87854f

TODO: fix stats

02a2a48

xwu99 force-pushed the add-benchmark-serving branch from 2baaeb2 to d0b5659 February 27, 2024 09:13

xwu99 added 2 commits February 27, 2024 11:06

Add track-token-latency, track-input-output, results-dir

1c32952

update doc

fa328aa

xwu99 added 10 commits February 27, 2024 13:19

update doc

82d646a

update

79c45d5

update code

05d47ca

update

4197444

add sample_requests_from_random_generation

bbdb393

update

d0b5659

update

621b1f1

update

35a66ad

update

0a30e61

update

d2a7d59

KepingYan reviewed Mar 4, 2024

KepingYan added 7 commits March 5, 2024 10:37

fix max_new_tokens bug, fix output_len, add param model_type for input prompt

f2f5fb7

revert

1211d84

resubmit

c76d6da

fix test

1b462cb

add length limit and fix index out of range

5d1a329

merge main branch

4c49f57

fix conflict

f704f92

xwu99 mentioned this pull request Mar 18, 2024

Add client-side latency and throughput benchmarks #38

Closed

xwu99 and others added 5 commits March 25, 2024 01:26

merge upstream

c29d982

Merge branch 'intel:main' into add-benchmark-serving

71586f8

add openai support

5f0f1f6

Refactor get_request to support batching

71d8a3e

Refactor get_request to support batching

97bc73f

xwu99 closed this Apr 23, 2024
