
[Usage]: Profiling Prefill and Decode Phases Separately #4900

Open
Msiavashi opened this issue May 18, 2024 · 2 comments
Labels
usage How to use vllm

Comments

@Msiavashi

Your current environment

I'm attempting to independently measure the performance (e.g., latency, throughput, etc.) of the prefill and decode phases. Is there a way to achieve this? I have noticed a few benchmarks that measure end-to-end throughput and latency but do not provide separate metrics for each phase.

I would greatly appreciate any guidance on profiling these two phases separately.

How would you like to use vllm

No response

@Msiavashi Msiavashi added the usage How to use vllm label May 18, 2024
@leiwen83
Contributor

Stream mode reports each token's latency, so the prefill and decode phases can be measured separately. Since the current benchmark uses sync mode, another workaround to consider is:

  1. Measure latency with input_len=1000, output_len=1; this gives the prefill latency for input_len=1000.
  2. Measure latency with input_len=1, output_len=1 to get average latency A, then with input_len=1, output_len=1000 to get average latency B. (B - A) / 999 gives the per-token decode latency.
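The two steps above can be sketched as a small timing harness. This is a minimal sketch, not vLLM API: `generate` is a hypothetical placeholder for whatever synchronous call drives the engine (e.g. a wrapper around `LLM.generate` with fixed prompt and output lengths).

```python
import time

def decode_latency_per_token(lat_a, lat_b, output_len):
    """Per-token decode latency from two sync-mode runs.

    lat_a: latency for input_len=1, output_len=1
    lat_b: latency for input_len=1, output_len=output_len
    Both runs pay the same (tiny) prefill plus one first token, so the
    extra (output_len - 1) tokens in run B are pure decode steps.
    """
    return (lat_b - lat_a) / (output_len - 1)

def measure_latency(generate, input_len, output_len):
    """Wall-clock latency of one synchronous generation call.

    `generate(input_len, output_len)` is a hypothetical hook, not part
    of vLLM; wire it to your own benchmark call.
    """
    start = time.perf_counter()
    generate(input_len, output_len)
    return time.perf_counter() - start

def split_prefill_decode(generate, input_len=1000, output_len=1000):
    # Step 1: prefill latency for `input_len` tokens (output_len=1
    # keeps the decode contribution to a single token).
    prefill = measure_latency(generate, input_len, 1)
    # Step 2: isolate per-token decode latency via two short-prompt runs.
    lat_a = measure_latency(generate, 1, 1)
    lat_b = measure_latency(generate, 1, output_len)
    return prefill, decode_latency_per_token(lat_a, lat_b, output_len)
```

Note that the step-1 number still includes the generation of one output token, so it slightly overstates pure prefill time.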

@Msiavashi
Author

So there is still no built-in mechanism for these measurements/profiling, right?
