I'm attempting to independently measure the performance (e.g., latency, throughput, etc.) of the prefill and decode phases. Is there a way to achieve this? I have noticed a few benchmarks that measure end-to-end throughput and latency but do not provide separate metrics for each phase.
I would greatly appreciate any guidance on profiling these two phases separately.
How would you like to use vllm
No response
Streaming mode gives you each token's latency, so the prefill and decode phases can be measured separately.
The current benchmark uses sync mode, so another workaround to consider is:
Measure latency with input_len=1000, output_len=1. This gives the prefill latency for input_len=1000.
Measure the average latency A with input_len=1, output_len=1, then the average latency B with input_len=1, output_len=1000. (B-A)/999 gives the per-token decode latency...
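A minimal sketch of the streaming idea, assuming you have some iterator that yields tokens as they arrive. `time_stream` and `fake_stream` are hypothetical names, not part of vLLM; `fake_stream` only sleeps to imitate a model. The time to the first token is used as a rough proxy for prefill (it also includes any queueing), and the gaps between later tokens approximate per-token decode latency.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def time_stream(tokens: Iterable[str]) -> Tuple[float, List[float]]:
    """Return (time to first token, gaps between later tokens), in seconds."""
    start = time.perf_counter()
    stamps = []
    for _ in tokens:
        # Record the arrival time of every streamed token.
        stamps.append(time.perf_counter())
    if not stamps:
        raise ValueError("stream produced no tokens")
    ttft = stamps[0] - start  # rough proxy for prefill latency
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]  # decode steps
    return ttft, gaps


def fake_stream(n_tokens: int, prefill_s: float = 0.05,
                decode_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming client; sleeps instead of running a model."""
    time.sleep(prefill_s)
    yield "tok0"
    for i in range(1, n_tokens):
        time.sleep(decode_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, gaps = time_stream(fake_stream(5))
    print(f"time to first token: {ttft:.4f}s")
    print(f"mean inter-token latency: {sum(gaps) / len(gaps):.4f}s")
```

To use it against a real server, replace `fake_stream(...)` with whatever iterator your streaming client returns.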
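The subtraction workaround above can be sketched as follows. This assumes a callable `run(input_len, output_len)` that performs one synchronous generation; `run`, `mean_latency`, `split_phases`, and `fake_run` are hypothetical names for illustration, and `fake_run` only sleeps rather than calling a model.

```python
import statistics
import time
from typing import Callable, Tuple

RunFn = Callable[[int, int], None]


def mean_latency(run: RunFn, input_len: int, output_len: int,
                 repeats: int = 3) -> float:
    """Average wall-clock latency of run(input_len, output_len) in seconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run(input_len, output_len)
        samples.append(time.perf_counter() - t0)
    return statistics.mean(samples)


def split_phases(run: RunFn, prompt_len: int = 1000, gen_len: int = 1000,
                 repeats: int = 3) -> Tuple[float, float]:
    """Return (prefill latency for prompt_len, per-token decode latency)."""
    # Long prompt, one output token: dominated by prefill.
    prefill = mean_latency(run, prompt_len, 1, repeats)
    # A: one-token prompt, one output token (baseline).
    a = mean_latency(run, 1, 1, repeats)
    # B: one-token prompt, gen_len output tokens.
    b = mean_latency(run, 1, gen_len, repeats)
    # (B - A) covers the extra gen_len - 1 decode steps.
    decode_per_token = (b - a) / (gen_len - 1)
    return prefill, decode_per_token


def fake_run(input_len: int, output_len: int) -> None:
    """Hypothetical stand-in for a real generation call; sleeps to imitate cost."""
    time.sleep(0.0002 * input_len + 0.002 * output_len)


if __name__ == "__main__":
    prefill, decode = split_phases(fake_run, prompt_len=100, gen_len=11)
    print(f"prefill: {prefill:.4f}s, decode per token: {decode:.4f}s")
```

Note that this measures decode latency at a short context (input_len=1); per-token decode cost generally grows with context length, so the figure is an average over the generated range rather than a constant.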