
How to set the initial kv cache length? #1577

Closed
liminn opened this issue May 11, 2024 · 4 comments
Assignees
Labels
question Further information is requested triaged Issue has been triaged by maintainers

Comments

@liminn

liminn commented May 11, 2024

I want to test this scenario: the initial KV cache length is 2048, and the LLM then iterates 2048 times, so output_tokens = 2048; the initial KV cache length is 2048 and the final KV cache length is 4096 (2048 + 2048).

if I run:

FT_NVTX=ON /opt/nvidia/nsight-systems/2024.2.1/bin/nsys profile mpirun  -n 8 --allow-run-as-root --oversubscribe ./cpp/build/benchmarks/gptSessionBenchmark --engine_dir ./benchmarks/cpp/temp/engine_out_builddocker_tp8/ --warm_up 1 --batch_size "64" --duration 0 --num_runs 1 --input_output_len "1,2048"

the initial KV cache length is 1, not 2048.
So how do I set the initial KV cache length?

@byshiue
Collaborator

byshiue commented May 14, 2024

You should set --input_output_len "2048,2048".

@byshiue byshiue self-assigned this May 15, 2024
@byshiue byshiue added question Further information is requested triaged Issue has been triaged by maintainers labels May 15, 2024
@liminn
Author

liminn commented May 17, 2024

Sorry, I may not have expressed myself clearly.
If I set --input_output_len "2048,2048", then as I understand it the measurement includes two parts:

  • part 1: one prefill inference (input sequence length is 2048, initial KV cache length is 0)
  • part 2: 2047 decoding iterations (input sequence length is effectively 1, initial KV cache length starts at 2048), right?

However, I only want to measure the inference time of part 2, so how can I set that up?
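The two phases described above can be sketched with a little arithmetic. This is an illustration only, not TensorRT-LLM code; the function name and the per-step KV lengths are my own accounting (prompt tokens plus already-generated tokens before each step):

```python
def kv_cache_lengths(input_len: int, output_len: int):
    """Yield (phase, tokens_fed, kv_len_before_step) for each forward pass."""
    # Part 1 (prefill): the whole prompt is processed in one pass,
    # and the KV cache starts out empty.
    yield ("prefill", input_len, 0)
    # Part 2 (decode): each remaining pass feeds a single token, with the
    # KV cache already holding the prompt plus the tokens generated so far.
    for step in range(output_len - 1):
        yield ("decode", 1, input_len + step)

steps = list(kv_cache_lengths(2048, 2048))
assert steps[0] == ("prefill", 2048, 0)   # part 1: KV cache empty
assert steps[1] == ("decode", 1, 2048)    # part 2 begins with KV length 2048
assert len(steps) == 2048                 # 1 prefill pass + 2047 decode passes
```

This also shows why `--input_output_len "1,2048"` gave an initial KV cache length of 1: the prefill pass only ingests the 1-token prompt.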

@byshiue
Collaborator

byshiue commented May 23, 2024

There is no way to measure that directly. You could use nsys to trace the whole workflow and then compute the time of part 2 manually.
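For instance, once the nsys trace has given you the end-to-end time and the prefill (context-phase) time, part 2 falls out by subtraction. The numbers below are made-up placeholders, not real measurements:

```python
def decode_only_ms(total_ms: float, prefill_ms: float, decode_steps: int):
    """Split a measured run into decode-only time and mean per-step latency."""
    # Part 2 time = whole-workflow time minus the single prefill pass.
    decode_ms = total_ms - prefill_ms
    return decode_ms, decode_ms / decode_steps

# Placeholder values: 10 s end-to-end, 300 ms prefill, 2047 decode steps.
decode_ms, per_step_ms = decode_only_ms(
    total_ms=10_000.0, prefill_ms=300.0, decode_steps=2047
)
assert decode_ms == 9_700.0
```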

@liminn
Author

liminn commented May 23, 2024

ok, thanks

@liminn liminn closed this as completed May 23, 2024