enableBlockReuse option is not available for tensorrt_llm.runtime.ModelRunner #1594

Open · 2 of 4 tasks · yupbank opened this issue May 13, 2024 · 2 comments
Labels: bug (Something isn't working)

yupbank commented May 13, 2024

System Info

Nothing to do with hardware.

Who can help?

@kaiyux or @ncomly-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Nothing to reproduce.

Expected behavior

We want to enable common prefix caching; the enableBlockReuse keyword appears to provide this for other runtimes, but it is not exposed here.
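
For context, a minimal sketch of how block reuse can be turned on through the executor bindings, where the option is already exposed. The module path, class names, and the enable_block_reuse field below are assumptions based on the executor API, not on ModelRunner:

```python
# Sketch only: enabling KV-cache block reuse via the executor bindings.
# These names (tensorrt_llm.bindings.executor, KvCacheConfig, ExecutorConfig,
# enable_block_reuse) are assumptions from the executor API, not from
# tensorrt_llm.runtime.ModelRunner, which does not expose this option.
import tensorrt_llm.bindings.executor as trtllm

kv_cache_config = trtllm.KvCacheConfig(enable_block_reuse=True)
executor_config = trtllm.ExecutorConfig(kv_cache_config=kv_cache_config)
executor = trtllm.Executor(
    "/path/to/engine_dir",           # engine built with trtllm-build
    trtllm.ModelType.DECODER_ONLY,
    executor_config,
)
```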

Actual behavior

We are not able to specify enableBlockReuse when using tensorrt_llm.runtime.ModelRunner.

Additional notes

None.

yupbank added the bug label on May 13, 2024
yupbank commented May 14, 2024

For example, in https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/run.py#L136 there is no way to enable enableBlockReuse.
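
A hypothetical illustration of the interface this issue asks for; the kv_cache_enable_block_reuse keyword below does not exist on ModelRunner and only shows the desired usage:

```python
from tensorrt_llm.runtime import ModelRunner

# Desired (hypothetical) interface: ModelRunner.from_dir currently accepts
# no block-reuse option, which is exactly what this issue requests.
runner = ModelRunner.from_dir(
    engine_dir="/path/to/engine_dir",
    rank=0,
    kv_cache_enable_block_reuse=True,  # hypothetical keyword, not yet supported
)
```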

dcampora (Collaborator) commented

Thanks for the report @yupbank . We're on it and it will be fixed in the next release.
