target_prefix latency #1689

Open
SimonBenhamou opened this issue Apr 30, 2024 · 2 comments

@SimonBenhamou

Hello,

I noticed that when supplying a target_prefix to the translate_batch or generate_tokens method, the latency for the supplied prefix tokens is the same as when they are not provided. I would expect negligible latency for them, because those tokens don't require any generation steps: I expect the first step to be the generation of the token that follows the prefix tokens.

Am I missing something, or is this due to an inefficiency in CTranslate2's generation logic?

Thanks,
Simon

@minhthuc2502
Collaborator

If you specify target_prefix, the prefix is decoded in a single step, and the following tokens are then generated one by one in the next steps. Without target_prefix, every token is generated one by one. In theory, it should run faster when target_prefix is used. Could you test with a long prefix?
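
A long-prefix test like the one requested here could be run by timing the same call with prefixes of increasing length. The sketch below is only an illustration, not CTranslate2 code: `time_by_prefix_length` and the `run` callable are hypothetical names, and `run` is assumed to wrap a call such as `translator.translate_batch([source_tokens], target_prefix=[prefix])` on an already loaded translator.

```python
import time
from typing import Callable, Dict, List, Sequence


def time_by_prefix_length(
    run: Callable[[List[str]], object],
    full_prefix: Sequence[str],
    lengths: Sequence[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best wall-clock time in seconds of run(prefix) per prefix length.

    `run` is a hypothetical callable that performs one translation with the
    given prefix tokens; it is not part of any library.
    """
    timings: Dict[int, float] = {}
    for n in lengths:
        # Truncate the full prefix to the first n tokens.
        prefix = list(full_prefix[:n])
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(prefix)
            # Keep the fastest run to reduce noise from warm-up and scheduling.
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings
```

For a fair comparison the total output length should be held roughly constant across runs. Under the expectation discussed in this thread, the measured time would then drop as the prefix covers more of the output.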

@SimonBenhamou
Author

I did, and could reproduce both observations:

  • no matter how long the prefix is, the generation time is the same
  • when using the generate_tokens method and measuring the latency, the generation time per token is the same for prefix tokens as for the subsequent tokens
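
The per-token measurement described above can be sketched as a small helper that records the time elapsed between consecutive items of an iterator. This is a minimal sketch, not the exact script used for the numbers in this thread: `per_step_latency` is a hypothetical name, and it is assumed that the iterator passed in is the one returned by `translator.generate_tokens(source_tokens, target_prefix=prefix)`.

```python
import time
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def per_step_latency(steps: Iterable[T]) -> List[Tuple[T, float]]:
    """Pair each yielded item with the seconds elapsed since the previous one.

    The first interval is measured from the call to this function, so it also
    includes any setup work the iterator does before its first item.
    """
    results: List[Tuple[T, float]] = []
    last = time.perf_counter()
    for item in steps:
        now = time.perf_counter()
        results.append((item, now - last))
        last = now
    return results
```

Comparing the intervals recorded for prefix tokens with those recorded for the tokens that follow shows whether the prefix is being emitted faster than freshly generated tokens.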
