System Info
Who can help?
@byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction
When requesting generation logits via the Executor API's Python bindings, in certain cases an entire generation step is missing from the logits tensor. As a result, all subsequent generation steps in the logits tensor are shifted by one and no longer match the generated tokens.
This appears to happen specifically when the generation loop terminates early due to reaching `end_id`: if a custom stop sequence is encountered instead, or if the maximum number of new tokens is reached, the returned logits are correct.

The problem can be reproduced with the steps below.
The `examples/bindings/executor/example_basic.py` script was modified to issue a request that exhibits the issue, and to print the argmax of the logits at each generation step. Below is the modified script:
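What follows is a minimal sketch of the kind of modification described, not the exact script from the report: it assumes the 0.10-era `tensorrt_llm.bindings.executor` API, and the model path, prompt tokens, and `end_id` value are placeholders.

```python
import argparse

import tensorrt_llm.bindings.executor as trtllm

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", required=True)
args = parser.parse_args()

executor = trtllm.Executor(args.model_path, trtllm.ModelType.DECODER_ONLY,
                           trtllm.ExecutorConfig(1))

# Placeholder prompt and end_id; the real values depend on the model/tokenizer.
request = trtllm.Request(
    input_token_ids=[1, 100, 200, 300],
    max_new_tokens=64,  # large enough that generation stops at end_id first
    end_id=2,
    sampling_config=trtllm.SamplingConfig(top_k=1),  # greedy decoding
    output_config=trtllm.OutputConfig(return_generation_logits=True,
                                      exclude_input_from_output=True))

request_id = executor.enqueue_request(request)
result = executor.await_responses(request_id)[0].result

tokens = result.output_token_ids[0]    # generated tokens, beam 0
logits = result.generation_logits[0]   # [max_new_tokens, vocab_size] tensor
print("generated tokens:", tokens)
# The logits tensor is padded to max_new_tokens, so only compare the
# first len(tokens) steps.
print("logits argmax   :", logits.argmax(dim=-1).tolist()[:len(tokens)])
```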
Expected behavior

Since we are using `top_k=1` and are not sampling tokens, we expect the argmax of the logits at each generation step to match exactly the tokens returned for the request.
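As a concrete check (reusing the hypothetical `tokens` and `logits` names from the sketch above), greedy decoding implies the following invariant, which this bug violates:

```python
# With top_k=1, each generated token should equal the argmax of that
# step's logits; the shifted logits tensor breaks this step-by-step match.
argmax_ids = logits.argmax(dim=-1).tolist()[:len(tokens)]
for step, (token, top1) in enumerate(zip(tokens, argmax_ids)):
    assert token == top1, f"step {step}: token {token} != argmax {top1}"
```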
actual behavior

The generated tokens and the argmax of the logits do not match, and the latter is missing one entire generation step:

Notice how token `271` is missing toward the end of the logits argmax sequence, and how all subsequent tokens are shifted by 1.

additional notes
The issue was observed on all TensorRT-LLM 0.10 dev versions, up to `0.10.0.dev2024050700`.