Missing logits in Executor API when using return_generation_logits #1569

Open · 2 of 4 tasks · 2 comments
AlessioNetti opened this issue May 10, 2024
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)

AlessioNetti commented May 10, 2024

System Info

  • NVIDIA A40
  • CUDA 12.2
  • TensorRT 10.0.1.6
  • TensorRT-LLM 0.10.0.dev2024050700

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When requesting generation logits via the Executor API's Python bindings, in certain cases one entire generation step is missing from the logits tensor. As a result, all subsequent generation steps in the logits tensor are shifted by one and no longer match the generated tokens.

This appears to happen specifically when the generation loop terminates early because end_id is reached; if a custom stop sequence is encountered, or if the maximum number of new tokens is reached, the returned logits are correct.
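For reference, a minimal sketch of the two request configurations involved (parameter names as in the Executor bindings used in the full diff further below; token values match the report):

```python
import tensorrt_llm.bindings.executor as trtllm

cfg = trtllm.OutputConfig(return_generation_logits=True)

# Generation stops early once end_id (25) is produced: the returned
# generation logits are missing one step (the bug reported here).
req_early_stop = trtllm.Request(input_token_ids=[100, 20, 3, 18],
                                max_new_tokens=20,
                                end_id=25,
                                output_config=cfg)

# No explicit end_id (as in the unmodified example): generation runs for
# max_new_tokens steps and the returned logits are correct.
req_full_length = trtllm.Request(input_token_ids=[1, 2, 3, 4],
                                 max_new_tokens=10,
                                 output_config=cfg)
```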

The problem can be reproduced with the steps below:

```
python convert_checkpoint.py --model_dir ./falcon_7b_tp1_instruct/ --dtype bfloat16 --output_dir ./falcon_7b_tp1_instruct_trt_chkpt

trtllm-build --checkpoint_dir ./falcon_7b_tp1_instruct_trt_chkpt/ --gemm_plugin bfloat16 --remove_input_padding enable --gpt_attention_plugin bfloat16 --output_dir ./falcon_7b_tp1_instruct_p200_g200 --gather_all_token_logits --max_input_len 200 --max_output_len 200 --max_batch_size 64

python example_basic.py --model_path ./falcon_7b_tp1_instruct_p200_g200
```

The examples/bindings/executor/example_basic.py script was modified to issue a request that triggers the issue, and to print the argmax of the logits at each generation step. The modification is shown in the diff below:

```diff
diff --git a/examples/bindings/executor/example_basic.py b/examples/bindings/executor/example_basic.py
index 2c7a3fc..3f1991f 100644
--- a/examples/bindings/executor/example_basic.py
+++ b/examples/bindings/executor/example_basic.py
@@ -1,4 +1,5 @@
 import argparse
+import torch
 
 import tensorrt_llm.bindings.executor as trtllm
 
@@ -21,8 +22,10 @@ if __name__ == "__main__":
 
     if executor.can_enqueue_requests():
         # Create the request.
-        request = trtllm.Request(input_token_ids=[1, 2, 3, 4],
-                                 max_new_tokens=10)
+        request = trtllm.Request(input_token_ids=[100, 20, 3, 18],
+                                 max_new_tokens=20,
+                                 end_id=25,
+                                 output_config=trtllm.OutputConfig(return_generation_logits=True))
 
         # Enqueue the request.
         request_id = executor.enqueue_request(request)
@@ -30,6 +33,9 @@ if __name__ == "__main__":
         # Wait for the new tokens.
         responses = executor.await_responses(request_id)
         output_tokens = responses[0].result.output_token_ids
+        output_top_tokens = torch.argmax(responses[0].result.generation_logits[0], dim=1).tolist()
+
 
         # Print tokens.
-        print(output_tokens)
+        print(f"Output tokens:   {output_tokens[0][4:]}")
+        print(f"Logits arg-max:  {output_top_tokens}")
```

Expected behavior

Since we are using top_k=1 and are not sampling tokens, we expect the argmax of the logits at each generation step to match exactly the tokens returned for the request.
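A minimal sketch of this check, reusing responses and output_tokens from the modified script above (the prompt length of 4 and beam index 0 match that script; generation_logits[0] appears to be a [max_new_tokens, vocab_size] tensor for beam 0):

```python
import torch

# Tokens produced during generation (skip the 4 prompt tokens, beam 0).
generated = output_tokens[0][4:]

# Per-step argmax over the vocabulary of the generation logits for beam 0.
argmax = torch.argmax(responses[0].result.generation_logits[0], dim=1).tolist()

# With greedy decoding (top_k=1), the first len(generated) argmax entries
# should equal the generated tokens, step for step.
assert argmax[:len(generated)] == generated
```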

Actual behavior

The generated tokens and the argmax of the logits do not match, and the latter is missing one entire generation step:

```
Output tokens:   [94, 241, 914, 818, 271, 577, 402, 2862, 271, 1730, 544, 248, 1079, 1111, 612]
Logits arg-max:  [94, 241, 914, 818, 271, 577, 402, 2862, 1730, 544, 248, 1079, 1111, 612, 25, 0, 0, 0, 0, 0]
```

Notice how the second occurrence of token 271 (generation step 8, zero-indexed) is missing from the logits argmax sequence, and how all subsequent entries are shifted by one. The 25 near the end of the argmax sequence is the end_id that terminated generation, and the trailing zeros appear to be padding for the unused generation steps.
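For illustration, a small sketch that locates the divergence between the two sequences printed above:

```python
# The two sequences from the output above.
tokens = [94, 241, 914, 818, 271, 577, 402, 2862, 271,
          1730, 544, 248, 1079, 1111, 612]
argmax = [94, 241, 914, 818, 271, 577, 402, 2862, 1730,
          544, 248, 1079, 1111, 612, 25, 0, 0, 0, 0, 0]

# First generation step at which the two sequences disagree.
first_diff = next(i for i, (t, a) in enumerate(zip(tokens, argmax)) if t != a)
print(first_diff)  # 8 -> the second occurrence of 271 is absent from argmax

# From the missing step onward, the sequences line up again once shifted by one.
print(tokens[first_diff + 1:] == argmax[first_diff:len(tokens) - 1])  # True
```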

Additional notes

The issue was observed on all TensorRT-LLM 0.10 dev versions, up to 0.10.0.dev2024050700.

AlessioNetti added the bug label on May 10, 2024
byshiue added the triaged label on May 14, 2024
trevor-m (Collaborator) commented
Thanks for filing this issue @AlessioNetti, I was able to reproduce the bug. Taking a look now.

trevor-m (Collaborator) commented May 17, 2024

I think I found the issue. We should be able to get the fix in soon.
