Issues: triton-inference-server/tensorrtllm_backend
#445 [bug] There is a problem with llama 7B model pre-processing after using triton server (opened May 8, 2024 by Graham1025)
#443 [BUG] coredump on process exit triggered after TRITONSERVER_ServerDelete (opened May 7, 2024 by hzlushiliang)
#442 [bug] InFlightBatching does not seem to be working (opened May 6, 2024 by larme)
#438 [bug] Deploying Mixtral-8x7B-v0.1 with Triton 24.02 on A100 (160GB) raises "Cuda Runtime (out of memory)" exception (opened Apr 29, 2024 by kelkarn)
#437 GptManager's scalability issues with input & output parameters (opened Apr 28, 2024 by service-kit)
#436 [bug] How to post sampling parameters (like top_k, temperature) to the Triton HTTP server (opened Apr 26, 2024 by wanzhenchn)
#435 [bug] Encountered an error in forward function: std::bad_cast (opened Apr 26, 2024 by wangqy1216)
#434 [bug] Llama 7B model can't get longer output text after using triton server (opened Apr 26, 2024 by XiaobingSuper)
#429 [bug] max_batch_size seems to have no impact on model performance (opened Apr 23, 2024 by VitalyPetrov)
#428 [bug] Performance issue with return_context_logits enabled in TensorRT-LLM (opened Apr 23, 2024 by gywlssww)
#425 [bug] Segfault after loading models in official example (opened Apr 20, 2024 by LeatherDeerAU)
#424 [bug] Can't launch triton server following docs: expecting TensorRT library version 9.2.0.5, got 9.3.0.1 (opened Apr 20, 2024 by conway-abacus)
#419 [bug, triaged] Performance issue with return_context_logits enabled in TensorRT-LLM (opened Apr 19, 2024 by metterian)
#418 [triaged] Filtering beam_search output tensors results in a string output vs list (opened Apr 18, 2024 by nikhilshandilya)
#417 [triaged] Warmup example of loading LoRA weights (opened Apr 18, 2024 by TheCodeWrangler)
#413 [triaged] Results from inflight_batcher_llm_client with multiple LoRA weights do not match tensorrtllm (opened Apr 17, 2024 by stifles)
#412 [feature request] Feature Request: Set maximum number of in flight (opened Apr 17, 2024 by TheCodeWrangler)
#411 [triaged] Block reuse is currently not supported with beam width > 1 (opened Apr 16, 2024 by tonylek)
#408 [feature request] Supporting beam search in streaming mode (opened Apr 13, 2024 by tonylek)
#406 [bug] lora_task_id, lora_weights, lora_config not found in all_models/inflight_batcher_llm/tensorrt_llm_bls/1/lib/decode.py (opened Apr 12, 2024 by liao217)
#403 [bug, triaged] Support bfloat16 LoRA adaptors (opened Apr 11, 2024 by TheCodeWrangler)
#399 [triaged] Example of LoRA weights (opened Apr 9, 2024 by TheCodeWrangler)