Issues: triton-inference-server/tensorrtllm_backend
Issues list
- #475: [Question] Best practices to track inputs and predictions? (opened May 24, 2024 by FernandoDorado)
- #472: tensorrt_llm_bls disregards temperature setting [bug] (opened May 23, 2024 by janpetrov)
- #468: random_seed seems to be ignored (or at least inconsistent) for inflight_batcher_llm [bug] (opened May 21, 2024 by dyoshida-continua)
- #467: Unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'name' not found [bug] (opened May 21, 2024 by Godlovecui)
- #464: [Bug] Zero-temperature curl request affects non-zero-temperature requests [bug] (opened May 20, 2024 by Hao-YunDeng)
- #463: Can you provide an example of a visual language model or multimodal model launched by Triton server? (opened May 20, 2024 by lzcchl)
- #462: How to deploy one model instance across multiple GPUs to tackle the OOM problem? (opened May 16, 2024 by shil3754)
- #461: decoding_mode top_k_top_p does not take effect for llama2; results differ from Hugging Face [bug] (opened May 16, 2024 by yjjiang11)
- #460: Implement XC-Cache to improve long-context inference performance (opened May 15, 2024 by avianion)
- #459: Tritonserver won't start up running Smaug 34b [bug] (opened May 15, 2024 by workuser12345)
- #457: Mixtral 8x7B-v0.1 hangs after serving a few requests [bug] (opened May 15, 2024 by aaditya-srivathsan)
- #455: [tensorrt-llm backend] A question about launch_triton_server.py (opened May 13, 2024 by victorsoda)
- #448: Example gpu_device_ids for multi-model usage? [question] (opened May 9, 2024 by vnkc1)
- #442: InFlightBatching seems not to be working [need more info, triaged] (opened May 6, 2024 by larme)
- #440: Deployment failed for BERT [triaged] (opened May 3, 2024 by vivekjoshi556)
- #438: Deploying Mixtral-8x7B-v0.1 with Triton 24.02 on A100 (160GB) raises "Cuda Runtime (out of memory)" exception [bug, triaged] (opened Apr 29, 2024 by kelkarn)
- #437: GptManager's scalability issues with input & output parameters [feature request] (opened Apr 28, 2024 by service-kit)
- #435: Encountered an error in forward function: std::bad_cast [bug] (opened Apr 26, 2024 by wangqy1216)
- #429: max_batch_size seems to have no impact on model performance [bug] (opened Apr 23, 2024 by VitalyPetrov)
- #428: Performance issue with return_context_logits enabled in TensorRT-LLM [bug] (opened Apr 23, 2024 by gywlssww)
- #425: Seg fault after loading models in official example [bug] (opened Apr 20, 2024 by LeatherDeerAU)
- #424: Can't launch Triton server following docs: expecting [TensorRT] library version 9.2.0.5, got 9.3.0.1 [bug] (opened Apr 20, 2024 by conway-abacus)