
Issues: triton-inference-server/server

Issues list

HandleGenerate equivalent for sagemaker_server.cc [enhancement]
#7151 opened Apr 24, 2024 by billcai
CUDA Graph does not work
#7150 opened Apr 23, 2024 by SunnyGhj
Response caching GPU tensors
#7140 opened Apr 19, 2024 by rahchuenmonroe
How does shared memory speed up inference? [question]
#7126 opened Apr 17, 2024 by NikeNano
Dynamic batching that supports static batch size with padding [enhancement] [module: server]
#7124 opened Apr 17, 2024 by ShuaiShao93
conda-pack failing: Failed to initialize Python stub for auto-complete [bug] [module: backends]
#7121 opened Apr 15, 2024 by jadhosn
Error running simple example [module: backends]
#7118 opened Apr 15, 2024 by geraldstanje
Unable to create CUDA shared memory handle when using multiprocessing to send multiple requests [bug] [module: clients]
#7101 opened Apr 11, 2024 by justanhduc
Python backend: How can I add new labels to all default MetricFamily metrics? [module: server]
#7098 opened Apr 11, 2024 by nhhviet98