
Releases: triton-inference-server/server

Release 1.1.0, corresponding to NGC container 19.04

24 Apr 00:07

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.1.0

  • Client libraries and examples now build with a separate Makefile (a Dockerfile is also included for convenience).

  • Input or output tensors with variable-size dimensions (indicated by -1 in the model configuration) can now represent tensors where the variable dimension has value 0 (zero); see the configuration sketch after this list.

  • Zero-sized input and output tensors are now supported for batching models. This enables the inference server to support models that require inputs and outputs that have shape [ batch-size ].

  • TensorFlow custom operations (C++) can now be built into the inference server. An example and documentation are included in this release.
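
The model configuration sketch below illustrates the variable-size and zero-size dimension behavior described above. It is a minimal, hedged example rather than content from the release itself; the model name, platform, tensor names, and shapes are hypothetical.

    name: "variable_dim_example"        # hypothetical model name
    platform: "tensorflow_graphdef"
    max_batch_size: 8
    input [
      {
        name: "INPUT0"
        data_type: TYPE_FP32
        # -1 marks a variable-size dimension; as of 1.1.0 a request may
        # legally provide 0 (zero) elements along this dimension.
        dims: [ -1 ]
      }
    ]
    output [
      {
        name: "OUTPUT0"
        data_type: TYPE_FP32
        dims: [ -1 ]
      }
    ]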

Client Libraries and Examples

An Ubuntu 16.04 build of the client libraries and examples is included in this release in the attached v1.1.0.clients.tar.gz. See the documentation section 'Building the Client Libraries and Examples' for more information on using this file.

Release 1.0.0, corresponding to NGC container 19.03

18 Mar 20:11

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 1.0.0

  • 1.0.0 is the first GA, non-beta, release of TensorRT Inference Server. See the README for information on backwards-compatibility guarantees for this and future releases.

  • Added support for stateful models and backends that require multiple inference requests to be routed to the same model instance/batch slot. The new sequence batcher provides scheduling and batching capabilities for this class of models; see the configuration sketch after this list.

  • Added GRPC streaming protocol support for inference requests.

  • The HTTP front-end is now asynchronous to enable lower-latency and higher-throughput handling of inference requests.

  • Enhanced perf_client to support stateful models and backends.
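
The model configuration sketch below illustrates how the new sequence batcher might be enabled for a stateful model. It is a hedged example, not taken from the release; the model name, platform, control tensor name, and timeout value are assumptions.

    name: "stateful_example"            # hypothetical model name
    platform: "custom"
    max_batch_size: 4
    sequence_batching {
      # Requests carrying the same correlation ID are routed to the same
      # model instance and batch slot for the life of the sequence.
      max_sequence_idle_microseconds: 5000000
      control_input [
        {
          # Hypothetical control tensor; the backend reads it to detect the
          # start of a new sequence (1 = start, 0 = continue).
          name: "START"
          control [
            {
              kind: CONTROL_SEQUENCE_START
              fp32_false_true: [ 0, 1 ]
            }
          ]
        }
      ]
    }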

Client Libraries and Examples

An Ubuntu 16.04 build of the client libraries and examples is included in this release in the attached v1.0.0.clients.tar.gz. See the documentation section 'Building the Client Libraries and Examples' for more information on using this file.

Release 0.11.0 beta, corresponding to NGC container 19.02

28 Feb 02:32

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 0.11.0 Beta

  • Variable-size input and output tensor support. Models that support variable-size input tensors and produce variable-size output tensors are now supported in the model configuration by using a dimension size of -1 for those dimensions that can take on any size.

  • String datatype support. For TensorFlow models and custom backends, input and output tensors can contain strings; a configuration sketch showing variable-size dimensions and string tensors follows this list.

  • Improved support for non-GPU systems. The inference server will run correctly on systems that do not contain GPUs and that do not have nvidia-docker or CUDA installed.
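
The model configuration sketch below illustrates both new capabilities: a variable-size dimension declared with -1 and string-typed input and output tensors. It is a hedged example; the model name, platform, and tensor names are hypothetical.

    name: "string_io_example"           # hypothetical model name
    platform: "tensorflow_savedmodel"
    max_batch_size: 8
    input [
      {
        name: "TEXT_INPUT"
        # String tensors are supported for TensorFlow models and custom backends.
        data_type: TYPE_STRING
        # -1 lets this dimension take on any size at request time.
        dims: [ -1 ]
      }
    ]
    output [
      {
        name: "TEXT_OUTPUT"
        data_type: TYPE_STRING
        dims: [ -1 ]
      }
    ]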

Client Libraries and Examples

An Ubuntu 16.04 build of the client libraries and examples is included in this release in the attached v0.11.0.clients.tar.gz. See the documentation section 'Building the Client Libraries and Examples' for more information on using this file.

Release 0.10.0 beta, corresponding to NGC container 19.01

28 Jan 21:03

NVIDIA TensorRT Inference Server

The NVIDIA TensorRT Inference Server (TRTIS) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New In 0.10.0 Beta

  • Custom backend support. TRTIS allows individual models to be implemented with custom backends instead of by a deep-learning framework. With a custom backend, a model can implement any logic desired while still benefiting from the GPU support, concurrent execution, dynamic batching, and other features provided by TRTIS. A configuration sketch follows below.
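
The model configuration sketch below shows how a model might declare a custom backend. It is a hedged example; the model name, tensor names, shapes, and the shared-library filename are assumptions, and the "custom" platform value reflects the custom-backend documentation rather than this release note.

    name: "custom_backend_example"      # hypothetical model name
    # The model is implemented by a custom backend rather than a framework.
    platform: "custom"
    max_batch_size: 8
    # Shared library implementing the custom backend, placed in the model
    # version directory (filename assumed for illustration).
    default_model_filename: "libcustom.so"
    input [
      {
        name: "INPUT0"
        data_type: TYPE_FP32
        dims: [ 16 ]
      }
    ]
    output [
      {
        name: "OUTPUT0"
        data_type: TYPE_FP32
        dims: [ 16 ]
      }
    ]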

Release 0.9.0 beta, corresponding to NGC container 18.12

20 Dec 01:10

NVIDIA TensorRT Inference Server 0.9.0 Beta

The NVIDIA TensorRT Inference Server (TRTIS) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

What's New in 0.9.0 Beta

  • TRTIS now monitors the model repository for any change and dynamically reloads models when necessary, without requiring a server restart. It is now possible to add and remove model versions, add and remove entire models, modify the model configuration, and modify the model labels while the server is running.
  • Added a model priority parameter to the model configuration; see the configuration sketch after this list. Currently, the model priority controls the CPU thread priority when executing the model and, for TensorRT models, also controls the CUDA stream priority.
  • Fixed a bug in the GRPC API: the model version parameter changed from string to int. This is a non-backwards-compatible change.
  • Added the --strict-model-config=false option to allow some model configuration properties to be derived automatically. For some model types, this removes the need to specify the config.pbtxt file.
  • Improved performance via an asynchronous GRPC frontend.
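
The sketch below shows one way the model priority might appear in a model configuration. It is an assumption-laden illustration: the placement under optimization and the PRIORITY_MAX value follow later model-configuration schemas and may not match this release exactly.

    name: "priority_example"            # hypothetical model name
    platform: "tensorrt_plan"
    max_batch_size: 8
    optimization {
      # Raises the CPU thread priority used to execute this model and, for
      # TensorRT models, the CUDA stream priority as well (field placement
      # assumed; consult the model configuration documentation for this release).
      priority: PRIORITY_MAX
    }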

Release 0.8.0 beta, corresponding to NGC container 18.11

27 Nov 21:58

Documents the dependencies needed for the client libraries and examples.