An image-to-text model/pipeline using ViT and Transformers, deployed with NVIDIA's PyTriton and a Streamlit app.
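For context, a minimal PyTriton sketch of what such a deployment can look like: the captioning function below is a stand-in for the actual ViT encoder plus Transformer decoder, and the model and tensor names are illustrative, not taken from the repository.

    import numpy as np
    from pytriton.decorators import batch
    from pytriton.model_config import ModelConfig, Tensor
    from pytriton.triton import Triton

    @batch
    def caption_fn(image: np.ndarray):
        # Stand-in for the real ViT encoder + Transformer decoder;
        # returns one fixed caption per batch item.
        captions = np.array([[b"a placeholder caption"]] * image.shape[0],
                            dtype=np.object_)
        return {"caption": captions}

    with Triton() as triton:
        triton.bind(
            model_name="image_captioning",  # hypothetical model name
            infer_func=caption_fn,
            inputs=[Tensor(name="image", dtype=np.uint8, shape=(-1, -1, 3))],
            outputs=[Tensor(name="caption", dtype=bytes, shape=(1,))],
            config=ModelConfig(max_batch_size=8),
        )
        triton.serve()  # blocks; clients reach the model over Triton's HTTP/gRPC API

A Streamlit front end would then send images to this endpoint over Triton's standard HTTP or gRPC interface.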
A search engine for Shopee applying image search, full-text search, and auto-complete.
A Dockerized Streamlit computer vision app with Triton Inference Server and a PostgreSQL database.
Serving a YOLOv5 segmentation model on Amazon EC2 Inf1.
Training and edge deployment of a custom YOLOv8x-cls model to classify trash vs. recycling.
A notebook with commands to convert a Detectron2 Mask R-CNN model to TensorRT.
A Go gRPC client for YOLO-NAS and YOLOv8 inference using Triton Inference Server.
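That repository's client is written in Go; to keep the sketches in this list in one language, here is the equivalent call with NVIDIA's tritonclient Python package. The model name ("yolov8") and the tensor names ("images", "output0") are assumptions for illustration.

    import numpy as np
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url="localhost:8001")

    # Hypothetical YOLOv8 model: one 640x640 RGB image, NCHW float32.
    image = np.random.rand(1, 3, 640, 640).astype(np.float32)  # stand-in for a preprocessed frame
    infer_input = grpcclient.InferInput("images", list(image.shape), "FP32")
    infer_input.set_data_from_numpy(image)

    result = client.infer(model_name="yolov8", inputs=[infer_input])
    detections = result.as_numpy("output0")  # assumed output tensor name
    print(detections.shape)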
Triton Inference Server with the Python backend and Hugging Face Transformers.
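A minimal model.py for Triton's Python backend wrapping a Hugging Face pipeline might look like the following sketch; the TEXT/LABEL tensor names and the sentiment-analysis task are illustrative, not taken from the repository.

    import numpy as np
    import triton_python_backend_utils as pb_utils
    from transformers import pipeline

    class TritonPythonModel:
        def initialize(self, args):
            # Loaded once per model instance when Triton starts.
            self.pipe = pipeline("sentiment-analysis")

        def execute(self, requests):
            responses = []
            for request in requests:
                texts = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
                results = self.pipe([t.decode("utf-8") for t in texts.flatten()])
                labels = np.array([r["label"].encode("utf-8") for r in results],
                                  dtype=np.object_)
                out = pb_utils.Tensor("LABEL", labels)
                responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
            return responses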
Integrating the Sumen model with Triton Inference Server.
A code sample for serving large language models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
A serving example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes.
An example string-processing pipeline on Triton Inference Server.
Miscellaneous code and writings for MLOps.
Run multiple models on the same GPU with Amazon SageMaker multi-model endpoints powered by NVIDIA Triton Inference Server; a Java client is also provided.
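With multi-model endpoints, the TargetModel parameter selects which model archive SageMaker routes the request to, and Triton loads it on demand onto the shared GPU. A hedged boto3 sketch with placeholder endpoint and model names; AWS examples typically use Triton's binary tensor format, while plain JSON in the KServe v2 format is shown here for brevity:

    import json
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    # KServe v2 JSON inference request; input name and shape are assumptions.
    payload = json.dumps({
        "inputs": [{
            "name": "input__0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }]
    })

    response = runtime.invoke_endpoint(
        EndpointName="triton-mme-endpoint",  # placeholder endpoint name
        TargetModel="model-a.tar.gz",        # placeholder model archive in the S3 prefix
        ContentType="application/json",
        Body=payload,
    )
    print(response["Body"].read()[:200])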
Microservices with HTTP, Triton Inference Server, FastAPI, and Docker Compose.
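One such microservice can be sketched as a FastAPI route that forwards requests to Triton over HTTP; the service hostname ("triton", a docker-compose service name) and the model and tensor names are assumptions:

    import numpy as np
    import tritonclient.http as httpclient
    from fastapi import FastAPI

    app = FastAPI()
    triton = httpclient.InferenceServerClient(url="triton:8000")  # docker-compose service name

    @app.post("/classify")
    def classify(pixels: list[float]):
        # Hypothetical classifier taking a flat feature vector.
        arr = np.asarray(pixels, dtype=np.float32).reshape(1, -1)
        inp = httpclient.InferInput("input__0", list(arr.shape), "FP32")
        inp.set_data_from_numpy(arr)
        result = triton.infer(model_name="classifier", inputs=[inp])
        return {"scores": result.as_numpy("output__0").tolist()}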
An image retrieval system that uses a deep ResNet for feature extraction, Locally Optimized Product Quantization (LOPQ) for storage and retrieval, and NVIDIA technologies such as TensorRT and Triton Inference Server for efficient deployment, all accessible through a FastAPI-powered web API.
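LOPQ itself does not ship with faiss, but an OPQ rotation in front of an IVF-PQ index is a close stand-in and shows the storage-and-retrieval side; the feature dimension and quantization parameters below are assumptions:

    import faiss
    import numpy as np

    d = 2048                                             # ResNet feature dimension (assumed)
    xb = np.random.rand(10000, d).astype(np.float32)     # stand-in for extracted features

    # OPQ rotation + 256 inverted lists + 32-subquantizer product quantization.
    index = faiss.index_factory(d, "OPQ32,IVF256,PQ32")
    index.train(xb)
    index.add(xb)

    faiss.extract_index_ivf(index).nprobe = 8            # inverted lists scanned per query
    scores, ids = index.search(xb[:1], 5)                # 5 nearest neighbors of the first vector
    print(ids)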
Accelerating Stable Diffusion with TensorRT.
Deploy KoGPT with Triton Inference Server