#

model-serving

Here are 124 public repositories matching this topic...

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda inference pytorch transformer llama gpt rocm model-serving mlops llm inferentia llmops llm-serving trainium

Updated May 13, 2024
Python

kserve / kserve

Standardized Serverless ML Inference Platform on Kubernetes

Updated May 13, 2024
Python

FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

machine-learning deep-learning inference-engine model-deployment model-serving distributed-training federated-learning mlops edge-ai ai-agent on-device-training

Updated May 13, 2024
Python

BentoML

bentoml / BentoML

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated May 13, 2024
Python

ahkarami / Deep-Learning-in-Production

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

Updated Nov 11, 2023

mlrun / mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

python kubernetes workflow data-science machine-learning data-engineering model-serving mlops experiment-tracking mlops-workflow

Updated May 13, 2024
Python

openvinotoolkit / model_server

A scalable inference server for models optimized with OpenVINO™

kubernetes machine-learning cloud ai deep-learning inference edge dag model-serving serving openvino

Updated May 13, 2024
C++

ModelTC / lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

nlp deep-learning llama gpt model-serving llm openai-triton

Updated May 10, 2024
Python

tensorchord / envd

🏕️ Reproducible development environment

docker developer-tools development-environment hacktoberfest model-serving buildkit mlops mlops-workflow llmops

Updated May 13, 2024
Go

logicalclocks / hopsworks

Hopsworks - Data-Intensive AI platform with a Feature Store

python aws data-science machine-learning serverless azure gcp ml pyspark feature-engineering governance model-serving mlops feature-store feature-management hopsworks kserve

Updated May 7, 2024
Java

predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

transformers pytorch llama gpt lora model-serving fine-tuning llm llmops llm-serving llm-inference

Updated May 11, 2024
Python

underneathall / pinferencia

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

Updated Feb 14, 2023
Python

FederatedAI / FATE-Serving

A scalable, high-performance serving system for federated learning models

monitor inference model-serving model-versioning federated-learning

Updated Mar 19, 2024
Java

eightBEC / fastapi-ml-skeleton

FastAPI Skeleton App to serve machine learning models production-ready.

python machine-learning python3 model-serving fastapi

Updated May 4, 2024
Python

microsoft / aici

AICI: Prompts as (Wasm) Programs

rust ai wasm inference transformer language-model model-serving wasmtime llm llmops llm-serving llm-inference llm-framework

Updated May 10, 2024
Rust

lightbend / kafka-with-akka-streams-kafka-streams-tutorial

Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka

akka kafka-streams model-serving

Updated May 30, 2019
Scala

bentoml / Yatai

Model Deployment at Scale on Kubernetes 🦄️

kubernetes machine-learning k8s model-deployment model-serving mlops bentoml

Updated May 8, 2024
TypeScript

truss

basetenlabs / truss

The simplest way to serve AI/ML models in production

open-source machine-learning packaging artificial-intelligence falcon easy-to-use whisper inference-server model-serving inference-api stable-diffusion wizardlm

Updated May 13, 2024
Python

bentoml / gallery

BentoML Example Projects 🎨

data-science machine-learning gallery aws-lambda serverless machine-learning-library model-management azure-machine-learning model-deployment model-serving machine-learning-workflow gcp-cloud-functions aws-sagemaker bentoml

Updated Sep 13, 2022
Python

mosec

mosecorg / mosec

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

python rust machine-learning deep-learning mxnet tensorflow gpu cv pytorch tts hacktoberfest model-serving nerual-network machine-learning-platform jax mlops llm llm-serving

Updated May 11, 2024
Python

Improve this page

Add a description, image, and links to the model-serving topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the model-serving topic, visit your repo's landing page and select "manage topics."